Monday, February 14, 2011

In Parallel…

If I thought of Cook-Levin as ‘doable’, then the issue of programming for multi-cores is more ‘contemporary’, if not imminent. Dr. Dobb’s Journal recently ran an article cautioning that the ‘free lunch is indeed over’. I am not surprised, because my article in IEEE Potentials last year said the very same thing. In fact, I had referenced the article by Larus of Microsoft. To quote, my point was simply this:
It needs little emphasis that the 29,000 transistors running at 5 MHz in the 1978 Intel 8086 shot up to 291 million transistors running at 2.93 GHz in the 2006 Intel Core 2 Duo, but Moore's dividend can increase only when systems and software really exploit them. More importantly, future systems and software must be designed around the topology and architecture of the hardware and how best to exploit it.
I told at least my friends long back, when Intel’s ‘Core Duo’ came out, that it was simply a waste of time and effort: unless compilers exploit the extra cores, applications are NOT going to see any benefit. Of course, as I mentioned in the article,
Operating systems in conjunction with the hardware can off-load many of the software tasks to the hardware.
This too may not be effective for individual programs, which will continue with the same runtime. ‘Think parallel’ is a bit more complex and follows a ‘not-so-obvious’ paradigm. ‘Structured thinking’, and thereby ‘structured programming’, is strictly sequential. Though the head is still one, can we use multiple hands? That is THE question.
One of the key components of many algorithms (sorting, to name one) is the swap operation. Ranging from the primitive (additive tricks) to the not-so-obvious XOR trick, swap is critical. In my college days, I used to write the swap of two variables in C without an ‘intermediary’ variable, as below:
a ^= b;
b ^= a;  /* b now holds the original a */
a ^= b;  /* a now holds the original b */
Now this still involves the execution of three instructions; even if there were a dedicated ‘swap’ instruction, it might just be a macro for three similar micro-operations. So the run time is the run time of three instructions.
I think multiple cores can solve this problem easily and do the swap in a single cycle. Taking just two cores: Core 1 reads ‘a’ at the rising edge of the clock and writes it into ‘b’ at the falling edge; Core 2 reads ‘b’ at the rising edge and writes it into ‘a’ at the falling edge. Both reads complete before either write begins, so the two cores, sharing the memory, do the swap in just one cycle. This is effective usage of multiple cores. There may be many other macros or assembly instructions that could be ‘reworked’ to effectively exploit the hardware and thus run faster.

I don’t think there is any catch, and I would suspect Intel, at least, to already be heads-up on this.
