TechWeb

How Multicore Processors Are Reshaping Computing

Mar 27, 2007 (05:03 PM EDT)

Read the Original Article at http://www.informationweek.com/news/showArticle.jhtml?articleID=198700652


SANTA CLARA, Calif. — Multicore systems-on-chip will force designers to "rethink computer architectures in a most fundamental way," said Anant Agarwal, professor at the Massachusetts Institute of Technology (MIT) and a keynote speaker at the Multicore Expo here Tuesday (March 27). Agarwal discussed how multicore technology will impact the way designers size resources and connect cores, and proposed a new multicore programming approach.

"Multicore has really crashed the computing party," Agarwal said. While today's multicore systems may have 2 to 16 cores, he said, cores will range into the hundreds within the next two to three years and reach 1,000 early in the next decade. There are three critical questions, he said — how we size resources, how we connect cores, and how programming must evolve if multicore is to succeed.

In the past, Agarwal noted, a designer given more silicon area would simply enlarge caches or pipelines. Now there's another option: adding another core. In many cases, he said, adding a core and keeping cache sizes small will yield more performance than keeping the core count fixed and enlarging the caches.

Agarwal introduced the "KILL" rule: "kill if less than linear." A resource in a core should be given more area only if the core's performance increases at least proportionally. Using this rule, Agarwal said, it's possible to find the optimum cache size for a given multicore system. The general answer, he said, will be "much smaller caches than we have today."
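Stated as an inequality (a paraphrase for clarity, not Agarwal's own notation), the rule says to keep growing a resource of area A only while the relative performance gain keeps pace with the relative area cost:

    % KILL rule, paraphrased: P is core performance, A is the area
    % devoted to a resource. Grow the resource only while performance
    % improves at least linearly in area.
    \frac{\Delta P / P}{\Delta A / A} \;\ge\; 1
    \qquad\Longleftrightarrow\qquad
    \frac{\Delta P}{\Delta A} \;\ge\; \frac{P}{A}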

As clock rates come down, caches can shrink even further, Agarwal said. If the clock rate drops from 4 GHz to 1 GHz, he said, the miss penalty measured in processor cycles falls by a factor of 4, so the cache miss rate can rise by a factor of 4 without hurting performance; the cache can then be 16 times smaller.
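The step from a 4x miss rate to a 16x smaller cache appears to rest on the familiar square-root rule of thumb for caches; that assumption is implied by the figures rather than stated in the talk:

    % Assumption: miss rate m scales roughly as the inverse square root
    % of cache size S (a common empirical rule of thumb).
    m \propto S^{-1/2}
    % Dropping the clock from 4 GHz to 1 GHz cuts the miss penalty,
    % in cycles, by 4x, so m may quadruple at equal performance:
    \frac{m'}{m} = 4
    \quad\Longrightarrow\quad
    \frac{S'}{S} = \left(\frac{m}{m'}\right)^{\!2} = \frac{1}{16}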

As for the second question, how cores should be connected, Agarwal argued for distributed meshes rather than buses or rings. Meshes are scalable, he said, because bisection bandwidth grows as more cores are added, while a bus's does not. Further, he said, a mesh can be 80 to 90 percent more power efficient than a bus at 16 cores, and meshes also offer simple, regular layouts.
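A back-of-the-envelope sketch shows the scaling argument (the program below is illustrative, not from the talk): cutting an n-by-n mesh in half severs n links, so its bisection bandwidth grows with the square root of the core count, while a shared bus remains a single channel no matter how many cores contend for it.

    /* Illustrative sketch (not from the talk): bisection bandwidth of a
       shared bus vs. an n x n mesh as core counts grow.  A bus is one
       shared channel; bisecting an n x n mesh cuts n links. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        for (int cores = 4; cores <= 1024; cores *= 4) {
            int n = (int)sqrt((double)cores);  /* side length of the mesh */
            printf("%4d cores: bus = 1 link across the cut, mesh = %2d links\n",
                   cores, n);
        }
        return 0;
    }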

The wave of the future, Agarwal said, is "tiled" multicore architectures that are fully distributed with no centralized resources. Agarwal directs a project called Raw that's working on such an architecture. "The bus-based multicore system will fade in the next year or two," he predicted.

In his discussion of how programming must evolve, Agarwal raised a question — why is multicore programming so hard? First of all, he said, it's new. And it's a matter of perception. "Sequential programs are harder than parallel processing for many classes of programs," he said.

Agarwal said that current parallel programming tools for multicore are about where VLSI design tools were in the 1980s, "still in the dark ages." He said that tools, standards, and ecosystems are needed. "We have the opportunity to create API foundations for the multicore age," he said. "Who will be the Microsoft, the Cadence, or Synopsys of multicore?"

Agarwal said that old programming approaches fall short. Pthreads (POSIX threads), he said, are okay in the short term, but they offer no encapsulation or modularity. Direct memory access (DMA) with shared memory wastes pin bandwidth and energy. And Message Passing Interface (MPI) has high overhead and large memory footprints.

A more promising concept, Agarwal said, is one already used in ASIC design: streaming data from one compute unit to another. Streaming is fast and energy efficient, he said, well established in hardware design, and familiar to the software community because it resembles the sockets used in networking applications.

Agarwal said that a core-to-core data transfer can be cheaper than a memory access. Latency for a cache-to-cache transfer can be as low as 50 cycles, and for a register-to-register transfer as low as 5 cycles, he said. A "socket-like" stream-based programming API could be of great benefit for multicore devices, he said, noting that the Multicore Association's proposed Communications API (CAPI) standard is such an API.
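As a rough illustration of the socket analogy, here is a minimal sketch of the shape such a stream API might take. The stream_* names and signatures are invented for this article, not the actual CAPI interface, and the "stream" is modeled as a tiny in-memory FIFO so the example runs as an ordinary program.

    /* Hypothetical socket-like streaming between pipeline stages.
       The stream_* names are invented for illustration; they are NOT
       the Multicore Association's CAPI.  On real tiled hardware,
       send/recv would move data core-to-core over the on-chip mesh
       (register-to-register or cache-to-cache), not through a buffer. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        unsigned char buf[256];   /* toy FIFO; no bounds checking */
        size_t head, tail;
    } stream_t;

    void stream_open(stream_t *s) { s->head = s->tail = 0; }

    /* Like send() on a socket: push bytes into the stream. */
    void stream_send(stream_t *s, const void *data, size_t len) {
        memcpy(s->buf + s->tail, data, len);
        s->tail += len;
    }

    /* Like recv() on a socket: pull bytes out in FIFO order. */
    void stream_recv(stream_t *s, void *data, size_t len) {
        memcpy(data, s->buf + s->head, len);
        s->head += len;
    }

    int main(void) {
        stream_t s;
        stream_open(&s);
        int sample = 42;                        /* producer stage writes */
        stream_send(&s, &sample, sizeof sample);
        int received = 0;                       /* consumer stage reads  */
        stream_recv(&s, &received, sizeof received);
        printf("consumer received %d\n", received);
        return 0;
    }

Part of the appeal Agarwal described is that, as with network sockets, producer and consumer share no mutable state; ownership of the data travels with the message.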

Even with reduced cache sizes, mesh-based interconnects, and stream-based programming, change won't come quickly, Agarwal said. "Successful solutions will offer an evolutionary path," he said. "Therein lies our challenge."

The Multicore Expo runs March 27-29 in Santa Clara, Calif.