Intelligent Machines

The Promise of Personal Supercomputers

What will it take to put thousands of microprocessors in cell phones and laptops?

Last week, Intel announced a research project that made geeks jump with glee: the first programmable “terascale” supercomputer on a chip. The company demonstrated a single chip with 80 cores, or processors, and showed that these cores could be programmed to crunch numbers at the rate of a trillion floating-point operations per second, a measure known as a teraflop. The chip is about the size of a large postage stamp, but it has the same calculation speed as a supercomputer that, in 1996, took up about 2,000 square feet and drew thousands of times more power.

Not so super: A supercomputer in 1996 (top left) capable of calculating one trillion operations per second took up about 2,000 square feet and consumed 500,000 watts. Recently, Intel unveiled an 80-core research chip (bottom right) that achieves the same calculation rate but is the size of a large postage stamp and uses about 65 watts.

This research chip is one of Intel’s first steps toward massively multicore technology, says Nitin Borkar, engineering manager and lab project head at Intel. The goal, he says, is to use this chip to test techniques that could make massively multicore technology faster, more energy efficient, and, most daunting, easy to program. These techniques will be “funneled into future products” that could appear, if all goes well, within five to ten years.

But nearly all engineers in the computing industry agree that making consumer computers with hundreds of cores won’t be easy. In fact, many aren’t even sure that it can be done. The most glaring challenge will be to find a way to completely overhaul software so that applications can take advantage of numerous cores. This includes teaching software developers how to write code for multicore machines–a task known as parallel programming–and developing new tools that allow them to code accurately and efficiently.

Researchers and visionaries are already thinking about how these supercomputer chips can best be used. Intel thinks that recognition, mining, and synthesis (RMS) applications will be key. Put together, these technologies could allow real-time language translation via cell phones, real-time video search by spoken phrase or image, and better recommendation systems for shopping, meal planning, and even health care.

To make these applications a reality, the computing industry will experience some growing pains, says David Patterson, professor of computer science at the University of California, Berkeley. (He and his colleagues have a website that hosts discussions and provides a white paper and videos on the topic.) “We’re at the early stages of this gigantic change,” Patterson says. He describes the direction in which the industry has decided to go–abandoning performance-constrained single-core processors for multicore technology–as akin to a “Hail Mary pass” thrown in a football game. Chip makers are putting more and more cores on a chip, but the software engineers aren’t sure they can keep up. “It’s an exciting time for researchers,” Patterson says, “if we can figure out how to catch the pass.”

Because clock frequency–the measure of processor speed–of single-core chips kept rising steadily for decades, programmers could dodge the challenge of programming in parallel, says John Shalf, a computer scientist at Lawrence Berkeley National Laboratory, in Berkeley, CA. Their programs would run faster if they just waited 18 months for the next generation of chip to arrive, he says. But by about 2002, it became evident that these single-core chips were consuming too much power and weren’t going to be able to maintain the speed increases. So the industry decided to change tack: instead of trying to eke out more speed from a single processor, chip makers simply added another processor. “Now that we can’t crank up the clock frequency, we have to face parallelism head-on,” Shalf says, “and the best way to characterize the industry’s response is widespread panic.”

What’s so scary about parallel programming? To start, it’s been relegated to specialists in the high-performance computing (HPC) world who write software that solves specific problems on machines with thousands or even millions of cores. The software can create climate-change models or predict the folding of proteins, but it is written for a single laborious task, not for the whims of consumers who want to talk to a computer, watch high-definition video, and keep tabs on their aging parents halfway across the country.

Specifically, parallel programs must be written so that tasks can be appropriately shared among processors. This is difficult because not all applications naturally divide into components that can run separately; and even when they do, the pieces may finish at different times and create bottlenecks. In addition, there are complications with shared resources: if an application needs to access data in memory that’s shared by tens or hundreds of other cores, the program could slow down or freeze. Moreover, debugging parallel programs can be a nightmare because a mistake is often hard to duplicate, making the source of the problem difficult to find.
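To make the shared-data and debugging hazards concrete, here is a minimal sketch in C with POSIX threads (a hypothetical example, not code from Intel or any of the researchers quoted here): two threads increment one shared counter with no coordination, so updates are silently lost, and the size of the loss changes from run to run.

    /* Hypothetical illustration of the shared-state hazard described above.
       Two threads update one counter with no coordination; increments are
       lost, and the loss varies from run to run. Compile with: cc -pthread */
    #include <pthread.h>
    #include <stdio.h>

    #define INCREMENTS 1000000
    static long counter = 0;                 /* shared by both threads */

    static void *worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < INCREMENTS; i++)
            counter++;                       /* read-modify-write: not atomic */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Expected 2,000,000; the actual total is usually lower and differs
           between runs -- the bug rarely reproduces the same way twice. */
        printf("counter = %ld (expected %d)\n", counter, 2 * INCREMENTS);
        return 0;
    }

Guarding the counter with a lock, or giving each thread its own partial sum, fixes this tiny example; on a chip with tens or hundreds of cores, though, that kind of coordination is exactly where programs start to slow down.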

But even with all the challenges, there is hope, says Shalf. HPC researchers have developed portfolios of parallel algorithms that could be useful for consumer parallel programs. In addition, there are already massively multicore products on the market, and they are providing clues as to the best approaches from an architecture standpoint as well as from a programming standpoint. For instance, graphics company NVIDIA just released a commercial chip with 128 cores, designed to render graphics for applications such as video games. Many of the cores are general-purpose, meaning they can be programmed to perform many different graphics-oriented tasks rather than having their instructions hard-wired into the chip.
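As a rough illustration of why graphics-style work divides up so cleanly (again a hypothetical sketch, not NVIDIA’s actual programming model), consider brightening an image: each thread can own its own slice of the pixels, no two threads ever write the same data, and adding cores mostly just adds speed.

    /* Hypothetical data-parallel sketch: N threads each brighten a disjoint
       slice of a pixel buffer. No two threads write the same element, so no
       locking is needed -- the pattern graphics chips exploit at large scale.
       Compile with: cc -pthread */
    #include <pthread.h>
    #include <stdio.h>

    #define PIXELS   (1 << 20)
    #define NTHREADS 4                       /* imagine 128 on a graphics chip */

    static unsigned char image[PIXELS];

    struct slice { int start, end; };

    static void *brighten(void *arg)
    {
        struct slice *s = arg;
        for (int i = s->start; i < s->end; i++) {
            int v = image[i] + 40;           /* brighten, clamped at 255 */
            image[i] = v > 255 ? 255 : v;
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct slice slices[NTHREADS];
        int chunk = PIXELS / NTHREADS;

        for (int t = 0; t < NTHREADS; t++) {
            slices[t].start = t * chunk;
            slices[t].end   = (t == NTHREADS - 1) ? PIXELS : (t + 1) * chunk;
            pthread_create(&tid[t], NULL, brighten, &slices[t]);
        }
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("first pixel after brightening: %d\n", image[0]);
        return 0;
    }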

In addition, Intel, AMD, and others are collaborating with academic researchers to try to create a parallel-programming framework that can be agreed upon. One approach that looks promising is called transactional memory, says Krste Asanović, professor of computer science at MIT. (See “The Trouble with Multicore Computers.”) With transactional memory, a combination of chip architecture and software, programmers can think more sequentially, as they do when they program single-core systems, and let the system provide the parallelism. Asanović says that programmers write instructions that start and end in a linear fashion, “but behind the scenes they run in parallel.” This approach requires cooperation from hardware vendors as well as software engineers because the hardware and software must work together. “The two communities are talking,” Asanović says, “but there’s no consensus on what it’s going to look like.” He adds that transactional memory will most likely be just one of several approaches that, in combination, could make parallel programming easier.
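No agreed-upon framework exists yet, but GCC later shipped an experimental transactional-memory extension (enabled with the -fgnu-tm flag) that gives a feel for the style Asanović describes; the sketch below is an illustration of the idea, not the framework the vendors and researchers are negotiating. The body of the loop reads like ordinary sequential code; the system detects conflicts between threads and re-runs transactions behind the scenes.

    /* Hypothetical sketch of the transactional style using GCC's experimental
       -fgnu-tm extension. Compile with: gcc -fgnu-tm -pthread
       The programmer writes what looks like sequential code inside an atomic
       region; the runtime resolves conflicting accesses automatically. */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < 1000000; i++) {
            __transaction_atomic {           /* reads and writes in here      */
                counter++;                   /* appear to happen all at once  */
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* now reliably 2,000,000 */
        return 0;
    }

The appeal is that the loop body looks single-threaded; the system, not the programmer, decides how to interleave the transactions safely across cores.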

Without a consensus on how to proceed with multicore technology, however, the consumer computing industry might find itself at a standstill in about five years, says Shalf. But he’s optimistic, because the emergence of dual- and quad-core products has injected the field of parallel computing with a new sense of urgency. “In academia, we can disagree for years,” he says, “but industry has a way, with its economic imperative, to settle on a solution pretty quick.”