Efficient Hardware Repair
Researchers devise a new way to patch hardware like software, without slowing processors.
Defective chips can be expensive for computer manufacturers, especially when the hardware is recalled. They can also be a hassle for consumers, as they can cause computers to miscalculate, slow down, and, sometimes, crash. Computer-science professor Josep Torrellas thinks he has found a better way to deal with faulty chips: an efficient repair mechanism that treats hardware more like software, by fixing bugs with downloadable patches. His system is still in development, but he says it could ultimately make chip production faster and cheaper.
“We know how to fix software really easily,” says Torrellas, a professor at the University of Illinois at Urbana-Champaign. “We send patches around. Wouldn’t it be nice if you could simply get another patch from the vendor to fix your hardware?”
The centerpiece of Torrellas’s system is Phoenix: special hardware that resides on the chip and can be programmed to detect defects and implement solutions. The prototype hardware consists of a standard semiconductor device called a field-programmable gate array. While such devices are typically a bit slower than chips made for a single application, they have the advantage of being easily reprogrammed, an essential feature of Torrellas’s system.
In some ways, the system works much like antivirus software, which uses downloaded virus information to identify and eliminate new threats. Similarly, if a defect is discovered on a Phoenix-enabled chip, the manufacturer would automatically transmit a patch to all machines that might be affected. The patch contains a defect signature outlining the specific events that lead to the hardware problem. (For example, when the processor executes certain instructions and stores something in a particular part of the computer’s memory, the computer might crash.) Once installed, the patch reprograms the Phoenix device so that it monitors the chip for the defect signature and, when the signature appears, alters the computer’s processes to prevent a crash.
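The signature-matching idea can be sketched in software. The Python below is only an illustration of the concept, not Phoenix itself, which is on-chip hardware. Every name in it (Event, Patch, Monitor, "op_a", "region_x", "flush_and_retry") is hypothetical, and it assumes a signature is an exact, contiguous sequence of events, which the article does not specify.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical record of one thing the chip did."""
    kind: str    # e.g. "instr" or "store"
    detail: str  # e.g. an opcode name or a memory region


@dataclass
class Patch:
    """Hypothetical downloadable patch: a defect signature plus a workaround."""
    defect_id: str
    signature: List[Event]  # assumed: exact, contiguous event sequence
    workaround: str         # name of the corrective action to take


class Monitor:
    """Software stand-in for the reprogrammable monitoring logic."""

    def __init__(self) -> None:
        self._patches: List[Patch] = []
        self._windows: Dict[str, Deque[Event]] = {}

    def install(self, patch: Patch) -> None:
        # Keep a sliding window exactly as long as the signature.
        self._patches.append(patch)
        self._windows[patch.defect_id] = deque(maxlen=len(patch.signature))

    def observe(self, event: Event) -> List[str]:
        """Feed one event; return the workarounds whose signature just matched."""
        triggered = []
        for patch in self._patches:
            window = self._windows[patch.defect_id]
            window.append(event)
            if list(window) == patch.signature:
                triggered.append(patch.workaround)
                window.clear()  # start fresh after handling the defect
        return triggered


# Usage: the signature is "execute op_a, then store to region_x".
monitor = Monitor()
monitor.install(Patch(
    defect_id="demo-1",
    signature=[Event("instr", "op_a"), Event("store", "region_x")],
    workaround="flush_and_retry",
))
stream = [Event("instr", "op_b"), Event("instr", "op_a"), Event("store", "region_x")]
actions = [monitor.observe(e) for e in stream]
```

In this sketch, installing a new patch changes what the monitor watches for without touching the rest of the program, which is the property the reprogrammable hardware is meant to provide.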
Torrellas says that most chips have dozens of defects, although not all are catastrophic: some simply result in miscalculations, for example. Today, manufacturers often deal with hardware problems by disabling features that are found to be defective. “In the end, the user loses functionality,” Torrellas says. When no solution can be found and the problem is critical, manufacturers recall the chips at their own expense. A patching scheme would avoid those costs and maintain the chip’s functionality.
A Phoenix-enabled chip would also have a shorter time to market, according to Torrellas. Manufacturers could skip the last few weeks of testing, knowing that ultimately, they can solve problems with patches. “If they know that they could fix the problems later on, they could beat the competition to market,” he says.
Torrellas isn’t the first person to build patchable hardware; Crusoe and Itanium microprocessors, used in some laptop and desktop computers, can also be patched. But Torrellas says that Phoenix offers a more efficient approach. Crusoe microprocessors, which are made by Transmeta, have an additional level of complexity: special software that translates all commands. Defects are fixed by changing the way commands are interpreted. The process works, but Torrellas says it slows down the chip far more than Phoenix does. Itanium chips, which were developed jointly by Intel and Hewlett-Packard, are also relatively inefficient when patched, according to Torrellas. Moreover, a wider variety of problems can be fixed on a Phoenix-enabled chip.
Phoenix can’t fix all hardware defects, but Torrellas says it can recover from most critical bugs, such as those that would crash a computer. The Phoenix team performed a detailed analysis of past problems with AMD, Intel, IBM, and Motorola chips to determine which issues Phoenix should address first. Consequently, Phoenix is designed to focus on particularly problematic areas, such as the memory subsystem.
Whether Torrellas’s technology will make its way into commercial computers, however, is uncertain. “Their analysis of where bugs occur is excellent,” says Wilson Snyder, a principal engineer for the high-performance computer-hardware manufacturer SiCortex, based in Maynard, MA. “It provides a good, detailed look at signals that should be analyzed to discover bugs.” Hardware manufacturers could learn from the basic research behind Phoenix, Snyder says, and use it to eliminate hardware problems before chips hit the stores. But he questions whether manufacturers would ever implement Phoenix itself. Adding Phoenix onto an existing chip would take time and money, he points out.
Torrellas believes manufacturers will be amenable to a system like Phoenix, particularly as hardware problems grow. “Chip designs are becoming more and more complicated,” he says. “Bigger teams are designing the processors, so there is more scope for miscommunication.” The more problems pop up, the more manufacturers will be willing to consider new solutions.