In our global economy, the ability to understand languages other than one’s native tongue grows more important every day. That is why Jie Yang, a computer scientist at Carnegie Mellon University, and his colleagues have created software that reads Chinese signs and quickly translates them into English using nothing more than a palm-size computer equipped with a small camera.
“If you’re trying to get somewhere, it helps to be able to understand the signs around you,” says Yang, who plans to develop translation modules for Japanese, Arabic, and Korean if funding is available. “There’s a growing demand for products like this,” he explains.
The seemingly simple task of interpreting street signs actually represents a bundle of problematic processes, from machine vision to character recognition, with which researchers in computer science and artificial intelligence have been struggling for years. Yang chose some of the most promising software solutions and squeezed them into a handheld Pocket PC device. His system starts by scanning an image for hard edges (a sign's borders). Next, it searches for cues such as contrasting, similar-size characters. Optical character recognition software "reads" the words, and other algorithms cluster them into plausible groupings.
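To make that pipeline concrete, here is a minimal sketch of such a sign-reading stage in Python, using the open-source OpenCV and Tesseract libraries as stand-ins for Yang's unpublished components. The edge thresholds, the region-size cutoff, and the file name are illustrative assumptions, not details of the CMU system.

```python
# Sketch of a sign-reading pipeline: find hard edges, keep sign-like regions,
# then hand each region to optical character recognition.
import cv2                 # OpenCV, used here as a stand-in vision library
import pytesseract         # Tesseract OCR bindings, used as a stand-in reader

def read_signs(image_path):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Step 1: look for hard edges -- candidate sign borders.
    edges = cv2.Canny(gray, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    texts = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        # Step 2: keep only regions big enough to hold sign characters
        # (the 2000-pixel cutoff is an arbitrary illustrative threshold).
        if w * h < 2000:
            continue
        region = gray[y:y + h, x:x + w]
        # Step 3: OCR "reads" the words inside the region.
        text = pytesseract.image_to_string(region, lang="chi_sim").strip()
        if text:
            texts.append(text)
    return texts

# Hypothetical usage with an illustrative file name.
print(read_signs("street_sign.jpg"))
```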
Then comes the hard part: the actual translation. “Ambiguity is the biggest problem,” says Alex Waibel, director of Carnegie Mellon’s Interactive Systems Laboratory, where Yang is based. “Words can have so many different meanings in so many different languages in so many different situations.” That’s why a device that renders a literal translation can’t always do the job. If a Chinese sign meaning “Please do not touch” were translated word for word, for example, it would say, “Please do not move hand.”
To deal with such ambiguities, Yang considered three dominant machine-translation approaches, each with its own advantages. One, knowledge-based translation, requires humans to program rules that describe how a language’s words relate to one another. Slot Grammar, an IBM version of this approach, would break a sentence such as “John gave Mary a book” into the verb “gave” and three related words that fill slots: a subject (John), a direct object (book), and an indirect object (Mary). The IBM system then uses these relationships to search the knowledge base for similar relationships in the target language, and those relationships help generate a sensible translation.
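The slot-filling idea can be illustrated with a toy sketch: the parsed frame for "John gave Mary a book" is mapped, slot by slot, onto a target-language pattern. The pattern, the lexicon, and the German-style output below are invented for illustration; IBM's Slot Grammar itself is far more elaborate.

```python
# Toy slot-filling translation: a verb frame with subject, direct object, and
# indirect object slots is rewritten using a hand-coded target-language rule.
frame = {
    "verb": "gave",
    "subject": "John",
    "direct_object": "book",
    "indirect_object": "Mary",
}

# Hand-written rule describing how the slots are ordered in the target
# language (a schematic German-like pattern, chosen only to show reordering).
target_pattern = "{subject} {verb} {indirect_object} {direct_object}"

# Tiny bilingual lexicon, invented for this example.
bilingual_lexicon = {"gave": "gab", "John": "John", "book": "ein Buch", "Mary": "Maria"}

translated_slots = {slot: bilingual_lexicon[word] for slot, word in frame.items()}
print(target_pattern.format(**translated_slots))  # -> "John gab Maria ein Buch"
```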
At Microsoft Research in Redmond, WA, scientists rely on an example-based system that taps raw computing power rather than programmed rules. The Microsoft technology compares millions of words and phrases of translated text. Along the way, it generates an enormous bilingual dictionary of word relationships, phrases, and idioms. Then, in a process similar to sentence diagramming, it breaks the text to be translated into "logical forms" and finds the best-matching forms among those in the dictionary.
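A rough sketch of the example-based idea, under heavy simplification: store translated sentence pairs, compare an incoming sentence against the stored source sides, and reuse the closest example's English side. The three stored pairs and the string-similarity measure below are illustrative stand-ins; Microsoft's system matches structured logical forms mined from millions of words of parallel text, not whole surface strings.

```python
# Toy example-based translation: pick the stored example whose source side is
# most similar to the input, and reuse its English side.
from difflib import SequenceMatcher

# Tiny stand-in for the bilingual example base mined from translated text.
examples = [
    ("请勿动手", "Please do not touch"),
    ("请勿践踏草坪", "Please keep off the grass"),
    ("出口在左边", "Exit on the left"),
]

def translate(source):
    # Score each stored example by surface similarity to the input and
    # reuse the closest match's English side.
    best = max(examples, key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio())
    return best[1]

# "请勿用手触摸" ("please do not touch with your hands") matches the first example.
print(translate("请勿用手触摸"))  # -> "Please do not touch"
```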
The third approach is statistical. Like Microsoft’s approach, it starts by crunching reams of word relationships. But unlike example-based systems, it scores the best translations using a statistical model that identifies the phrases and words in each language that properly match under a variety of circumstances.
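In a statistical system, the choice between candidates such as "please do not touch" and "please do not move hand" comes down to a score. The sketch below shows that scoring step with invented probabilities: a translation model says how likely each phrase pair is given aligned text, a language model says how fluent each English candidate is, and the combined score (summed in log space) picks the winner. Every number here is made up for illustration.

```python
# Toy statistical scoring: combine translation-model and language-model
# probabilities for each candidate and keep the highest-scoring one.
import math

# P(english phrase | chinese phrase), as if estimated from aligned text (made up).
translation_model = {
    ("请勿", "please do not"): 0.8,
    ("动手", "touch"): 0.5,
    ("动手", "move hand"): 0.4,
}

# P(english sentence), as if from a monolingual language model (made up).
language_model = {
    "please do not touch": 0.02,
    "please do not move hand": 0.0001,
}

def score(source_phrases, candidate_phrases, candidate_sentence):
    # Sum log-probabilities: fluency of the output plus phrase-pair likelihoods.
    log_p = math.log(language_model[candidate_sentence])
    for src, tgt in zip(source_phrases, candidate_phrases):
        log_p += math.log(translation_model[(src, tgt)])
    return log_p

source = ["请勿", "动手"]
candidates = [
    (["please do not", "touch"], "please do not touch"),
    (["please do not", "move hand"], "please do not move hand"),
]
best = max(candidates, key=lambda c: score(source, c[0], c[1]))
print(best[1])  # -> "please do not touch"
```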
Yang tried both the example-based and the statistical approach to see which translator would perform better. Right now, each delivers at least partially accurate translations 82 percent of the time, compared with 70 to 80 percent for most other translation systems, he says. Yang and other researchers agree that combining all three approaches should prove the most promising long-range strategy. “There’s no other way we can keep up,” says Steve Richardson, head of the machine translation project at Microsoft’s Natural Language Processing Group.
The harder researchers push toward 100 percent accuracy, unfortunately, the more stubborn the problem becomes. Machines will “never be as good as humans” at translation, says Yang. But, Waibel adds, they “are getting better all the time.”