Intelligent Machines

The Problem with Our Data Obsession

The quest to gather ever more information can make us value the wrong things and grow overconfident about what we know.

Feb 20, 2013

A contentious question on the California ballot in 2008 inspired a simple online innovation: a website called Eightmaps.com. The number in the name referred to Proposition 8, which called for the state’s constitution to be amended to prohibit gay marriage. Under California’s campaign finance laws, all donations greater than $100 to groups advocating for or against Proposition 8 were recorded in a publicly accessible database. Someone (it’s still not clear who) took all the data about the proposition’s supporters—their names and zip codes, and their employers in some cases—and plotted it on a Google map.

After finding themselves on the map, several supporters of the gay-marriage ban said they were harassed or their businesses were boycotted. This unsettled even some opponents of Proposition 8; surely it wouldn’t be long, they said, before, say, religious fundamentalists created a similar tool to call out supporters of a gay-rights measure. The committee that had backed Proposition 8 asked a federal judge to strike down the disclosure law or raise its threshold beyond $100 so that more people could give anonymously. But he refused, arguing that ballot measures need the “sunshine” that donation disclosure provides. His ruling was aligned with the idea that as much data as possible about the political process should be revealed.

Evgeny Morozov worries that we are too often making this trade-off—opting to publish more information to increase transparency even if it undermines principles such as privacy or civic involvement. In his trenchant new book, To Save Everything, Click Here, Morozov, a writer for Slate and The New Republic, uses the Eightmaps episode to support his claim that “Internet-centrism” is warping our view of what’s truly important.

Transparency is ascending at the expense of other values, Morozov suggests, mainly because it is so cheap and easy to use the Internet to distribute data that might someday prove useful. And because we’re so often told that the Internet has liberated us from the controls that “gatekeepers” had on information, rethinking the availability of information seems retrograde—and the tendency toward openness gathers even more force. (Notice that Facebook says its mission is “to make the world more open and transparent.”)

Morozov is not alone in fearing too much transparency. Harvard professor Lawrence Lessig has eloquently described why having more data about politicians is more likely to mislead people into cynicism than to make politics better. But Lessig seems resigned to the inevitability of such data-gathering projects in the age of the Internet. He believes the solution is to finance elections publicly, so people have less reason to be cynical about their lawmakers’ motivations.

That infuriates Morozov, because he believes Lessig is merely furthering a misconception that the Internet is like some force of nature rather than a human creation—that resistance is futile. On the contrary, Morozov says, resistance is required. His response to the problem raised by Eightmaps is not to simply accept that more information is going to be easily searchable and change the law accordingly. Instead, we should demand that our online systems respect values beyond mere transparency. Campaign donation databases, for example, could be designed so that records cannot be sucked out of them en masse. Yes, that would inhibit some easy data discoveries. But it could enhance democracy in the long run by making people feel freer to support causes that might be unpopular in their neighborhood or their office.
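
What might such a design look like in practice? The sketch below is purely illustrative—nothing in it describes a real campaign finance system, and the DonationRegistry class, its rate limit, and the sample record are all invented for this example. It shows one way a database could honor individual accountability queries while refusing the kind of wholesale extraction that made Eightmaps possible:

```python
import time
from collections import defaultdict, deque

class DonationRegistry:
    """Hypothetical donation-records store: single lookups are allowed,
    but en-masse extraction is throttled by design."""

    MAX_QUERIES_PER_HOUR = 20  # illustrative threshold, not from any real law

    def __init__(self, records):
        self._records = records               # list of dicts: name, zip, amount
        self._query_log = defaultdict(deque)  # requester id -> query timestamps

    def lookup(self, requester_id, donor_name):
        """Return one donor's records, enforcing a per-requester rate limit."""
        now = time.time()
        log = self._query_log[requester_id]
        # Discard timestamps older than an hour, then check the limit.
        while log and now - log[0] > 3600:
            log.popleft()
        if len(log) >= self.MAX_QUERIES_PER_HOUR:
            raise PermissionError("rate limit reached: bulk extraction blocked")
        log.append(now)
        return [r for r in self._records if r["name"] == donor_name]

# An individual query works; scraping the whole database one record at a
# time quickly hits the ceiling.
registry = DonationRegistry([{"name": "J. Doe", "zip": "94110", "amount": 150}])
print(registry.lookup("reporter-1", "J. Doe"))
```

The point of such a design is not secrecy but friction: the system’s defaults, rather than the disclosure law alone, decide which values prevail.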

Morozov’s first book, The Net Delusion: The Dark Side of Internet Freedom, sought to puncture the myth that social media is a potent weapon against dictatorships. Quite the opposite, he said: savvy regimes use the Web to keep tabs on dissidents. This certainly appears to be true in China, Syria, and Iran. In his new book he is trying to deflate a more amorphous idea: “solutionism.” This is his word for the belief that with enough data about many complex aspects of life—including not just politics but also crime, traffic, and health—we can fix problems of inefficiency. For example, predictive software now analyzes crime statistics and helps police decide where to beef up patrols. Algorithms track website clicks and advise journalists on what kinds of stories to write. Morozov sees many ways this could go terribly wrong. For one thing, maximal efficiency is not necessarily a value to strive for; inefficiency often produces social benefits. Not knowing exactly how much readership each story attracted, for instance, probably led newspapers to maintain their extensive coverage of state government.

Technological Hubris

But the most chilling potential problem is that the data we use to guide ourselves can be incomplete or overly reductionist. Many crimes go unreported, which could fool predictive policing software into thinking a neighborhood is safe. Cops on the beat, however, might be able to tell when things don’t seem quite right there and keep an eye out. Morozov fears a future in which such “intuitive knowledge” about how to deploy resources is overruled by algorithms that can work only with hard data and can’t, of course, account for the data they don’t have. Similarly, online records of someone’s campaign donations might seem detailed and hence instructive, but they will always offer at best a partial account of that person’s beliefs or role in the political process.
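
A toy calculation makes the mechanism concrete. Suppose two neighborhoods have identical underlying crime but very different reporting rates; a predictor that ranks areas by reported counts alone will call the underreporting neighborhood safe. Every name and number below is invented for illustration—this is not how any real predictive-policing product works:

```python
# Illustration of Morozov's point: an algorithm working only with
# reported ("hard") data inherits whatever the data fail to capture.

true_incidents = {"Northside": 100, "Southside": 100}  # same underlying crime
report_rate    = {"Northside": 0.9,  "Southside": 0.3}  # Southside underreports

reported = {area: round(n * report_rate[area])
            for area, n in true_incidents.items()}

# A naive predictive-policing rule: send patrols where reports are highest.
ranked = sorted(reported, key=reported.get, reverse=True)
print(reported)                    # {'Northside': 90, 'Southside': 30}
print("patrol priority:", ranked)  # Northside first, though the risk is identical
```

The software is not wrong about the numbers it has; it is wrong about the world, because the missing reports were never data at all.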

This concept might be Morozov’s strongest point: that however objective data may be, interpretation is subjective, and so is our choice about which data to record in the first place. While it might seem obvious that data, no matter how “big,” cannot perfectly represent life in all its complexity, information technology produces so much information that it is easy to forget just how much is missing.

This is not a new problem; the deceptive or even blinding qualities of big data plagued the first power users of computing. During the Vietnam War, the U.S. military wanted to stop North Vietnam from using the Ho Chi Minh Trail, a system of jungle passages through neighboring Laos, to send supplies to the communist insurgency in the south. Defense Secretary Robert S. McNamara, who had relied on quantitative management methods while running the Ford Motor Company, did what came naturally: he sought more data about what was happening on the trail. Thus began Operation Igloo White. From 1967 through 1972, American planes flew over the trail and dropped 20,000 battery-powered sensors that looked like plants or wood but could detect voices and other sounds, body heat, urine, and the seismic disturbances particular to trucks. These sensors sent signals to American planes, which relayed the data to a U.S. command-and-control facility in Thailand, where technicians sitting at banks of terminals could see maps of the Ho Chi Minh Trail. When a sensor detected something, that section of the trail lit up like a white worm. IBM 360/65 computers at the center calculated how fast the worm was moving; that information was radioed to U.S. bombers so the area in question could be attacked.

From the control center, Igloo White might have looked pretty good. Worms appeared on the screens and then vanished in bombings. The data seemed to indicate that the Americans had destroyed thousands of trucks and disrupted routes delivering significant amounts of supplies. The military was sufficiently pleased to spend $1 billion a year on the program.

But congressional investigators would eventually cast doubt on the Pentagon’s assumptions about how many trucks had been bombed. The communists ultimately weren’t deterred from moving supplies southward. They even delivered tanks used in a huge offensive in the south in 1972. It turned out that the Americans didn’t realize the extent to which they and their IBM machines were acting on incomplete and unreliable data. For one thing, they couldn’t litter the entire trail with sensors. And the Vietnamese figured out how to game the system with bags of urine and tape-recorded truck sounds.

It might be tempting to dismiss this as another preposterous blunder in a war full of them. But that would be missing a crucial point. The lesson is not that Igloo White’s data-gathering technology was limited—though it was—but that the people using the data did not comprehend its limitations. In the 1996 book The Closed World, historian Paul N. Edwards describes Igloo White as an example of technological hubris. Military planners thought computers and real-time communications would let them create a “dome of global technological oversight,” yielding ever greater certainty about what was happening in the world. But many things don’t neatly fit under the dome; life is messy, and not everything can be abstracted into data for computers to act upon.

Data looks different these days, but our faith in its value—and the impulse to create an information panopticon—stubbornly remain. Google says it wants to “organize the world’s information and make it universally accessible and useful.” Morozov is right to question whether that is a worthy aim. Who knows which of the data-analysis projects being carried out now will look as blinkered in 40 years as Igloo White does today?