Making Sense of Mountains of Data

A new tool explores large sets of data and might help organize the Web.

“How do you take a big collection of things and make sense out of it?” asks Gary Flake, founder and director of Microsoft Live Labs, a division of the software giant that designs experimental Web tools. The problem is becoming more common, even for the average user, because the Web makes huge quantities of information readily available.

Image crunching: Pivot is designed to help users tease out patterns as they sort through large amounts of data. The tool can be used to filter and organize images based on accompanying textual information.

The answer from Flake’s lab is Pivot, a tool released to the public earlier this month in conjunction with a demonstration Flake gave at the TED conference in Long Beach, CA.

Pivot presents data in the form of a collection of images accompanied by textual data. Sorting through data collated from Wikipedia, for example, means creating thumbnail images to accompany that information. The user can zoom into this collage of images to see individual pieces of data more closely, or zoom out to see items grouped according to various criteria. Though other tools can be used to organize data in various configurations, Flake hopes that Pivot’s simple and intuitive graphical interface will help insights about the data pop out visually.

The power of Pivot, Flake says, lies in its consistent user interface, which is designed to make it easy for users to tease out the patterns in a large set of data. “You can interact with the data in a way that’s not quite browsing and not quite searching,” he says.
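To make the idea of filtering and grouping images by their accompanying text concrete, here is a minimal sketch in Python. It is not Pivot's code: the item records, the attribute names (`category`, `year`), and the helper functions `filter_items` and `group_by` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical items: each pairs an image file with textual attributes.
ITEMS = [
    {"image": "img001.png", "title": "Item 1", "category": "A", "year": 2008},
    {"image": "img002.png", "title": "Item 2", "category": "B", "year": 2009},
    {"image": "img003.png", "title": "Item 3", "category": "A", "year": 2009},
    {"image": "img004.png", "title": "Item 4", "category": "B", "year": 2009},
]


def filter_items(items, **criteria):
    """Keep only the items whose attributes match every given criterion."""
    return [
        item
        for item in items
        if all(item.get(key) == value for key, value in criteria.items())
    ]


def group_by(items, attribute):
    """Bucket item titles by one attribute, as a grouped view might."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item["title"])
    return dict(groups)


if __name__ == "__main__":
    # Group everything by category, then narrow to one year and regroup.
    print(group_by(ITEMS, "category"))
    print(group_by(filter_items(ITEMS, year=2009), "category"))
```

Filtering first and then regrouping by a different attribute is roughly the kind of interaction Flake describes as "not quite browsing and not quite searching."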

The technology at the core of Pivot is Microsoft’s Seadragon, software designed for manipulating large quantities of visual information. It allows commodity hardware to rapidly move through vast collections of graphics, zooming in seamlessly without having to wait for information to load, and zooming out to view hundreds or thousands of images at once. A tool as visually rich as Pivot wouldn’t have been possible even five years ago, Flake says, because most users’ computers wouldn’t have been able to process the graphics.

The team at Live Labs has made several sample collections that users can view, but the intention is for users to make their own collections. To do this, they need to convert their images to the Deep Zoom format used by Seadragon, and annotate them using a format based on the Web standard Extensible Markup Language (XML). The team has released an add-in for Microsoft Excel, called the Pivot Collection Tool, that lets users do this without requiring knowledge of XML.
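For readers curious what annotating images with XML might look like, the following Python sketch builds a small XML document that attaches text attributes to image references. The element and attribute names (`collection`, `item`, `fact`) and the file paths are assumptions made up for this example; they are not the actual schema Pivot uses, which the article does not describe.

```python
import xml.etree.ElementTree as ET


def build_collection(name, items):
    """Build an XML tree that pairs each image with its textual attributes.

    The element names used here are hypothetical, not Pivot's real format.
    """
    root = ET.Element("collection", {"name": name})
    for item in items:
        node = ET.SubElement(
            root, "item", {"title": item["title"], "image": item["image"]}
        )
        for key, value in item["facts"].items():
            fact = ET.SubElement(node, "fact", {"name": key})
            fact.text = str(value)
    return root


def to_xml(root):
    """Serialize the tree to a Unicode string."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical items; the image paths stand in for converted images.
    sample = [
        {"title": "Item 1", "image": "images/item1.dzi",
         "facts": {"category": "A", "year": 2008}},
        {"title": "Item 2", "image": "images/item2.dzi",
         "facts": {"category": "B", "year": 2009}},
    ]
    print(to_xml(build_collection("Sample collection", sample)))
```

A spreadsheet add-in can hide this step entirely, since each row maps naturally to one item and each column to one attribute.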

Collections can be simple, consisting of a relatively small number of images with static data attached, or they can be very large, connected to a feed of changing data. Since Pivot has been released, one user has used it to view and sort through his Facebook friends, examining how they are related to each other.

Collections: The Pivot Collection Tool for Excel lets users build their own data collections for Pivot without needing to understand XML.

Pivot can also be used to browse the Web and to view and organize Web pages. Flake says he foresees the Pivot interface being integrated with features such as search. Pivot might, he says, give users a better way to sort through search results: instead of seeing the top 10 search results on a page, they could sort through thousands of results visually. “We are really taking a step back from the Web and trying to see it as a physical Web,” Flake says.

Pivot turned heads at TED. Roger McNamee, managing director and cofounder of the venture capital firm Elevation Partners, described the technology as “the brightest star” among a collection of impressive demos.

“Pivot’s clever user interface enables new forms of related and serendipitous search that I expect to transform the way I relate to the Web,” he says. “After several boring years, the ‘search wars’ may get interesting again.”

Martin Wattenberg, who developed the IBM data visualization program Many Eyes with his colleague Fernanda Viegas, says the Pivot team is tackling one of the big unsolved problems of dealing with modern data by building an interface that pulls together different types of media, including text and images.

Wattenberg says the goal is to provide more sophisticated mathematical techniques for sorting through data. “What is the standard deviation of a collection of images?” he says. “That question doesn’t even make sense.”

The key for all visualization tools, says Wattenberg, is to integrate them well with the rest of the Web. It would be useful to have open standards to let different software interoperate, he says, and to allow users to embed visualizations into their own Web pages. “We need to make visualization ubiquitous and seamless,” he says. “Like everything else in computing, we want to have it anytime, anywhere.”