Geek Activity Page: Web Libs

Build a content filter that rewrites the Web – your way, Mad Lib style!

If you are one of the many who feel that the media are unforgivably biased, the Web now has a solution for you. Greasemonkey, an add-on for the open-source Firefox browser, can act as a programmable content filter, sanitizing or scandalizing the news before you see it. For fun, we wrote a simple script (detailed below) that lets Greasemonkey rewrite the news ungrammatically, or render it politically incorrect or even offensive. No matter where you stand on the political spectrum, you’ll see that Greasemonkey and related technologies are destroying one of the last one-way streets in the media world. While the Internet may be interactive, many of the most trusted and reputable websites still treat readers as passive recipients of content. Pages are rendered on the computer screen more or less the way the publishers intended, and your job is to consume, not to participate.

But of course, Web pages are nothing more than large collections of bits, and bits are easy to flip, cut, and splice. Nothing can stop the data that the New York Times or MSNBC sends to your computer from being modified before it is displayed.

It used to be hard to write programs that hacked Web pages in real time. Mozilla Firefox changed that with an architecture that lets anyone write extensions, small add-on programs that change how the browser behaves. One of the best-known Firefox extensions is Adblock, which lets you suppress any website advertisement you choose.

More interesting for the programmer is Greasemonkey, a nifty extension by Aaron Boodman and Jeremy Dunck that lets you write JavaScript programs that can rip apart Web pages on the fly. Greasemonkey hooks JavaScript into the innards of the browser, making it much easier to hack a Web page. This frees you to concentrate on what’s fun – for example, writing a program that inverts a website’s stated intent.
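To make the idea concrete, here is a minimal sketch of what a Greasemonkey user script can look like. It is a hypothetical example, not Doubletake itself: the script name, the namespace URL, and the functions `rewriteText` and `rewriteTextNodes` are our own illustrations. The comment block at the top is the metadata header Greasemonkey reads to decide which pages a script runs on; the rest is ordinary JavaScript that walks the page's text and inverts its claims by swapping "always" and "never".

```javascript
// ==UserScript==
// @name        Always-Never (hypothetical example)
// @namespace   http://example.com/userscripts
// @include     *
// ==/UserScript==

// Hypothetical transform: swap the whole words "always" and "never".
// \b keeps words such as "nevertheless" from being touched.
function rewriteText(text) {
  return text.replace(/\b(always|never)\b/g,
    (word) => (word === "always" ? "never" : "always"));
}

// Visit every text node under <body> and run it through the transform,
// so tags and attributes in the page's markup are never edited.
function rewriteTextNodes(doc, transform) {
  const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);
  for (let node = walker.nextNode(); node; node = walker.nextNode()) {
    node.nodeValue = transform(node.nodeValue);
  }
}

// Inside the browser, rewrite the page; elsewhere (e.g. Node), do nothing.
if (typeof document !== "undefined" && document.body) {
  rewriteTextNodes(document, rewriteText);
}
```

Editing text nodes rather than the raw HTML string is one reasonable design choice: it cannot accidentally corrupt a tag or attribute that happens to contain a target word.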

That’s what we did with Doubletake, a wacky script that subverts a page’s original HTML with a list of specified substitutions. It’s like Mad Libs for the Web: Web Libs.

If you download Firefox, install Greasemonkey, and activate Doubletake, every Web page you view will be carefully rewritten using words of your own choosing. If a particular politician seems a bit mentally challenged, you can replace his name with “Village Idiot.” Or whatever.

Doubletake is engineered to take advantage of built-in JavaScript functions such as the string method replace, which can be applied to the HTML text held in a Web page's document object. Calling replace once for each word on the list will rewrite the document, but this approach is sluggish: each call scans the entire document, so the time required is proportional to the size of the document multiplied by the length of the list of words to be replaced.
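A minimal sketch of that slow approach, assuming the substitution list is an array of [from, to] pairs (our assumption; the article does not say how Doubletake stores its list). The function name `naiveRewrite` is hypothetical, and it uses the modern `replaceAll` string method, which postdates the original script, with a replacement function so that `$` characters in the new text are not treated as special patterns.

```javascript
// Naive rewrite: one full pass over the text for every substitution pair.
// Total cost grows with (text length) x (number of pairs).
function naiveRewrite(html, substitutions) {
  let result = html;
  for (const [from, to] of substitutions) {
    // A function replacement inserts `to` literally, with no $-pattern expansion.
    result = result.replaceAll(from, () => to);
  }
  return result;
}
```

Besides being slow, plain substring replacement has side effects: "cat" also matches inside "category", and a later pair can rewrite text that an earlier pair just inserted.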

To create a snappier version, we stored the list of words to be replaced in a hash table, using JavaScript's built-in objects, which work as associative arrays. We preprocessed the list into a table called matchTable, then broke the document apart into words and replaced every word that appears in the table.

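// Append the replacement if the word is in matchTable; otherwise keep the word.
// hasOwnProperty avoids false hits on inherited names such as "constructor".
if (Object.prototype.hasOwnProperty.call(matchTable, word)) {
  ans = ans + matchTable[word];
} else {
  ans = ans + word;
}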

However long the list of words to be replaced, looking up a word in matchTable takes, on average, a constant amount of time, so the time required is proportional only to the size of the document.
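The whole pipeline might look something like this minimal sketch. Only the name `matchTable` comes from the article; `buildMatchTable`, `rewriteWithTable`, the use of a `Map` rather than the plain object the article's snippet indexes, and the rule that a "word" is a run of ASCII letters and apostrophes are all our assumptions.

```javascript
// Preprocess the substitution pairs into a hash-based lookup table.
function buildMatchTable(pairs) {
  return new Map(pairs);
}

// Split the text into alternating word / non-word tokens, then rebuild it,
// swapping in the table entry for each word that has one.
function rewriteWithTable(text, matchTable) {
  // The two alternatives together match every character, so nothing is lost.
  const tokens = text.match(/[A-Za-z']+|[^A-Za-z']+/g) || [];
  let ans = "";
  for (const word of tokens) {
    ans += matchTable.has(word) ? matchTable.get(word) : word;
  }
  return ans;
}
```

Because each token is looked up exactly once, "cat" no longer matches inside "category", and swapping two words in both directions works as intended. Lookups are case-sensitive, so "Cat" would need its own entry.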

The technologies at work here have more practical applications as well. For example, Greasemonkey scripts can modify the style sheets that control how Web pages are displayed, so your browser could, say, display all text as black type on a white background at a 14-point font size – just the thing for the 20 million Americans who have significant vision problems.
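As a hedged illustration of the style-sheet idea, this sketch injects a user style that forces black 14-point text on a white background. The function names `highContrastCss` and `applyStyle` are hypothetical, and the script creates a plain `<style>` element rather than relying on any Greasemonkey-specific helper.

```javascript
// Build CSS that forces black text on white at the given point size.
// "!important" lets these rules override most of the page's own styles.
function highContrastCss(pointSize) {
  return [
    "* {",
    "  color: black !important;",
    "  background: white !important;",
    `  font-size: ${pointSize}pt !important;`,
    "}",
  ].join("\n");
}

// Append the CSS to the page as a <style> element.
function applyStyle(doc, css) {
  const style = doc.createElement("style");
  style.textContent = css;
  (doc.head || doc.documentElement).appendChild(style);
}

// Only touch the page when running inside a browser.
if (typeof document !== "undefined") {
  applyStyle(document, highContrastCss(14));
}
```

The `!important` flags win against ordinary page styles, though a more specific `!important` rule on the page, including one in an inline `style` attribute, can still take precedence.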

Firefox and Greasemonkey show the inherently democratizing power of open-source software. Giving everyone the ability to rewrite source code is upsetting the balance of power between programmers and users, and between publishers and readers. Of course, website authors who don't want their artistic integrity eroded can fight back: one of the most common techniques for sabotaging end-user control is to put text inside graphics or multimedia Flash presentations. But these tricks make websites inaccessible to the blind (who rely on screen readers that speak a page's text aloud) and impossible to navigate using cell phones. The battle for the future of mass communication is just beginning.

Code and instructions at doubletake.ex.com.

Simson Garfinkel is a programmer and researcher in the field of computer security and the author of Database Nation: The Death of Privacy in the 21st Century. Peter Wayner is a programmer and the author of Translucent Databases.