

Making face recognition less biased doesn’t make it less scary

Three new studies propose ways to make algorithms better at identifying people in different demographic groups. But without regulation, that won’t curb the technology’s potential for abuse.

In the past few years, there’s been a dramatic rise in the adoption of face recognition, detection, and analysis technology.

You’re probably most familiar with recognition systems, like Facebook’s photo-tagging recommender and Apple’s FaceID, which can identify specific individuals. Detection systems, on the other hand, determine whether a face is present at all; and analysis systems try to identify aspects like gender and race. All of these systems are now being used for a variety of purposes, from hiring and retail to security and surveillance.

Many people believe that such systems are both highly accurate and impartial. The logic goes that airport security staff can get tired and police can misjudge suspects, but a well-trained AI system should be able to consistently identify or categorize any image of a face.

But in practice, research has repeatedly shown that these systems are far less accurate for some demographic groups than for others. Last year, Gender Shades, a seminal study led by MIT Media Lab researcher Joy Buolamwini, found that gender classification systems sold by IBM, Microsoft, and Face++ had error rates as much as 34.4 percentage points higher for darker-skinned females than for lighter-skinned males. The ACLU of Northern California similarly found that Amazon’s platform was more likely to misidentify non-white members of Congress than white ones.

The problem is that face recognition and analysis systems are often trained on skewed data sets: they’re fed far fewer images of women and people with dark skin than of men and people with light skin. And while many of them are supposedly tested for fairness, those tests don’t check performance on a wide enough range of faces—as Buolamwini found. The resulting disparities entrench existing injustices, and the consequences only worsen as the stakes get higher.

Three new papers released in the past week are now bringing much-needed attention to this issue. Here’s a brief description of each of them.

Paper #1. Last Thursday, Buolamwini released an update to Gender Shades: she retested the systems she’d previously examined and expanded her review to include Amazon’s Rekognition and a new system from Kairos, a small AI company. There is some good news. IBM, Face++, and Microsoft all improved their gender classification accuracy for darker-skinned women, with Microsoft reducing its error rate to below 2%. Amazon’s and Kairos’s platforms, however, still had accuracy gaps of 31 and 23 percentage points, respectively, between lighter-skinned males and darker-skinned females. Buolamwini said the results show that these technologies must be externally audited to hold them accountable.
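To make the idea of an external audit concrete, here is a minimal sketch of the kind of disaggregated evaluation such audits rely on: computing error rates separately for each demographic subgroup and reporting the largest gap between them. The field names and the numbers in the example are hypothetical, not taken from any of the studies above.

```python
# Minimal sketch of a disaggregated audit: per-subgroup error rates and the
# largest gap between subgroups, in percentage points.
# Assumes a list of records with hypothetical fields "group", "label", "prediction".

from collections import defaultdict

def error_rates_by_group(records):
    """Return {group: error_rate} for a set of classification records."""
    totals, errors = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        errors[r["group"]] += int(r["prediction"] != r["label"])
    return {g: errors[g] / totals[g] for g in totals}

def largest_gap(rates):
    """Largest difference in error rate between any two subgroups, in percentage points."""
    return (max(rates.values()) - min(rates.values())) * 100

# Made-up example: aggregate accuracy can look fine while one subgroup's
# error rate is far higher than another's.
records = (
    [{"group": "lighter_male", "label": "male", "prediction": "male"}] * 99
    + [{"group": "lighter_male", "label": "male", "prediction": "female"}] * 1
    + [{"group": "darker_female", "label": "female", "prediction": "female"}] * 69
    + [{"group": "darker_female", "label": "female", "prediction": "male"}] * 31
)
rates = error_rates_by_group(records)
print(rates)                                # {'lighter_male': 0.01, 'darker_female': 0.31}
print(round(largest_gap(rates), 1), "pp")   # 30.0 pp
```

Reporting results broken down this way, rather than as a single aggregate accuracy number, is what surfaced the gaps described above.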

Paper #2. On Sunday, a study from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) demonstrated a new algorithm for mitigating bias in a face detection system even when it’s trained on heavily skewed data. As the system trains, the algorithm identifies which examples in the data are underrepresented and spends extra time looking at them to compensate. When the researchers tested the system against Buolamwini’s Gender Shades data set, they found that it substantially narrowed their largest accuracy gap, between lighter- and darker-skinned males, compared with a standard training algorithm, though it didn’t eliminate the gap entirely.
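For illustration, here is a rough sketch of the general idea of training-time debiasing by adaptive resampling: estimate how common each training example’s features are, then sample rare examples more often. This is not a reimplementation of the CSAIL paper’s exact algorithm (which learns the relevant structure during training itself); the crude density estimate and the parameters below are assumptions made for the sake of the example.

```python
# Sketch of adaptive resampling: up-weight training examples whose features
# fall in sparsely populated regions of the data, so underrepresented faces
# are seen more often during training.

import numpy as np

def resampling_weights(features, bins=10, alpha=0.9):
    """Return per-example sampling weights that favor underrepresented examples.

    features: (N, D) array, e.g. embeddings of face images.
    alpha:    how aggressively to favor rare examples (0 = uniform sampling).
    """
    n, d = features.shape
    density = np.ones(n)
    # Crude density estimate: product of per-dimension histogram frequencies.
    for j in range(d):
        hist, edges = np.histogram(features[:, j], bins=bins)
        idx = np.clip(np.digitize(features[:, j], edges[1:-1]), 0, bins - 1)
        density *= hist[idx] / n
    weights = 1.0 / np.maximum(density, 1e-12) ** alpha
    return weights / weights.sum()

# During training, draw each batch according to these weights instead of uniformly.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 8))   # stand-in for learned face embeddings
weights = resampling_weights(features)
batch_indices = rng.choice(len(features), size=32, replace=True, p=weights)
```

The key design choice is that the reweighting is driven by the data itself rather than by hand-picked demographic labels, which is what lets this style of approach work even when the training set is heavily skewed.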

Paper #3. This morning, IBM Research released a paper that identifies dozens of features for measuring facial diversity beyond skin color and gender, including head height, face width, intra-eye distance, and age. The measures draw on previous research on human faces. “Unless we have measures of facial diversity,” says John Smith, one of the coauthors of the paper, “we can’t come back and enforce them as we train these face recognition systems.” Alongside the paper, the team released a new data set of 1 million face images annotated with these measures.

Different measures of facial diversity, presented in IBM Research's new paper.
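As a rough illustration of what measures like these look like in practice, the sketch below computes a few simple craniofacial distances and ratios from 2D facial landmarks. The landmark names and formulas are hypothetical stand-ins, not IBM’s definitions.

```python
# Illustrative only: a few simple craniofacial measures computed from
# hypothetical 2D facial landmark coordinates.

import math

def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def facial_measures(landmarks):
    """Compute simple diversity measures from a dict of named landmark points."""
    intra_eye = dist(landmarks["left_eye_center"], landmarks["right_eye_center"])
    face_width = dist(landmarks["left_cheek"], landmarks["right_cheek"])
    head_height = dist(landmarks["chin"], landmarks["forehead_top"])
    return {
        "intra_eye_distance": intra_eye,
        "face_width": face_width,
        "head_height": head_height,
        # Ratios are scale-invariant, so they can be compared across images.
        "face_width_to_height": face_width / head_height,
        "intra_eye_to_face_width": intra_eye / face_width,
    }

# Example with made-up pixel coordinates for a single face.
example = {
    "left_eye_center": (120, 140), "right_eye_center": (180, 140),
    "left_cheek": (95, 190), "right_cheek": (205, 190),
    "chin": (150, 290), "forehead_top": (150, 90),
}
print(facial_measures(example))
```

Annotating a large data set with measures like these is what makes it possible to check whether a training set actually covers the range of faces a system will encounter.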

Each of these studies has taken important steps toward addressing bias in facial recognition—by holding companies accountable, by creating new algorithms, and by expanding our understanding of data diversity. But creating fairer and more accurate systems is only half the battle.

Even the fairest and most accurate systems can still be used to infringe on people’s civil liberties. Last year, a Daily Beast investigation found that Amazon was actively pitching its facial surveillance platform to US Immigration and Customs Enforcement, better known as ICE, to aid its crackdown on migrant communities. An Intercept investigation also found that IBM developed the ability to identify the ethnicity of faces as part of a long-term partnership with the New York Police Department; the technology was then deployed in public surveillance cameras for testing, without the knowledge of city residents. The UK’s Metropolitan Police already uses face recognition to scan public crowds for people on watch lists, and China uses it for mass surveillance of all residents, including to track dissidents.

In response to the rapid proliferation of these systems, a growing number of civil rights activists and technologists have called for them to be regulated; Google has even suspended its sale of such systems until it has clear strategies for preventing their abuse.

“Without algorithmic justice, algorithmic accuracy/technical fairness can create AI tools that are weaponized,” says Buolamwini.

This story originally appeared in our AI newsletter, The Algorithm.