Facebook

Making deepfake tools doesn’t have to be irresponsible. Here’s how.

It’s possible to limit the harm synthetic media tools might cause—but it won’t happen without effort.

by Aviv Ovadya
December 12, 2019

Synthetic media technologies—popularly known as deepfakes—have real potential for positive impact. Voice synthesis, for example, will allow us to speak in hundreds of languages in our own voice. Video synthesis may help us simulate self-driving-car accidents to avoid mistakes in the future. And text synthesis can accelerate our ability to write both programs and prose.

But these advances can come at a gargantuan cost if we aren’t careful: the same underlying technologies can also enable deception with global ramifications.

Thankfully, we can both enable the technology’s promise and mitigate its peril. It will just take some hard work.

What I’m arguing here is meant to be a call to action to do that work—and a guide to support those of us creating such technology, whether we’re doing ground-breaking research, developing new products, or just making open-source systems for fun. This is also for those investing in or funding such work, the journalists who might help ensure that technology creators take their impacts seriously, and the friends or family of those creating this tech.

We can longer absent ourselves of responsibility by saying “There’s nothing we can do if people misuse our tools.” Because there are things we can do; it’s just that we often don’t bother. Another argument, “This technology will get made anyway,” is not entirely false—but the how and when matter significantly, and can change as a result of our choices. (For more on this, including some threat modeling, please check out our full paper, especially Section 2 about what leads to harm.) Finally, we can’t hide behind the trite acknowledgement that “there has always been deception” while ignoring the significant differences in degree and impact.

The costs of deepfake technology are not just theoretical. Synthetic face-swap videos harass journalists into silence; synthetic voices are being used for large fraudulent transactions, and synthetic faces have allegedly supported espionage. All of that is in spite of the current challenges of using hacked-together beta-quality software. The obstacles to using synthetic media are still too high for the technology to be compelling to most malicious actors—but as it moves from buggy betas and into the hands of billions of people, we have a responsibility to avoid worst-case scenarios by making it as hard as possible to use deepfakes for evil. How?

Approach 1: Limiting who can use a tool ... and how

There are several things we can do that make malicious use much less likely. One obvious, simple, and effective approach is to carefully vet those who can use a tool. This is what companies like Synthesia are doing—essentially, working only with vetted enterprise clients.

An alternative is to constrain usage: to limit what users can synthesize or manipulate. For example, it is possible to build tools to ensure that only particular pre-chosen voices or faces can be manipulated. This is what Humen, for example, is doing—providing a limited set of movements that the person in a generated video can make.

These may not be an option for many systems, however. So what else can one do?

Approach 2: Discouraging malicious use

For synthetic media tools that are general and may be made widely available, there are still many possible ways to reduce malicious use. Here are some examples.

Clear disclosure: Request that synthesized media be clearly indicated as such—particularly material that might be used to mislead. Tools may be able to support this by including clear visual or audible notices in output files, such as visible warnings or spoken disclosures. At minimum, metadata should indicate how media was synthesized or manipulated.
Consent protection: Require the consent of those being impersonated. The voice cloning tool Lyrebird requires users to speak particular phrases in order to model their voice. This makes it more difficult to impersonate someone without consent, which would be very possible if it simply generated voices using any provided data set. This, of course, is applicable only for tools that enable impersonation.
Detection friendliness: Ensure that the synthesized media is not inordinately difficult to detect; keep detector tools up to date; collaborate with those working on detection to keep them in the loop on new developments.
Hidden watermarks: Embed context about synthesis—or even the original media—through robust watermarks, both using methods that are accessible to anyone with the proper tools, and through approaches that are secret and difficult to remove. (For example, Modulate.ai watermarks audio that it generates, while products like Imatag and their open-source equivalents enable watermarking for imagery.)
Usage logs: Store information about usage and media outputs in a way that researchers and journalists can access to identify if, for example, a video was probably synthesized using a particular tool. This could include storing timestamps of synthesis with a robust hash or media embedding.
Use restrictions: Provide and enforce contractual terms of use that prohibit and penalize unwanted actions, such as attempting to remove disclosures or watermarks, or violating the consent of other people. A stronger version of this approach involves explicitly elaborating allowed uses of the technology or resulting outputs.

Not all these strategies are applicable to every system. Some may have their risks, and none are perfect—or sufficient on their own. They are all part of a “defense in depth,” where more is more. Even if we have vetting or constraints, these approaches still make a system more robust against adversaries. And while following these rules might work best for “software as a service” systems, which are delivered without revealing any source code, they could still provide some value for open-source tools and models: many bad actors won’t have the technical capability to get around these protective measures. (The full paper explores when to publish source code in the first place).

Supporting ethical deepfake tools

Doing extra work to protect people from harm can be hard to justify in today’s competitive business environment—until an irreversible catastrophe happens. So how do we help ensure that these approaches are implemented before it’s too late? Here are four things funders, governments, journalists, and the public can do now to support those making ethical synthetic media tools.

Make doing the right thing easy

That means that we must invest in research into all these areas so that we have widely available, well-funded, open-source tools to implement these approaches. The history of information security shows that when easy-to-use open-source tools can be used to keep things secure, a lot more things end up secure. The same logic applies here: at the very least, we urgently need to make it easy to provide standardized disclosure metadata, watermarks, and logging. We also need research to explore whether it is feasible to bake detectability into trained models before distribution. Without this kind of infrastructure and research, we will see many well-intentioned new tools being used in horrific ways.

Foster expertise in mitigating misuse

Just as in privacy and security, we must support communities focused on hardening systems and addressing bad actors, and companies must pay people to do this work, either as consultants or in house.

Avoid funding, building, and distributing irresponsible tools

If a company or tool doesn’t at least attempt to reduce the chance of misuse, funders and developers should not support it. Tool makers that do not implement the best practices listed above must have very good reasons, and app stores should require them by default.

Create norms by holding people accountable

If they are negligent, we should be calling them out—even if they are our investees, coworkers, friends, or family. We can create ethical norms by commending those who do the right thing, and pushing those who don’t to do better.

Organizations advancing deepfake technology, like Nvidia, Adobe, Facebook, and Google, should be investing heavily in all of the above. Venture capitalists and foundations should also do their part in supporting this work and being careful who they support.

This is only one slice of a much broader set of efforts that are needed, and in many cases they may buy us only a little more time. That means it’s imperative to ensure that platforms and policymakers use that time wisely to make our information ecosystem more resilient.

Our past is littered with those who wished they had introduced their inventions more carefully—and we only have one future. Let’s not screw it up.

—Aviv Ovadya is the founder of the Thoughtful Technology Project and a non-resident fellow at the German Marshall Fund’s Alliance for Securing Democracy. See the full paper on mitigating negative impacts of synthetic media research, coauthored with Jess Whittlestone, here.