How AlphaZero has rewritten the rules of game play on its own

David Silver says the computer program that taught itself to be a chess grandmaster exhibits “the essence of creativity.”

by Will Knight
February 22, 2019

David Silver invented something that might be more inventive than he is.

Silver was the lead researcher on AlphaGo, a computer program that learned to play Go by studying games played by humans. Go is a famously tricky game that relies on human intuition rather than clear rules of play.

Silver’s latest creation, AlphaZero, learns to play board games including Go, chess, and shogi by practicing against itself. Through millions of practice games, AlphaZero discovers strategies that took humans millennia to develop.

So could AI one day solve problems that human minds never could? I spoke to Silver at his London office at DeepMind, now owned by Alphabet.

In one famous game against possibly the best Go player ever, AlphaGo made a brilliant move that human observers initially thought was a mistake. Was it being creative in that moment?

“Move 37,” as it became known, surprised everyone, including the Go community and us, its makers. It was something outside of the expected way of playing Go that humans had figured out over thousands of years. To me this is an example of something being creative.

Since AlphaZero doesn’t learn from humans, is it even more creative?

When you have something learning by itself, that’s building up its own knowledge completely from scratch, it’s almost the essence of creativity.

AlphaZero has to figure out everything for itself. Every single step is a creative leap. Those insights are creative because they weren’t given to it by humans. And those leaps continue until it is something that is beyond our abilities and has the potential to amaze us.

You’ve had AlphaZero play against the top conventional chess engine, Stockfish. What have you learned?

Stockfish has this very sophisticated search engine, but at the heart of it is this module that says, “According to humans, this is a good position or a bad position.” So humans are really deeply in the loop there. It’s hard for it to break away and understand a position that’s fundamentally different.

AlphaZero learns to understand positions for itself. There was one beautiful game we were just looking at where it actually gives up four pawns in a row, and it even tries to give up a fifth pawn. Stockfish thinks it’s winning fantastically, but AlphaZero is really happy. It’s found a way to understand the position which is unthinkable according to the norms of chess. It understands it’s better to have the position than the four pawns.

Does AlphaZero suggest AI will play a role in future scientific innovation?

Machine learning has been dominated by an approach called supervised learning, which means you start off with everything that humans know, and you try to distill that into a computer program that does things in just the same way. The beauty of this new approach, reinforcement learning, is that the system learns for itself, from first principles, how to achieve the goals we set it. It’s like a million mini-discoveries, one after another, that build up this creative way of thinking. And if you can do that, you can end up with something that has immense power, immense ability to solve problems, and which can hopefully lead to big breakthroughs.

Are there aspects of human creativity that couldn’t be automated?

If we think about the capabilities of the human mind, we’re still a long way away from achieving that. We can achieve results in specialized domains like chess and Go with a massive amount of computer power dedicated to that one task. But the human mind is able to radically generalize to something different. You can change the rules of the game, and a human doesn’t need another 2,000 years to figure out how she should play.

I would say that maybe the frontier of AI at the moment—and where we’d like to go—is to increase the range and the flexibility of our algorithms to cover the full gamut of what the human mind can do. But that’s still a long way off.

How might we get there?

I’d like to preserve this idea that the system is free to create without being constrained by human knowledge.

A baby doesn’t worry about its career, or how many kids it’s going to have. It is playing with toys and learning manipulation skills. There’s an awful lot to learn about the world in the absence of a final goal. The same can and should be true of our systems.