AlphaGo Zero: DeepMind’s newest AI system that learns from itself
DeepMind has developed an enhanced version of AlphaGo, the first computer program to defeat a world champion at the Chinese game of Go, that does not rely on human input. Previous versions of AlphaGo learned to play Go from human expertise, by ‘training on thousands of human amateur and professional games’. AlphaGo Zero, described by the company as ‘arguably the strongest Go player in history’, instead learns simply by playing games against itself. DeepMind says that the new algorithm has ‘quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0’. It took only three days to reach this level.

AlphaGo Zero relies on what is called ‘reinforcement learning’: it plays against itself, initially at random, and improves through a system of rewards (for winning) and punishments (for losing). The system starts off with a neural network that knows only the rules of Go, and it plays games against itself by combining this network with a powerful search algorithm. As it keeps playing, the neural network is continually updated, becoming more powerful and efficient. Its most important feature is that, because it learns from itself, ‘it is no longer constrained by the limits of human knowledge’.

While noting that AlphaGo Zero is still in its early days, DeepMind considers it a ‘critical step’ towards solving some of the most important challenges faced by humanity, through applications in areas such as reducing energy consumption or searching for revolutionary new materials.
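The self-play idea described above (start with random play, reward wins, punish losses) can be illustrated with a toy sketch. This is not DeepMind's method: it replaces Go with a tiny stone-taking game, and it uses a plain lookup table instead of a neural network and a search algorithm. The game, the function names (`train`, `best_move`) and the parameter values are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # toy rule (assumed): remove 1 to 3 stones; taking the last stone wins


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def choose(q, stones, epsilon, rng):
    """Mostly pick the move currently rated best; sometimes explore at random."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(stones, m)])


def train(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Learn move ratings purely from games the program plays against itself."""
    rng = random.Random(seed)
    q = defaultdict(float)  # rating of each (stones, move) pair, initially 0
    for _ in range(episodes):
        stones = rng.randint(1, 10)
        history = [[], []]  # moves made by player 0 and player 1
        player = 0
        while stones > 0:
            move = choose(q, stones, epsilon, rng)
            history[player].append((stones, move))
            stones -= move
            if stones == 0:
                winner = player  # whoever takes the last stone wins
            player = 1 - player
        # Reward every move of the winner (+1) and punish the loser's (-1).
        for p in (0, 1):
            reward = 1.0 if p == winner else -1.0
            for key in history[p]:
                q[key] += alpha * (reward - q[key])
    return q


def best_move(q, stones):
    """Return the move with the highest learned rating."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])
```

Both sides share the same table, so every improvement is immediately used against the program itself. In this toy game the known winning strategy is to leave the opponent a multiple of four stones, so after training `best_move(train(), 5)` should settle on taking one stone, even though the strategy was never given to the program.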