Google’s PlaNet AI Learns Planning from Pixels
Welcome, everyone, to ProHeadlines, your one-stop solution for learning about the most trending technologies in the world. Today we are going to talk about Google’s PlaNet AI, which learns planning from pixels.
PlaNet is a technique meant to solve challenging image-based planning tasks with sparse rewards. OK, that sounds great, but what do all of these terms mean? The planning part is simple: the AI has to come up with a sequence of actions to achieve a goal, like balancing a pole on a cart, teaching a virtual human or a cheetah to walk, or hitting this box the right way to make sure it keeps rotating. The image-based part is big: it means that the AI has to learn the same way a human does, by looking at the pixels of the images. This is a huge bump in difficulty, because the AI not only has to learn to beat the game itself, but also has to build an understanding of the visual concepts within the game.
DeepMind’s legendary Deep Q-Learning algorithm:
DeepMind’s legendary Deep Q-Learning algorithm was also able to learn from pixel inputs, but it was mighty inefficient at doing so. No wonder: this problem formulation is immensely hard, and it is a miracle that we can muster any solution at all. The sparse reward part means that we rarely get feedback on how well we are doing at these tasks, which is a nightmare situation for any learning algorithm.
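To make the sparse reward idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration and does not come from the paper: a toy one-dimensional world, a hypothetical goal position `GOAL`, and two hypothetical reward functions. It only shows how rarely a randomly moving agent hears anything back when the reward is sparse, compared with a dense reward that speaks up on almost every step.

```python
import random

GOAL = 10  # hypothetical goal position on a 1-D line (illustrative assumption)


def sparse_reward(position):
    """Feedback only when the goal is reached; zero everywhere else."""
    return 1.0 if position == GOAL else 0.0


def dense_reward(position):
    """Feedback on almost every step: the closer to the goal, the better."""
    return -abs(GOAL - position)


def random_walk(steps, seed=0):
    """Positions visited by an agent that moves left or right at random."""
    rng = random.Random(seed)
    position, visited = 0, []
    for _ in range(steps):
        position += rng.choice([-1, 1])
        visited.append(position)
    return visited


def informative_fraction(reward_fn, positions):
    """Share of steps on which the reward is nonzero."""
    return sum(1 for p in positions if reward_fn(p) != 0.0) / len(positions)


if __name__ == "__main__":
    walk = random_walk(200)
    print("sparse:", informative_fraction(sparse_reward, walk))
    print("dense: ", informative_fraction(dense_reward, walk))
```

With the sparse version, the agent gets a signal only on the steps where it happens to stand exactly on the goal, which is why learning from such rewards is so slow.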
A key difference between this technique and classical reinforcement learning, which is what most researchers reach for to solve similar tasks, is that this one uses a model for planning. This means that it does not learn every new task from scratch: after the first game, whichever it may be, it will have a rudimentary understanding of gravity and dynamics, and it will be able to reuse this knowledge in the next games.
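Here is a rough sketch of what "using a model for planning" can look like in code. Please treat it as an illustration under stated assumptions, not as the method from the paper: the text above does not say which planner is used, so this sketch assumes the cross-entropy method, a common choice for this kind of planning. The hand-written point-mass function `model_step` stands in for a model that would normally be learned from pixels, and the reward here is dense to keep the example short. All function names and parameters are hypothetical.

```python
import random
import statistics


def model_step(state, action, dt=0.1):
    """Toy stand-in for a learned dynamics model: a 1-D point mass."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return (pos, vel)


def rollout_return(state, actions, target=1.0):
    """Imagined return of an action sequence, computed with the model only."""
    total = 0.0
    for a in actions:
        state = model_step(state, a)
        total -= (state[0] - target) ** 2
    return total


def plan(state, horizon=15, candidates=200, elites=20, iterations=5, seed=0):
    """Cross-entropy method: sample sequences, keep the best, refit, repeat."""
    rng = random.Random(seed)
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate action sequences, clipped to the range [-1, 1].
        samples = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(means, stds)]
            for _ in range(candidates)
        ]
        # Rank the candidates by their imagined return, best first.
        samples.sort(key=lambda seq: rollout_return(state, seq), reverse=True)
        best = samples[:elites]
        # Refit the sampling distribution to the elite sequences.
        means = [statistics.fmean([seq[t] for seq in best]) for t in range(horizon)]
        stds = [statistics.pstdev([seq[t] for seq in best]) + 1e-3
                for t in range(horizon)]
    return means


if __name__ == "__main__":
    state = (0.0, 0.0)
    for step in range(30):
        # Execute only the first planned action, then plan again.
        action = plan(state, seed=step)[0]
        state = model_step(state, action)
    print("final position:", round(state[0], 3))
```

The point to notice is that the planner never touches the real environment while it searches: every candidate sequence is tried out inside the model. That is also why knowledge about dynamics captured in the model can carry over from one task to the next.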
PlaNet AI results:
As a result, it gets a head start when learning a new game and is therefore often 50 times more efficient than the previous technique that learns from scratch. Not only that, it has other really cool advantages as well, which I will tell you about in just a moment. Here you can see that, indeed, the blue lines significantly outperform the previous techniques, shown in red and green, for each of these tasks. I like how this plot is organized in the same grid as the tasks were, as it makes it much more readable when juxtaposed with the video footage.
Here are the two really cool additional advantages of this model-based agent. The first is that we don’t have to train six separate AIs for all of these tasks; we finally get one AI that is able to solve all six of them efficiently. The second is that it can look at as little as five frames of an animation, which is approximately one fifth of a second worth of footage. That is barely anything, and yet it is able to predict how the sequence would continue with remarkably high accuracy and over a long time frame, which is quite a challenge. This is an excellent paper with beautiful mathematical formulations; I recommend that you have a look at it in the video description.
The source code is also available free of charge for everyone, so I bet this will be an exciting direction for future research, and I’ll be here to report on it to you.
Thank you and have a great day!