Watching the advancement of AI is always enjoyable!
Just two years ago, we witnessed the release of AlphaZero by the DeepMind team. Within only 24 hours of training, it achieved a superhuman level of play in chess, shogi, and Go by defeating the world-champion programs Stockfish and elmo, and the three-day version of AlphaGo Zero. In each case, it made use of the custom tensor processing units (TPUs) that Google's programs are optimized for.
How AlphaZero was trained is interesting. It learned to play chess solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks.
Meanwhile, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero’s performance exceeded the benchmark after around four hours of training for Stockfish, two hours for elmo, and eight hours for AlphaGo Zero.
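The loop described above, where self-play games feed the training of the current policy and quick matches against a fixed benchmark track progress, can be sketched in miniature. Everything below is a toy stand-in (a one-move "pick the higher number" game with a simple weight table), not AlphaZero's actual neural-network and MCTS pipeline:

```python
import random

def make_policy(weights):
    """Turn a weight table into a move-sampling policy."""
    def policy():
        return random.choices(range(10), weights=weights)[0]
    return policy

def play_game(policy_a, policy_b):
    """Toy zero-sum 'game': each side picks a number; the higher pick wins.
    It stands in for a full game like chess or shogi."""
    a, b = policy_a(), policy_b()
    return 1 if a > b else (-1 if b > a else 0)

def self_play_training(generations=200, games_per_gen=20, seed=0):
    """Self-play loop: games generated by the current policy against
    itself update that same policy; the result is then evaluated in
    brief matches against a fixed benchmark opponent."""
    random.seed(seed)
    weights = [1.0] * 10  # uniform starting policy: no knowledge at all
    for _ in range(generations):
        for _ in range(games_per_gen):
            # self-play: sample both sides' moves from the current policy
            a = random.choices(range(10), weights=weights)[0]
            b = random.choices(range(10), weights=weights)[0]
            # 'training': reinforce whichever move won the game
            if a != b:
                weights[max(a, b)] += 0.1
    # evaluation: quick matches against a fixed (uniform) benchmark
    benchmark = make_policy([1.0] * 10)
    trained = make_policy(weights)
    wins = sum(play_game(trained, benchmark) == 1 for _ in range(1000))
    return wins / 1000
```

Because winning moves get reinforced game after game, the weight table concentrates on strong moves, so the trained policy should beat the uniform benchmark well more often than not. The same feedback loop, at vastly larger scale, is what let AlphaZero start from random play and surpass its benchmarks within hours.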
Multi-Agent Hide and Seek
A few days ago, OpenAI uploaded a YouTube video demonstrating the results of training AI agents to play hide-and-seek. Many viewers described the results as shocking; however, they are less surprising if you already know AlphaZero.
The experiment used a multi-agent system, the kind behind OpenAI's Emergent Tool Use work. In such a system, agents are pieces of simulated software exhibiting the following characteristics:
- Autonomy: agents are at least partially independent, self-aware, and autonomous
- Local views: no agent has a full global view of the system; often the system is too complex for an agent to exploit such knowledge anyway
- Decentralization: no agent is designated as the controller (otherwise the system is effectively reduced to a monolithic one)
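The three properties above can be made concrete with a minimal sketch. The one-dimensional world, the view radius, and the "step toward the nearest visible agent" rule below are all hypothetical choices of mine, purely for illustration; OpenAI's actual environment is a 3-D physics simulation:

```python
from dataclasses import dataclass

VIEW_RADIUS = 2  # local view: an agent only sees this far (assumed value)

@dataclass
class Agent:
    """Autonomy: each agent holds its own state and decides its own moves."""
    position: int

    def observe(self, others):
        """Local view: only agents within VIEW_RADIUS are visible."""
        return [o for o in others
                if o is not self
                and abs(o.position - self.position) <= VIEW_RADIUS]

    def act(self, others):
        """Decentralized rule: step toward the nearest visible agent.
        No agent sees or controls the system as a whole."""
        visible = self.observe(others)
        if not visible:
            return  # nothing observed -> stay put
        nearest = min(visible, key=lambda o: abs(o.position - self.position))
        if nearest.position > self.position:
            self.position += 1
        elif nearest.position < self.position:
            self.position -= 1

def step(agents):
    """One tick: every agent acts on its own local observation only."""
    for agent in agents:
        agent.act(agents)
```

For example, two agents placed at positions 0 and 2 see each other and converge, while agents at 0 and 5 are outside each other's view radius and never move: the global behavior emerges purely from local, decentralized decisions.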
The agents were given a few simple rules for the game and placed in a virtual environment of walls and blocks. The team observed the agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through self-learning in this simulated hide-and-seek environment, the agents built a series of six distinct strategies and counterstrategies, some of which the OpenAI team did not know their environment supported.
So, let’s enjoy the amazing video! :D.