You can run PPO or DQN on the OpenAI Gym implementation right now using Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/
In fact, I previously ran it locally, and PPO solved the problem within 10 minutes of training, reaching a maximum reward of about 200.
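
For reference, here is a minimal sketch of what such a run might look like. The environment id `CartPole-v1`, the timestep budget, and the helper name `train_and_evaluate` are all placeholders I picked for illustration, not the setup from the run above, so swap in your own environment id and expect different training times and rewards.

```python
def train_and_evaluate(env_id="CartPole-v1", total_timesteps=50_000, n_eval_episodes=10):
    """Train PPO on a Gym environment and return (mean_reward, std_reward).

    env_id and the default budgets are assumptions for illustration only.
    """
    # Imported here so the function can be defined even if
    # Stable-Baselines3 is not installed yet (pip install stable-baselines3).
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Passing the id as a string lets Stable-Baselines3 create the environment.
    model = PPO("MlpPolicy", env_id, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # Average the episode reward over a few evaluation episodes.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=n_eval_episodes
    )
    return mean_reward, std_reward
```

Calling `train_and_evaluate()` starts training and returns the mean and standard deviation of the evaluation reward. To try DQN instead, import `DQN` from `stable_baselines3` and use it in place of `PPO`; note that DQN only supports discrete action spaces.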