Train an Agent

Basic Usage

The hyperparameters for each environment are defined in hyperparams/algo_name.yml.
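
For illustration, each entry in such a file maps an environment id to its hyperparameters. A minimal sketch with made-up values (not the tuned values shipped with the zoo):

CartPole-v1:
  n_envs: 8
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  n_steps: 32
  batch_size: 256
  learning_rate: !!float 3e-4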

Note

Once RL Zoo3 is installed, you can run python -m rl_zoo3.train from any folder; it is equivalent to python train.py

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

Note

You can use the -P (--progress) option to display a progress bar.

Custom Config File

You can use a custom config file, as long as it is a yaml file that contains an env_id entry:

python train.py --algo algo_name --env env_id --conf-file my_yaml.yml

You can also use a python file that contains a dictionary called hyperparams with an entry for each env_id (see hyperparams/python/ppo_config_example.py for an example).

# You can pass a path to a python file
python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams/python/ppo_config_example.py
# Or pass a module path (for instance my_package.my_file)
python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams.python.ppo_config_example

The advantage of this approach is that you can specify arbitrary python dictionaries and ensure that all their dependencies are imported in the config file itself.
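
As a rough sketch (hypothetical values; the shipped hyperparams/python/ppo_config_example.py is the reference version), such a file could look like:

import torch.nn as nn  # dependencies used by the config can be imported directly in the file

# One entry per env_id, mapping to the hyperparameters to use for that env
hyperparams = {
    "MountainCarContinuous-v0": dict(
        normalize=True,
        n_envs=1,
        n_timesteps=20000.0,
        policy="MlpPolicy",
        policy_kwargs=dict(activation_fn=nn.ReLU, net_arch=[64, 64]),
        learning_rate=3e-4,
    ),
}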

Tensorboard, Checkpoints, Evaluation

For example (with tensorboard support):

python train.py --algo ppo --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):

python train.py --algo sac --env AntBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1

Save a checkpoint of the agent every 100000 steps:

python train.py --algo td3 --env AntBulletEnv-v0 --save-freq 100000

Resume Training

Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):

python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i rl-trained-agents/a2c/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip -n 5000

Save Replay Buffer

When using off-policy algorithms, you can also save the replay buffer after training:

python train.py --algo sac --env Pendulum-v1 --save-replay-buffer

It will be automatically loaded if present when continuing training.

Env keyword arguments

You can specify keyword arguments to pass to the env constructor in the command line, using --env-kwargs:

python enjoy.py --algo ppo --env MountainCar-v0 --env-kwargs goal_velocity:10

Overwrite hyperparameters

You can easily overwrite hyperparameters in the command line, using --hyperparams:

python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"

Note: if you want to pass a string, you need to escape it like this: my_string:"'value'"