Train an Agent

Basic Usage

The hyperparameters for each environment are defined in hyperparams/algo_name.yml.
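
For illustration, each entry in such a file maps an environment id to its hyperparameters. A minimal sketch with made-up values (not the tuned values shipped with the zoo):

CartPole-v1:
  n_envs: 8
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  n_steps: 32
  batch_size: 256
  learning_rate: !!float 3e-4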

Note

Once RL Zoo3 is installed, you can run python -m rl_zoo3.train from any folder; it is equivalent to python train.py

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

Note

You can use the -P (--progress) option to display a progress bar.

Custom Config File

You can use a custom config file, as long as it is a yaml file that contains an env_id entry:

python train.py --algo algo_name --env env_id --conf-file my_yaml.yml

You can also use a python file that contains a dictionary called hyperparams with an entry for each env_id (see hyperparams/python/ppo_config_example.py for an example).

# You can pass a path to a python file
python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams/python/ppo_config_example.py
# Or pass a module path (for instance my_package.my_file)
python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams.python.ppo_config_example

The advantage of this approach is that you can specify arbitrary python dictionaries and ensure that all their dependencies are imported in the config file itself.
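
As a rough sketch (hypothetical values; the shipped hyperparams/python/ppo_config_example.py is the reference version), such a file could look like:

import torch.nn as nn  # dependencies used by the config can be imported directly in the file

# One entry per env_id, mapping to the hyperparameters to use for that env
hyperparams = {
    "MountainCarContinuous-v0": dict(
        normalize=True,
        n_envs=1,
        n_timesteps=20000.0,
        policy="MlpPolicy",
        policy_kwargs=dict(activation_fn=nn.ReLU, net_arch=[64, 64]),
        learning_rate=3e-4,
    ),
}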

Tensorboard, Checkpoints, Evaluation

For example (with tensorboard support):

python train.py --algo ppo --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):

python train.py --algo sac --env AntBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1

Save a checkpoint of the agent every 100000 steps:

python train.py --algo td3 --env AntBulletEnv-v0 --save-freq 100000

Resume Training

Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):

python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i rl-trained-agents/a2c/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip -n 5000

Save Replay Buffer

When using off-policy algorithms, you can also save the replay buffer after training:

python train.py --algo sac --env Pendulum-v1 --save-replay-buffer

It will be automatically loaded if present when continuing training.

Env keyword arguments

You can specify keyword arguments to pass to the env constructor in the command line, using --env-kwargs:

python enjoy.py --algo ppo --env MountainCar-v0 --env-kwargs goal_velocity:10

Overwrite hyperparameters

You can easily overwrite hyperparameters in the command line, using --hyperparams:

python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"

Note: if you want to pass a string, you need to escape it like this: my_string:"'value'"