.. _train: ============== Train an Agent ============== Basic Usage ----------- The hyperparameters for each environment are defined in ``hyperparameters/algo_name.yml``. .. note:: Once RL Zoo3 is install, you can do ``python -m rl_zoo3.train`` from any folder, it is equivalent to ``python train.py`` If the environment exists in this file, then you can train an agent using: :: python train.py --algo algo_name --env env_id .. note:: You can use ``-P`` (``--progress``) option to display a progress bar. Custom Config File ------------------ Using a custom config file when it is a yaml file with a which contains a ``env_id`` entry: :: python train.py --algo algo_name --env env_id --conf-file my_yaml.yml You can also use a python file that contains a dictionary called `hyperparams` with an entry for each ``env_id``. (see ``hyperparams/python/ppo_config_example.py`` for an example) :: # You can pass a path to a python file python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams/python/ppo_config_example.py # Or pass a path to a file from a module (for instance my_package.my_file) python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams.python.ppo_config_example The advantage of this approach is that you can specify arbitrary python dictionaries and ensure that all their dependencies are imported in the config file itself. Tensorboard, Checkpoints, Evaluation ------------------------------------ For example (with tensorboard support): :: python train.py --algo ppo --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/ Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env): :: python train.py --algo sac --env AntBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1 Save a checkpoint of the agent every 100000 steps: :: python train.py --algo td3 --env AntBulletEnv-v0 --save-freq 100000 Resume Training --------------- Continue training (here, load pretrained agent for Breakout and continue training for 5000 steps): :: python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i rl-trained-agents/a2c/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip -n 5000 Save Replay Buffer ------------------ When using off-policy algorithms, you can also **save the replay buffer** after training: :: python train.py --algo sac --env Pendulum-v1 --save-replay-buffer It will be automatically loaded if present when continuing training. Env keyword arguments --------------------- You can specify keyword arguments to pass to the env constructor in the command line, using ``--env-kwargs``: :: python enjoy.py --algo ppo --env MountainCar-v0 --env-kwargs goal_velocity:10 Overwrite hyperparameters ------------------------- You can easily overwrite hyperparameters in the command line, using ``--hyperparams``: :: python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])" Note: if you want to pass a string, you need to escape it like that: ``my_string:"'value'"``