(train)=

# Train an Agent

## Basic Usage

The hyperparameters for each environment are defined in `hyperparams/algo_name.yml`.

:::{note}
Once RL Zoo3 is installed, you can run `python -m rl_zoo3.train` from any folder, which is equivalent to `python train.py`.
:::

If the environment exists in this file, then you can train an agent using:

```
python train.py --algo algo_name --env env_id
```

:::{note}
You can use the `-P` (`--progress`) option to display a progress bar.
:::

## Custom Config File

You can use a custom config file, provided it is a YAML file that contains an `env_id` entry:

```
python train.py --algo algo_name --env env_id --conf-file my_yaml.yml
```

You can also use a Python file that contains a dictionary called `hyperparams` with an entry for each `env_id` (see `hyperparams/python/ppo_config_example.py` for an example):

```
# You can pass a path to a python file
python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams/python/ppo_config_example.py
# Or pass a path to a file from a module (for instance my_package.my_file)
python train.py --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams.python.ppo_config_example
```

The advantage of this approach is that you can specify arbitrary Python dictionaries and ensure that all their dependencies are imported in the config file itself.

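
As a rough illustration of the Python config format described above, the following is a minimal sketch of such a file. The file name `my_config.py` and all values are hypothetical and chosen only for illustration; the one structural requirement taken from the text is a dictionary named `hyperparams` with one entry per `env_id`.

```python
# my_config.py (hypothetical file name, illustrative values only)
# Any imports needed by the values below would go at the top of this file.

hyperparams = {
    # One entry per env_id
    "MountainCarContinuous-v0": dict(
        policy="MlpPolicy",
        n_timesteps=20000,
        normalize=True,
        learning_rate=7.77e-05,
        # Arbitrary nested Python objects are allowed here
        policy_kwargs=dict(net_arch=[64, 64]),
    ),
}
```

Under these assumptions, the file would be passed with `--conf-file my_config.py`, in the same way as the example file above.
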
## Tensorboard, Checkpoints, Evaluation

For example (with tensorboard support):

```
python train.py --algo ppo --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/
```

Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):

```
python train.py --algo sac --env AntBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1
```

Save a checkpoint of the agent every 100000 steps:

```
python train.py --algo td3 --env AntBulletEnv-v0 --save-freq 100000
```

## Resume Training

Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):

```
python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i rl-trained-agents/a2c/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip -n 5000
```

## Save Replay Buffer

When using off-policy algorithms, you can also **save the replay buffer** after training:

```
python train.py --algo sac --env Pendulum-v1 --save-replay-buffer
```

It will be automatically loaded if present when continuing training.

## Env keyword arguments

You can specify keyword arguments to pass to the env constructor in the command line, using `--env-kwargs`:

```
python enjoy.py --algo ppo --env MountainCar-v0 --env-kwargs goal_velocity:10
```

## Overwrite hyperparameters

You can easily overwrite hyperparameters in the command line, using `--hyperparams`:

```
python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"
```

Note: if you want to pass a string, you need to escape it like this: `my_string:"'value'"`
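
The quoting rule for strings is easier to see with a simplified sketch of how `key:value` overrides can be turned into Python objects. This assumes each value is evaluated as a Python expression after the shell has removed the outer double quotes; `parse_overrides` is a hypothetical helper written for illustration and is not presented as RL Zoo's actual parser.

```python
def parse_overrides(pairs):
    """Turn ["key:value", ...] into a dict (hypothetical helper).

    Assumption: each value is evaluated as a Python expression.
    """
    overrides = {}
    for pair in pairs:
        # Split on the first colon only, so values may contain colons
        key, _, raw = pair.partition(":")
        # Evaluate in an empty namespace (builtins such as dict stay available)
        overrides[key] = eval(raw, {})
    return overrides


# What the program receives once the shell has stripped the outer double quotes
args = [
    "learning_rate:0.001",
    "policy_kwargs:dict(net_arch=[64, 64])",
    "my_string:'value'",
]
overrides = parse_overrides(args)
print(overrides)
```

In this sketch, an unquoted `my_string:value` would be evaluated as the bare name `value` and raise a `NameError`, which is why the inner single quotes are needed for string values.
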