(tuning)=

# Hyperparameter Tuning

## Automated hyperparameter optimization

Blog post: [Automatic Hyperparameter Tuning - A Visual Guide](https://araffin.github.io/post/hyperparam-tuning/)

Video: <https://www.youtube.com/watch?v=AidFTOdGNFQ>

We use [Optuna](https://optuna.org/) for optimizing the
hyperparameters. Not all hyperparameters are tuned, and tuning enforces
certain default hyperparameter settings that may be different from the
official defaults. See
[rl_zoo3/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/hyperparams_opt.py)
for the current settings for each agent.

Hyperparameters not specified in
[rl_zoo3/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/hyperparams_opt.py)
are taken from the associated YAML file and fall back to the default
values of SB3 if not present.
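The lookup order described above can be pictured with a small sketch. The parameter names and values below are hypothetical and chosen only for illustration; they are not the zoo's actual settings, and the zoo does not necessarily use `ChainMap` internally.

```python
from collections import ChainMap

# Hypothetical values for illustration only (not the zoo's real settings).
sampled_by_tuner = {"learning_rate": 1e-3, "n_steps": 256}
from_yaml_file = {"n_steps": 2048, "gamma": 0.98}
sb3_defaults = {"learning_rate": 3e-4, "n_steps": 2048, "gamma": 0.99, "ent_coef": 0.0}

# ChainMap searches its mappings left to right, so earlier ones take precedence:
# tuned values first, then the YAML file, then the library defaults.
resolved = dict(ChainMap(sampled_by_tuner, from_yaml_file, sb3_defaults))

for name in sorted(resolved):
    print(name, resolved[name])
```

Here `n_steps` comes from the tuner, `gamma` from the YAML file, and `ent_coef` from the defaults.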

Note: when using the `SuccessiveHalvingPruner` (`--pruner halving`), you
must run with more than one job, for example `--n-jobs 2`.

A budget of 1000 trials, each with a maximum of 50000 steps:

```bash
python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler tpe --pruner median
```
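To give an intuition for what `--pruner median` does, here is a simplified sketch of the median pruning rule: a trial is stopped early when its intermediate result is worse than the median of earlier trials at the same evaluation step. This is not Optuna's implementation; the function name, the `n_startup_trials` handling, and the reward values are assumptions made for illustration.

```python
from statistics import median


def should_prune(current_value, previous_values_at_step, n_startup_trials=5):
    """Simplified median rule, assuming higher values (e.g. mean reward) are better."""
    # Do not prune until enough earlier trials exist to compare against.
    if len(previous_values_at_step) < n_startup_trials:
        return False
    return current_value < median(previous_values_at_step)


# Hypothetical mean rewards of earlier trials at the same evaluation step.
history = [-200.0, -180.0, -150.0, -120.0, -110.0]

pruned_low = should_prune(-190.0, history)   # below the median of -150.0
pruned_high = should_prune(-130.0, history)  # above the median of -150.0
print(pruned_low, pruned_high)
```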

Distributed optimization using a shared database is also possible (see
the corresponding [Optuna
documentation](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/004_distributed.html)):

```bash
python train.py --algo ppo --env MountainCar-v0 -optimize --study-name test --storage logs/demo.log
```

Visualize live using [optuna-dashboard](https://optuna-dashboard.readthedocs.io/en/latest/getting-started.html):

```bash
optuna-dashboard logs/demo.log
```

Load the hyperparameters from trial number 21 and train an agent with them:

```bash
python train.py --algo ppo --env MountainCar-v0 --study-name test --storage logs/demo.log --trial-id 21
```

The default budget for hyperparameter tuning is 500 trials and there is
one intermediate evaluation for pruning/early stopping per 100k time
steps.
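As a rough illustration of the "one evaluation per 100k time steps" rule, the arithmetic can be sketched as follows. The helper name and the lower bound of one evaluation are assumptions for this sketch; check the zoo's source for the exact behaviour.

```python
def n_evaluations_for(n_timesteps, steps_per_eval=100_000):
    """Number of intermediate evaluations for a given training budget.

    Assumption: at least one evaluation is always performed.
    """
    return max(1, n_timesteps // steps_per_eval)


for budget in (50_000, 100_000, 1_000_000):
    n_evals = n_evaluations_for(budget)
    # Spacing between evaluations, in time steps.
    eval_interval = budget // n_evals
    print(budget, n_evals, eval_interval)
```

Under these assumptions, a short run such as the 50000-step example gives the pruner only a single intermediate value per trial to work with.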

## Hyperparameters search space

Note that the default hyperparameters used in the zoo when tuning are
not always the same as the defaults provided in
[stable-baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/base.html).
Consult the latest source code to be sure of these settings. For
example:

- PPO tuning assumes a network architecture with `ortho_init = False`,
  though it is `True` by
  [default](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#ppo-policies).
  You can change that by updating
  [rl_zoo3/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/hyperparams_opt.py).
- With non-episodic rollouts, TD3 and DDPG tuning assumes
  `gradient_steps = train_freq` and so tunes only `train_freq` to
  reduce the search space.
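The last point, tying `gradient_steps` to `train_freq`, can be sketched like this. The sampler below uses Python's `random` module as a stand-in for an Optuna trial, and the candidate values are hypothetical; see `rl_zoo3/hyperparams_opt.py` for the real search space.

```python
import random


def sample_td3_like_params(rng):
    """Sample one free parameter and derive the other from it."""
    # Hypothetical candidate values, for illustration only.
    train_freq = rng.choice([1, 4, 8, 16, 32, 64])
    # gradient_steps is not searched independently; it mirrors train_freq,
    # which removes one dimension from the search space.
    return {"train_freq": train_freq, "gradient_steps": train_freq}


params = sample_td3_like_params(random.Random(0))
print(params)
```

With six candidate values, this leaves six combinations to explore instead of the 36 that an independent search over both parameters would produce.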