.. _tuning:

=====================
Hyperparameter Tuning
=====================

Automated hyperparameter optimization
-------------------------------------

Blog post: `Automatic Hyperparameter Tuning - A Visual Guide `_

Video: https://www.youtube.com/watch?v=AidFTOdGNFQ

We use `Optuna `__ for optimizing the
hyperparameters. Not all hyperparameters are tuned, and tuning enforces
certain default hyperparameter settings that may be different from the
official defaults. See
`rl_zoo3/hyperparams_opt.py `__
for the current settings for each agent.
Hyperparameters not specified in
`rl_zoo3/hyperparams_opt.py `__
are taken from the associated YAML file and fall back to the default
values of SB3 if they are not present there either.
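
The lookup order described above (tuned value first, then the YAML file,
then the SB3 default) can be pictured as a layered dictionary lookup. The
following is a minimal sketch using only the standard library; the
function ``resolve_hyperparams`` and all values are hypothetical and are
not part of ``rl_zoo3``.

.. code:: python

  from collections import ChainMap

  # Hypothetical values for illustration only, not read from the zoo.
  SB3_DEFAULTS = {"learning_rate": 3e-4, "n_steps": 2048, "gamma": 0.99}
  YAML_CONFIG = {"n_steps": 16, "gamma": 0.99}
  SAMPLED_BY_TUNER = {"learning_rate": 1e-3}


  def resolve_hyperparams(sampled, yaml_config, defaults):
      """Merge the three sources; earlier mappings take priority."""
      return dict(ChainMap(sampled, yaml_config, defaults))


  effective = resolve_hyperparams(SAMPLED_BY_TUNER, YAML_CONFIG, SB3_DEFAULTS)
  print(effective)

Here ``learning_rate`` comes from the tuner, ``n_steps`` from the YAML
layer, and ``gamma`` would fall through to the default layer if the YAML
layer did not set it.
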

Note: when using SuccessiveHalvingPruner (“halving”), you must specify
``--n-jobs > 1``.

Example with a budget of 1000 trials, each trained for at most 50000 steps:

::

  python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
    --sampler tpe --pruner median

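
To give an intuition for what ``--pruner median`` does, the sketch below
shows the basic median rule: a trial is stopped early when its
intermediate result is worse than the median of what earlier trials
reached at the same evaluation step. This is a simplified standalone
illustration, not Optuna's implementation; ``should_prune`` and the
reward values are hypothetical, and higher values are assumed to be
better.

.. code:: python

  from statistics import median


  def should_prune(current_value, previous_values_at_step, n_startup_trials=5):
      """Return True if the current trial is below the median of earlier trials.

      Assumes a higher value (for example mean episode reward) is better.
      No pruning happens until enough earlier trials are available.
      """
      if len(previous_values_at_step) < n_startup_trials:
          return False
      return current_value < median(previous_values_at_step)


  # Hypothetical mean rewards of five earlier trials at the same step.
  history = [-200.0, -180.0, -150.0, -120.0, -110.0]
  print(should_prune(-190.0, history))  # True: below the median of -150.0
  print(should_prune(-115.0, history))  # False: above the median
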
Distributed optimization using a shared database is also possible (see
the corresponding `Optuna
documentation `__):

::

  python train.py --algo ppo --env MountainCar-v0 -optimize --study-name test --storage logs/demo.log

Visualize live using `optuna-dashboard `__:

.. code:: bash

  optuna-dashboard logs/demo.log

Load the hyperparameters from trial number 21 and train an agent with them:

.. code:: bash

  python train.py --algo ppo --env MountainCar-v0 --study-name test --storage logs/demo.log --trial-id 21

The default budget for hyperparameter tuning is 500 trials and there is
one intermediate evaluation for pruning/early stopping per 100k time
steps.
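
As a worked example of the "one evaluation per 100k time steps" rule, the
sketch below computes how many intermediate evaluations a trial would
get for a given training budget. The helper names, the floor division,
and the minimum of one evaluation are assumptions made for illustration;
check the source for the exact behaviour.

.. code:: python

  def n_evaluations(n_timesteps, steps_per_evaluation=100_000):
      """Number of intermediate evaluations for one trial.

      Assumption: at least one evaluation is always performed.
      """
      return max(1, n_timesteps // steps_per_evaluation)


  def evaluation_interval(n_timesteps, steps_per_evaluation=100_000):
      """Time steps between two consecutive evaluations."""
      return n_timesteps // n_evaluations(n_timesteps, steps_per_evaluation)


  print(n_evaluations(1_000_000))        # 10
  print(evaluation_interval(1_000_000))  # 100000
  print(n_evaluations(50_000))           # 1

Under these assumptions a short budget such as ``-n 50000`` leaves a
single evaluation point, so a pruner has little information to act on.
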
Hyperparameters search space
----------------------------
Note that the default hyperparameters used in the zoo when tuning are
not always the same as the defaults provided in
`stable-baselines3 `__.
Consult the latest source code to be sure of these settings. For
example:

- PPO tuning assumes a network architecture with ``ortho_init = False``,
  though it is ``True`` by
  `default `__.
  You can change that by updating
  `rl_zoo3/hyperparams_opt.py `__.
- Non-episodic rollout in TD3 and DDPG assumes
  ``gradient_steps = train_freq`` and so tunes only ``train_freq`` to
  reduce the search space.
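
The second point can be sketched as a sampling function that draws only
``train_freq`` and derives ``gradient_steps`` from it, so the search space
has one dimension instead of two. This is a standalone illustration using
the standard library's ``random`` module rather than an Optuna trial;
``sample_offpolicy_params`` and the candidate list are hypothetical.

.. code:: python

  import random

  # Hypothetical candidate values, not the list used by the zoo.
  TRAIN_FREQ_CHOICES = [1, 4, 16, 64, 256]


  def sample_offpolicy_params(rng):
      """Sample train_freq and tie gradient_steps to the same value."""
      train_freq = rng.choice(TRAIN_FREQ_CHOICES)
      return {"train_freq": train_freq, "gradient_steps": train_freq}


  params = sample_offpolicy_params(random.Random(0))
  print(params)

Tying the two values together means one gradient update is performed per
collected environment step, whatever rollout length is sampled.
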