Hyperparameter Tuning

Hyperparameter Tuning

We use Optuna for optimizing the hyperparameters. Not all hyperparameters are tuned, and tuning enforces certain default hyperparameter settings that may be different from the official defaults. See rl_zoo3/hyperparams_opt.py for the current settings for each agent.

Hyperparameters not specified in rl_zoo3/hyperparams_opt.py are taken from the associated YAML file and fallback to the default values of SB3 if not present.

Note: when using SuccessiveHalvingPruner (“halving”), you must specify --n-jobs > 1

Budget of 1000 trials with a maximum of 50000 steps:

python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler tpe --pruner median

Distributed optimization using a shared database is also possible (see the corresponding Optuna documentation):

python train.py --algo ppo --env MountainCar-v0 -optimize --study-name test --storage sqlite:///example.db

Print and save best hyperparameters of an Optuna study:

python scripts/parse_study.py -i path/to/study.pkl --print-n-best-trials 10 --save-n-best-hyperparameters 10

The default budget for hyperparameter tuning is 500 trials and there is one intermediate evaluation for pruning/early stopping per 100k time steps.

Hyperparameters search space

Note that the default hyperparameters used in the zoo when tuning are not always the same as the defaults provided in stable-baselines3. Consult the latest source code to be sure of these settings. For example:

  • PPO tuning assumes a network architecture with ortho_init = False when tuning, though it is True by default. You can change that by updating rl_zoo3/hyperparams_opt.py.

  • Non-episodic rollout in TD3 and DDPG assumes gradient_steps = train_freq and so tunes only train_freq to reduce the search space.

When working with continuous actions, we recommend to enable gSDE by uncommenting lines in rl_zoo3/hyperparams_opt.py.