Configuration
Hyperparameter YAML syntax
The syntax used in hyperparameters/algo_name.yml to set hyperparameters (and likewise the syntax to override hyperparameters on the CLI) can be specialized when the argument is a function. See the hyperparameters/ directory for more examples. For example:
Specify a linear schedule for the learning rate:
learning_rate: lin_0.012486195510232303
Specify a different activation function for the network:
policy_kwargs: "dict(activation_fn=nn.ReLU)"
For a custom policy:
policy: my_package.MyCustomPolicy # for instance stable_baselines3.ppo.MlpPolicy
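To make the specialized values above more concrete, here is a minimal sketch of how a loader could interpret two of them: a `lin_<value>` string becomes a linear schedule, and a dotted path such as `my_package.MyCustomPolicy` is resolved to the object it names. It assumes the Stable-Baselines3 convention that a schedule is a function of `progress_remaining`, which goes from 1 at the start of training to 0 at the end. The helper names (`make_linear_schedule`, `parse_learning_rate`, `import_from_path`) are hypothetical and are not the Zoo's internal API; the `policy_kwargs` string form is not covered.

```python
import importlib
from typing import Any, Callable, Union

# A schedule maps progress_remaining (1.0 -> 0.0) to a hyperparameter value.
Schedule = Callable[[float], float]


def make_linear_schedule(initial_value: float) -> Schedule:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value

    return schedule


def parse_learning_rate(value: Union[str, float]) -> Union[Schedule, float]:
    """Turn 'lin_<float>' into a linear schedule; keep plain numbers as floats."""
    if isinstance(value, str) and value.startswith("lin_"):
        return make_linear_schedule(float(value[len("lin_"):]))
    return float(value)


def import_from_path(path: str) -> Any:
    """Resolve a dotted path like 'package.module.ClassName' to the named object."""
    module_name, _, attr_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


lr = parse_learning_rate("lin_0.012486195510232303")
print(lr(1.0), lr(0.5), lr(0.0))  # value at the start, halfway, and end of training
print(import_from_path("json.dumps")({"a": 1}))
```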
Env Normalization
In the hyperparameter file, normalize: True means that the training environment will be wrapped in a VecNormalize wrapper.
Normalization uses the default parameters of VecNormalize, with the exception of gamma, which is set to match that of the agent. These defaults can be overridden in the corresponding hyperparameters/algo_name.yml file, e.g.:
normalize: "{'norm_obs': True, 'norm_reward': False}"
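As a sketch of the rule above, the helper below maps the normalize value to the keyword arguments that would be passed to VecNormalize, always setting gamma to the agent's discount factor. The name `resolve_normalize` is hypothetical; the override string is assumed to be a Python dict literal, as in the example, and is parsed here with `ast.literal_eval`, which may differ from how the Zoo itself parses it.

```python
import ast
from typing import Any, Dict, Tuple, Union


def resolve_normalize(
    normalize: Union[bool, str, Dict[str, Any]], agent_gamma: float
) -> Tuple[bool, Dict[str, Any]]:
    """Return (wrap_with_vecnormalize, kwargs_for_vecnormalize).

    True -> VecNormalize defaults, except gamma, which follows the agent.
    A dict (or a string holding a dict literal) overrides the defaults.
    """
    if isinstance(normalize, str):
        # literal_eval only accepts Python literals, so no arbitrary code runs.
        normalize = ast.literal_eval(normalize)
    if isinstance(normalize, dict):
        kwargs = dict(normalize)
    elif normalize:
        kwargs = {}
    else:
        return False, {}
    # Match the discount factor used for reward normalization to the agent's.
    kwargs["gamma"] = agent_gamma
    return True, kwargs


print(resolve_normalize(True, agent_gamma=0.99))
print(resolve_normalize("{'norm_obs': True, 'norm_reward': False}", agent_gamma=0.98))
```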
Env Wrappers
You can specify one or more wrappers to use around the environment in the hyperparameter config:
for one wrapper:
env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
for multiple, specify a list:
env_wrapper:
  - rl_zoo3.wrappers.TruncatedOnSuccessWrapper:
      reward_offset: 1.0
  - sb3_contrib.common.wrappers.TimeFeatureWrapper
As shown with reward_offset above, you can also pass keyword arguments to a wrapper.
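The accepted shapes (a single dotted path, or a list whose items are either a path or a one-key mapping from path to keyword arguments) can be sketched as follows, with the YAML already loaded into Python objects. The helper names are hypothetical, and the sketch assumes wrappers are applied in list order, so the first entry ends up innermost; wrapper classes are only imported when `apply_wrappers` is called.

```python
import importlib
from typing import Any, Callable, Dict, List, Tuple, Union

WrapperConfig = Union[None, str, Dict[str, Any], List[Union[str, Dict[str, Any]]]]


def parse_wrapper_specs(config: WrapperConfig) -> List[Tuple[str, Dict[str, Any]]]:
    """Normalize an env_wrapper value into a list of (dotted_path, kwargs)."""
    if config is None:
        return []
    if isinstance(config, (str, dict)):
        config = [config]  # a single wrapper is treated like a one-item list
    specs = []
    for entry in config:
        if isinstance(entry, str):
            specs.append((entry, {}))  # wrapper without arguments
        elif isinstance(entry, dict) and len(entry) == 1:
            path, kwargs = next(iter(entry.items()))
            specs.append((path, dict(kwargs or {})))  # wrapper with arguments
        else:
            raise ValueError(f"Invalid wrapper entry: {entry!r}")
    return specs


def import_from_path(path: str) -> Any:
    module_name, _, attr_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def apply_wrappers(
    env: Any, config: WrapperConfig, resolve: Callable[[str], Any] = import_from_path
) -> Any:
    """Wrap env in list order, so the first entry ends up innermost."""
    for path, kwargs in parse_wrapper_specs(config):
        env = resolve(path)(env, **kwargs)
    return env


# The list example above, as it looks once the YAML has been loaded:
loaded = [
    {"rl_zoo3.wrappers.TruncatedOnSuccessWrapper": {"reward_offset": 1.0}},
    "sb3_contrib.common.wrappers.TimeFeatureWrapper",
]
print(parse_wrapper_specs(loaded))
```

Passing `resolve` as a parameter keeps parsing and importing separate, so the parsing step can be checked without the wrapper packages installed.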
By default, the environment is wrapped with a Monitor wrapper to record episode statistics. You can pass arguments to it with the monitor_kwargs parameter, for example to log additional data. That data must be present in the info dictionary at the last step of each episode.
For instance, to record success with goal envs (e.g. FetchReach-v1):
monitor_kwargs: dict(info_keywords=('is_success',))
or to record the final x position with Ant-v3:
monitor_kwargs: dict(info_keywords=('x_position',))
Note: for known GoalEnvs such as FetchReach, info_keywords=('is_success',) is already the default.
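To illustrate what info_keywords does, the toy class below (a hypothetical stand-in, not Stable-Baselines3's Monitor) accumulates the episode return and length and copies the requested keys from the info dictionary of the final step. Because the keys are only read at the last step, a value that is missing there cannot be recorded; this sketch raises a KeyError in that case.

```python
from typing import Any, Dict, List, Optional, Sequence


class EpisodeRecorder:
    """Toy episode-statistics recorder (hypothetical, not SB3's Monitor)."""

    def __init__(self, info_keywords: Sequence[str] = ()) -> None:
        self.info_keywords = tuple(info_keywords)
        self.episodes: List[Dict[str, Any]] = []
        self._reward = 0.0
        self._length = 0

    def step(self, reward: float, done: bool, info: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Record one step; return the episode record when the episode ends."""
        self._reward += reward
        self._length += 1
        if not done:
            return None
        record: Dict[str, Any] = {"reward": self._reward, "length": self._length}
        for key in self.info_keywords:
            # Read from the final step's info dict only; a missing key raises KeyError.
            record[key] = info[key]
        self.episodes.append(record)
        self._reward, self._length = 0.0, 0
        return record


recorder = EpisodeRecorder(info_keywords=("is_success",))
recorder.step(-1.0, False, {})
print(recorder.step(0.0, True, {"is_success": True}))
```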
You can also specify environment keyword arguments with:
env_kwargs:
  gravity: 0.0
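Conceptually, env_kwargs are forwarded to the environment constructor, much as Gymnasium's gym.make forwards extra keyword arguments. The sketch below uses a hypothetical registry, environment class (`ToyPendulum`) and `make_env` helper to show that forwarding with the gravity example; it is not the Zoo's code.

```python
from typing import Any, Callable, Dict, Optional


class ToyPendulum:
    """Hypothetical environment with a configurable gravity constant."""

    def __init__(self, gravity: float = 9.81) -> None:
        self.gravity = gravity


# Hypothetical registry mapping env ids to constructors.
REGISTRY: Dict[str, Callable[..., Any]] = {"ToyPendulum-v0": ToyPendulum}


def make_env(env_id: str, env_kwargs: Optional[Dict[str, Any]] = None) -> Any:
    """Build the environment, passing env_kwargs to its constructor."""
    return REGISTRY[env_id](**(env_kwargs or {}))


env = make_env("ToyPendulum-v0", {"gravity": 0.0})
print(env.gravity)
```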
VecEnvWrapper
You can specify which VecEnvWrapper to use in the config with the vec_env_wrapper key, using the same syntax as for env wrappers (see above). For instance:
vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor
Note: VecNormalize is supported separately via the normalize keyword, and VecFrameStack has a dedicated frame_stack keyword.
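Because VecNormalize and VecFrameStack have their own keys, a config linter could flag them when they appear under vec_env_wrapper. The function below is a hypothetical sketch of such a check (the Zoo is not claimed to perform it), and it assumes vec_env_wrapper accepts the same shapes as env_wrapper: a dotted path, or a list of paths and one-key {path: kwargs} mappings.

```python
from typing import Any, Dict, List, Union

# Wrappers that have dedicated keys in the hyperparameter file.
DEDICATED_KEYS = {"VecNormalize": "normalize", "VecFrameStack": "frame_stack"}


def check_vec_env_wrapper(config: Union[None, str, Dict[str, Any], List[Any]]) -> List[str]:
    """Return warnings for vec_env_wrapper entries that have a dedicated key."""
    if config is None:
        return []
    entries = [config] if isinstance(config, (str, dict)) else list(config)
    warnings = []
    for entry in entries:
        # An entry is either a dotted path or a one-key {path: kwargs} mapping.
        path = entry if isinstance(entry, str) else next(iter(entry))
        class_name = path.rsplit(".", 1)[-1]
        if class_name in DEDICATED_KEYS:
            warnings.append(f"{path}: use the '{DEDICATED_KEYS[class_name]}' key instead")
    return warnings


print(check_vec_env_wrapper("stable_baselines3.common.vec_env.VecMonitor"))
print(check_vec_env_wrapper(["stable_baselines3.common.vec_env.VecNormalize"]))
```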
Callbacks
Following the same syntax as env wrappers, you can also add custom callbacks to use during training.
callback:
  - rl_zoo3.callbacks.ParallelTrainCallback:
      gradient_steps: 256
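A minimal sketch of how such a callback list could be turned into callback objects: each class is resolved from its dotted path and called with only the configured keyword arguments. The helper names are hypothetical, and the example passes a stand-in resolver because importing rl_zoo3.callbacks.ParallelTrainCallback requires the Zoo to be installed.

```python
import importlib
from typing import Any, Callable, Dict, List, Union

CallbackConfig = Union[None, str, Dict[str, Any], List[Union[str, Dict[str, Any]]]]


def import_from_path(path: str) -> Any:
    module_name, _, attr_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def build_callbacks(
    config: CallbackConfig, resolve: Callable[[str], Any] = import_from_path
) -> List[Any]:
    """Instantiate each configured callback class with its keyword arguments."""
    if config is None:
        return []
    if isinstance(config, (str, dict)):
        config = [config]  # a single callback is treated like a one-item list
    callbacks = []
    for entry in config:
        if isinstance(entry, str):
            path, kwargs = entry, {}
        else:
            path, kwargs = next(iter(entry.items()))
        # The class is called with the configured keyword arguments only.
        callbacks.append(resolve(path)(**(kwargs or {})))
    return callbacks


loaded = [{"rl_zoo3.callbacks.ParallelTrainCallback": {"gradient_steps": 256}}]
# Stand-in resolver: `dict` simply echoes the keyword arguments it receives.
print(build_callbacks(loaded, resolve=lambda path: dict))
```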
Default Hyperparameters
You can use a default entry in your hyperparameter YAML file to provide fallback hyperparameters for environments that don’t have specific entries.
This is useful when training on environments for which you don’t have tuned hyperparameters.
The default hyperparameters will be used when:
- the environment is not explicitly listed in the config file, and
- the environment is not an Atari game (Atari games use the atari entry).
Example:
# Specific hyperparameters for CartPole-v1
CartPole-v1:
  n_envs: 8
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  learning_rate: !!float 1e-3

# Fallback hyperparameters for any other environment
default:
  n_envs: 4
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
When training on an environment that is not explicitly listed, the Zoo prints "Using 'default' hyperparameters" and applies the default settings.
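The lookup order described above can be sketched as follows, with the example YAML already loaded into a dict. The function name and the is_atari flag are hypothetical (how the Zoo detects Atari environments is not shown here), and raising ValueError when no entry matches is an assumption of this sketch.

```python
from typing import Any, Dict


def select_hyperparams(
    config: Dict[str, Dict[str, Any]], env_id: str, is_atari: bool = False
) -> Dict[str, Any]:
    """Pick the hyperparameter entry for env_id using the fallback rules."""
    if env_id in config:
        return dict(config[env_id])  # an explicit entry always wins
    if is_atari:
        return dict(config["atari"])  # Atari games use the 'atari' entry
    if "default" in config:
        print("Using 'default' hyperparameters")
        return dict(config["default"])
    raise ValueError(f"No hyperparameters found for {env_id}")


config = {
    "CartPole-v1": {"n_envs": 8, "n_timesteps": 1e5, "policy": "MlpPolicy", "learning_rate": 1e-3},
    "default": {"n_envs": 4, "n_timesteps": 1e6, "policy": "MlpPolicy"},
}
print(select_hyperparams(config, "CartPole-v1")["n_envs"])
print(select_hyperparams(config, "Acrobot-v1")["n_envs"])
```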