Configuration

Hyperparameter YAML syntax

The syntax used in hyperparameters/algo_name.yml for setting hyperparameters (and likewise the syntax for overriding hyperparameters on the command line) may be specialized when the argument is a function, such as a schedule. See the examples in the hyperparameters/ directory. For example:

Specify a linear schedule for the learning rate:

learning_rate: lin_0.012486195510232303

Specify a different activation function for the network:

policy_kwargs: "dict(activation_fn=nn.ReLU)"

For a custom policy:

policy: my_package.MyCustomPolicy  # for instance stable_baselines3.ppo.MlpPolicy
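To make the lin_ prefix in the learning-rate example above more concrete, here is a minimal sketch of how such a value could be turned into a schedule. It assumes the Stable-Baselines3 convention that a schedule is a function of the remaining training progress, going from 1.0 at the start to 0.0 at the end. The helper name parse_schedule is hypothetical and the Zoo's own parsing may differ.

```python
from typing import Callable, Union


def parse_schedule(value: Union[int, float, str]) -> Callable[[float], float]:
    """Turn a YAML value into a function of the remaining training progress.

    Illustrative helper (not the Zoo's own code): a plain number gives a
    constant schedule, a string of the form "lin_<initial>" gives a linear
    decay from <initial> down to 0.
    """
    if isinstance(value, (int, float)):
        constant = float(value)
        return lambda progress_remaining: constant

    prefix, _, initial = value.partition("_")
    if prefix != "lin":
        raise ValueError(f"Unsupported schedule: {value!r}")
    initial_value = float(initial)
    # progress_remaining is assumed to go from 1.0 (start) to 0.0 (end).
    return lambda progress_remaining: progress_remaining * initial_value


schedule = parse_schedule("lin_0.012486195510232303")
print(schedule(1.0))  # learning rate at the start of training
print(schedule(0.5))  # learning rate halfway through
print(schedule(0.0))  # learning rate at the end of training
```

Under these assumptions the learning rate starts at the value written after lin_ and decreases linearly to zero by the end of training.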

Env Normalization

In the hyperparameter file, normalize: True means that the training environment will be wrapped in a VecNormalize wrapper.

Normalization uses the default parameters of VecNormalize, with the exception of gamma, which is set to match that of the agent. These defaults can be overridden in the appropriate hyperparameters/algo_name.yml file, for example:

normalize: "{'norm_obs': True, 'norm_reward': False}"
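The two accepted forms of normalize (a boolean, or a string holding a dict of VecNormalize arguments) can be handled along the following lines. This is only a sketch: parse_normalize is a hypothetical helper, and ast.literal_eval is used here as a safe stand-in for whatever evaluation the Zoo actually performs.

```python
import ast


def parse_normalize(value, agent_gamma: float):
    """Return (enabled, kwargs) for a `normalize` entry.

    Illustrative helper: accepts either a boolean or a string holding a
    dict literal, as in the YAML example above.
    """
    if isinstance(value, bool):
        enabled, kwargs = value, {}
    else:
        # literal_eval only accepts Python literals, so it cannot run code.
        kwargs = dict(ast.literal_eval(value))
        enabled = True
    if enabled:
        # gamma is set to match the agent's discount factor.
        kwargs["gamma"] = agent_gamma
    return enabled, kwargs


enabled, kwargs = parse_normalize("{'norm_obs': True, 'norm_reward': False}", 0.99)
print(enabled, kwargs)
```

The resulting kwargs would then be passed to VecNormalize when wrapping the training environment.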

Env Wrappers

In the hyperparameter config, you can specify one or more wrappers to use around the environment:

For one wrapper:

env_wrapper: gym_minigrid.wrappers.FlatObsWrapper

For multiple wrappers, specify a list:

env_wrapper:
    - rl_zoo3.wrappers.TruncatedOnSuccessWrapper:
        reward_offset: 1.0
    - sb3_contrib.common.wrappers.TimeFeatureWrapper

Note that wrapper parameters can be specified too, as with reward_offset above.
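The single-wrapper and list forms above can be normalized into a common shape before the wrappers are imported. The sketch below shows one way to do that with the standard library only. The helper names parse_wrapper_specs and load_attribute are hypothetical, and the Zoo's actual loading code may be organized differently.

```python
import importlib


def parse_wrapper_specs(spec):
    """Normalize an `env_wrapper` value into (dotted_path, kwargs) pairs."""
    entries = spec if isinstance(spec, list) else [spec]
    parsed = []
    for entry in entries:
        if isinstance(entry, dict):
            # A wrapper with parameters is a one-key mapping: {path: kwargs}.
            if len(entry) != 1:
                raise ValueError(f"Expected one wrapper per entry, got {entry!r}")
            path, kwargs = next(iter(entry.items()))
            parsed.append((path, dict(kwargs or {})))
        else:
            parsed.append((entry, {}))
    return parsed


def load_attribute(dotted_path):
    """Import `package.module.Name` and return the `Name` attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


# What a YAML loader would produce for the multi-wrapper example above.
spec = [
    {"rl_zoo3.wrappers.TruncatedOnSuccessWrapper": {"reward_offset": 1.0}},
    "sb3_contrib.common.wrappers.TimeFeatureWrapper",
]
for path, kwargs in parse_wrapper_specs(spec):
    print(path, kwargs)
```

With the relevant packages installed, each wrapper could then be applied in order with something like env = load_attribute(path)(env, **kwargs).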

By default, the environment is wrapped with a Monitor wrapper to record episode statistics. You can pass arguments to it using the monitor_kwargs parameter to log additional data. That data must be present in the info dictionary at the last step of each episode.

For instance, for recording success with goal envs (e.g. FetchReach-v1):

monitor_kwargs: dict(info_keywords=('is_success',))

Or, for recording the final x position with Ant-v3:

monitor_kwargs: dict(info_keywords=('x_position',))

Note: for known GoalEnv environments such as FetchReach, info_keywords=('is_success',) is already the default.
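The requirement that the extra data be present in the last info dictionary can be illustrated with a small stand-in for what a monitor does at the end of an episode. This is a simplified sketch, not the Monitor wrapper itself: episode_record is a hypothetical function, and the "r" and "l" keys for return and length are an assumption modeled on Stable-Baselines3.

```python
def episode_record(episode_return, episode_length, last_info, info_keywords=()):
    """Build one episode record plus the requested keys from the final `info`."""
    record = {"r": episode_return, "l": episode_length}
    for key in info_keywords:
        # A KeyError here means the environment did not report `key`
        # in the info dictionary of the episode's last step.
        record[key] = last_info[key]
    return record


last_info = {"is_success": True, "x_position": 3.2}
print(episode_record(12.5, 50, last_info, info_keywords=("is_success",)))
```

Keys that are not listed in info_keywords are ignored, and a listed key that is missing from the final info dictionary raises an error in this sketch.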

You can also specify environment keyword arguments with:

env_kwargs:
  gravity: 0.0

VecEnvWrapper

You can specify which VecEnvWrapper to use in the config, the same way as for env wrappers (see above), using the vec_env_wrapper key. For instance:

vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor

Note: VecNormalize is supported separately using the normalize keyword, and VecFrameStack has a dedicated keyword, frame_stack.

Callbacks

Following the same syntax as env wrappers, you can also add custom callbacks to use during training:

callback:
  - rl_zoo3.callbacks.ParallelTrainCallback:
      gradient_steps: 256

Default Hyperparameters

You can use a default entry in your hyperparameter YAML file to provide fallback hyperparameters for environments that don’t have specific entries. This is useful when training on environments for which you don’t have tuned hyperparameters.

The default hyperparameters are used when both of the following hold:

The environment is not explicitly listed in the config file
The environment is not an Atari game (which uses the atari entry)

Example:

# Specific hyperparameters for CartPole-v1
CartPole-v1:
  n_envs: 8
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  learning_rate: !!float 1e-3

# Fallback hyperparameters for any other environment
default:
  n_envs: 4
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'

When training on an environment not explicitly listed, the Zoo will print "Using 'default' hyperparameters" and apply the default settings.
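The lookup order described above can be summarized as: exact environment entry first, then the atari entry for Atari games, then default. The following sketch assumes the YAML file has already been loaded into a dictionary. select_hyperparams and its is_atari flag are hypothetical names, and the error raised when nothing matches is an assumption.

```python
def select_hyperparams(config, env_id, is_atari=False):
    """Pick the hyperparameter entry for `env_id` from a loaded config."""
    if env_id in config:
        return config[env_id]
    if is_atari:
        # Atari games share a single "atari" entry.
        return config["atari"]
    if "default" in config:
        print("Using 'default' hyperparameters")
        return config["default"]
    raise ValueError(f"Hyperparameters not found for {env_id}")


# Dictionary equivalent of the YAML example above.
config = {
    "CartPole-v1": {
        "n_envs": 8,
        "n_timesteps": 1e5,
        "policy": "MlpPolicy",
        "learning_rate": 1e-3,
    },
    "default": {"n_envs": 4, "n_timesteps": 1e6, "policy": "MlpPolicy"},
}

print(select_hyperparams(config, "CartPole-v1")["n_envs"])
print(select_hyperparams(config, "Pendulum-v1")["n_envs"])
```

Here CartPole-v1 gets its own entry, while Pendulum-v1, which is not listed, falls back to the default entry.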