Configuration

Hyperparameter yaml syntax

The syntax used in hyperparameters/algo_name.yml for setting hyperparameters (likewise the syntax to overwrite hyperparameters on the cli) may be specialized if the argument is a function. See examples in the hyperparameters/ directory. For example:

Specify a linear schedule for the learning rate:

learning_rate: lin_0.012486195510232303

Specify a different activation function for the network:

policy_kwargs: "dict(activation_fn=nn.ReLU)"

For a custom policy:

policy: my_package.MyCustomPolicy  # for instance stable_baselines3.ppo.MlpPolicy

Env Normalization

In the hyperparameter file, normalize: True means that the training environment will be wrapped in a VecNormalize wrapper.

Normalization uses the default parameters of VecNormalize, with the exception of gamma which is set to match that of the agent. This can be overridden using the appropriate hyperparameters/algo_name.yml, e.g.

normalize: "{'norm_obs': True, 'norm_reward': False}"

Env Wrappers

You can specify in the hyperparameter config one or more wrapper to use around the environment:

for one wrapper:

env_wrapper: gym_minigrid.wrappers.FlatObsWrapper

for multiple, specify a list:

env_wrapper:
    - rl_zoo3.wrappers.TruncatedOnSuccessWrapper:
        reward_offset: 1.0
    - sb3_contrib.common.wrappers.TimeFeatureWrapper

Note that you can easily specify parameters too.

By default, the environment is wrapped with a Monitor wrapper to record episode statistics. You can specify arguments to it using monitor_kwargs parameter to log additional data. That data must be present in the info dictionary at the last step of each episode.

For instance, for recording success with goal envs (e.g. FetchReach-v1):

monitor_kwargs: dict(info_keywords=('is_success',))

or recording final x position with Ant-v3:

monitor_kwargs: dict(info_keywords=('x_position',))

Note: for known GoalEnv like FetchReach, info_keywords=('is_success',) is actually the default.

VecEnvWrapper

You can specify which VecEnvWrapper to use in the config, the same way as for env wrappers (see above), using the vec_env_wrapper key:

For instance:

vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor

Note: VecNormalize is supported separately using normalize keyword, and VecFrameStack has a dedicated keyword frame_stack.

Callbacks

Following the same syntax as env wrappers, you can also add custom callbacks to use during training.

callback:
  - rl_zoo3.callbacks.ParallelTrainCallback:
      gradient_steps: 256