(config)=

# Configuration

## Hyperparameter YAML syntax

The syntax used in `hyperparams/algo_name.yml` for setting
hyperparameters (and likewise the syntax used to [overwrite
hyperparameters](https://github.com/DLR-RM/rl-baselines3-zoo#overwrite-hyperparameters)
on the command line) has special forms for arguments that are functions
or Python objects. The files in the `hyperparams/` directory contain
many examples; a few common cases follow.

Specify a linear schedule for the learning rate:

```yaml
learning_rate: lin_0.012486195510232303
```
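The `lin_` prefix can be read as "start at this value and decay linearly to zero over training". Below is a minimal Python sketch of that interpretation, assuming the schedule receives the remaining training progress (1.0 at the start, 0.0 at the end), as Stable-Baselines3 schedules do. The helper names `linear_schedule` and `parse_learning_rate` are illustrative and not necessarily the Zoo's own.

```python
from typing import Callable, Union


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end)
        return progress_remaining * initial_value

    return schedule


def parse_learning_rate(value: Union[str, float]) -> Callable[[float], float]:
    """Turn a YAML value such as 'lin_0.001' or 0.001 into a schedule (illustrative)."""
    if isinstance(value, str) and value.startswith("lin_"):
        return linear_schedule(float(value[len("lin_"):]))
    # Plain numbers are treated as a constant schedule
    constant = float(value)
    return lambda progress_remaining: constant


lr = parse_learning_rate("lin_0.012486195510232303")
print(lr(1.0))  # initial value at the start of training
print(lr(0.5))  # half the initial value midway through
print(lr(0.0))  # 0.0 at the end of training
```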

Specify a different activation function for the network:

```yaml
policy_kwargs: "dict(activation_fn=nn.ReLU)"
```

For a custom policy:

```yaml
policy: my_package.MyCustomPolicy  # for instance stable_baselines3.ppo.MlpPolicy
```

## Env Normalization

In the hyperparameter file, `normalize: True` means that the training
environment will be wrapped in a
[VecNormalize](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L13)
wrapper.

[Normalization
uses](https://github.com/DLR-RM/rl-baselines3-zoo/issues/64) the
default parameters of `VecNormalize`, with the exception of `gamma`,
which is set to match that of the agent. These defaults can be
[overridden](https://github.com/DLR-RM/rl-baselines3-zoo/blob/v0.10.0/hyperparams/sac.yml#L239)
in the appropriate `hyperparams/algo_name.yml` by passing a dictionary
of `VecNormalize` arguments instead of `True`, e.g.

```yaml
normalize: "{'norm_obs': True, 'norm_reward': False}"
```
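The quoted value is a Python dict literal. The sketch below shows one way such a value could be turned into keyword arguments, filling in `gamma` from the agent when the config leaves it out. Using `ast.literal_eval` is an assumption made for illustration, and `build_normalize_kwargs` is a hypothetical helper, not the Zoo's actual code path.

```python
import ast
from typing import Any, Dict, Union


def build_normalize_kwargs(normalize: Union[bool, str], agent_gamma: float) -> Dict[str, Any]:
    """Build keyword arguments for a VecNormalize-style wrapper (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if isinstance(normalize, str):
        # The YAML value is a string holding a Python dict literal
        kwargs = dict(ast.literal_eval(normalize))
    # Unless overridden in the config, reuse the agent's discount factor
    kwargs.setdefault("gamma", agent_gamma)
    return kwargs


print(build_normalize_kwargs("{'norm_obs': True, 'norm_reward': False}", agent_gamma=0.98))
# {'norm_obs': True, 'norm_reward': False, 'gamma': 0.98}
print(build_normalize_kwargs(True, agent_gamma=0.99))
# {'gamma': 0.99}
```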

## Env Wrappers

In the hyperparameter config, you can specify one or more wrappers to
apply to the environment.

For one wrapper:

```yaml
env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
```

For multiple wrappers, specify a list:

```yaml
env_wrapper:
    - rl_zoo3.wrappers.TruncatedOnSuccessWrapper:
        reward_offset: 1.0
    - sb3_contrib.common.wrappers.TimeFeatureWrapper
```

As the first entry above shows, a wrapper's parameters can be given as a
nested mapping under its name.
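To make the two entry forms concrete, here is a rough sketch of how a dotted path with optional parameters could be resolved into a class and its keyword arguments. `resolve_spec` is a hypothetical helper written for this page, and the demo uses a standard-library class as a stand-in so that it runs without any RL packages installed.

```python
import importlib
from typing import Any, Dict, Tuple, Union

# Either "dotted.path.ClassName" or {"dotted.path.ClassName": {param: value}}
WrapperSpec = Union[str, Dict[str, Dict[str, Any]]]


def resolve_spec(spec: WrapperSpec) -> Tuple[type, Dict[str, Any]]:
    """Split a config entry into (class, keyword arguments)."""
    if isinstance(spec, dict):
        # A list item with parameters is loaded from YAML as a single-key mapping
        dotted_path, kwargs = next(iter(spec.items()))
    else:
        dotted_path, kwargs = spec, {}
    module_name, class_name = dotted_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls, dict(kwargs or {})


# Stand-in class from the standard library; in a real config the path
# would point at an environment wrapper class.
cls, kwargs = resolve_spec({"fractions.Fraction": {"numerator": 1, "denominator": 2}})
print(cls(**kwargs))  # 1/2
```

When several entries are listed, each resolved class would be applied in turn around the result of the previous one.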

By default, the environment is wrapped with a `Monitor` wrapper to
record episode statistics. You can pass arguments to it using the
`monitor_kwargs` parameter, for example to log additional data. That
data *must* be present in the info dictionary at the last step of each
episode.

For instance, to record success with goal envs
(e.g. `FetchReach-v1`):

```yaml
monitor_kwargs: dict(info_keywords=('is_success',))
```

or to record the final x position with `Ant-v3`:

```yaml
monitor_kwargs: dict(info_keywords=('x_position',))
```

Note: for known `GoalEnv` environments such as `FetchReach`,
`info_keywords=('is_success',)` is already the default.
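The requirement that each keyword be present at the last step can be illustrated with a simplified stand-in for what a monitor does when an episode ends. `collect_episode_info` is a hypothetical function for illustration only; it is not the `Monitor` implementation.

```python
from typing import Any, Dict, Tuple


def collect_episode_info(final_info: Dict[str, Any], info_keywords: Tuple[str, ...]) -> Dict[str, Any]:
    """Copy the requested keys from the last step's info dict (simplified stand-in)."""
    episode_info = {}
    for key in info_keywords:
        # A missing key raises KeyError, which is why the env must report it on the final step
        episode_info[key] = final_info[key]
    return episode_info


print(collect_episode_info({"is_success": True, "x_position": 3.2}, ("is_success",)))
# {'is_success': True}

try:
    collect_episode_info({"x_position": 3.2}, ("is_success",))
except KeyError as missing:
    print(f"missing info key: {missing}")
```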

You can also specify environment keyword arguments with:

```yaml
env_kwargs:
  gravity: 0.0
```

## VecEnvWrapper

You can specify which `VecEnvWrapper` to use in the config with the
`vec_env_wrapper` key, using the same syntax as for env wrappers (see
above). For instance:

```yaml
vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor
```

Note: `VecNormalize` is supported separately using the `normalize`
keyword, and `VecFrameStack` has a dedicated `frame_stack` keyword.

## Callbacks

Following the same syntax as env wrappers, you can also add custom
callbacks to use during training.

```yaml
callback:
  - rl_zoo3.callbacks.ParallelTrainCallback:
      gradient_steps: 256
```

## Default Hyperparameters

You can use a `default` entry in your hyperparameter YAML file to provide fallback hyperparameters for environments that don't have specific entries.
This is useful when training on environments for which you don't have tuned hyperparameters.

The `default` hyperparameters are used when both of the following hold:

1. The environment is not explicitly listed in the config file
2. The environment is not an Atari game (Atari games use the `atari` entry)

Example:

```yaml
# Specific hyperparameters for CartPole-v1
CartPole-v1:
  n_envs: 8
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  learning_rate: !!float 1e-3

# Fallback hyperparameters for any other environment
default:
  n_envs: 4
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
```

When training on an environment not explicitly listed, the Zoo will print `Using 'default' hyperparameters` and apply the default settings.
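The lookup order described in this section can be sketched in a few lines of Python. `select_hyperparams` is a hypothetical helper that mirrors the rules stated here; the Zoo's actual implementation may differ in its details and error messages.

```python
from typing import Any, Dict


def select_hyperparams(config: Dict[str, Dict[str, Any]], env_id: str, is_atari: bool) -> Dict[str, Any]:
    """Pick the config entry for env_id, following the fallback order described above."""
    if env_id in config:
        # An explicit entry always wins
        return config[env_id]
    if is_atari:
        # Atari games share the `atari` entry
        return config["atari"]
    if "default" in config:
        print("Using 'default' hyperparameters")
        return config["default"]
    raise ValueError(f"Hyperparameters not found for {env_id}")


config = {
    "CartPole-v1": {"n_envs": 8, "n_timesteps": 1e5, "policy": "MlpPolicy"},
    "default": {"n_envs": 4, "n_timesteps": 1e6, "policy": "MlpPolicy"},
}
print(select_hyperparams(config, "CartPole-v1", is_atari=False)["n_envs"])  # 8
print(select_hyperparams(config, "Pendulum-v1", is_atari=False)["n_envs"])  # 4, after the fallback message
```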