Configuration
Hyperparameter yaml syntax
The syntax used in hyperparameters/algo_name.yml for setting hyperparameters (and likewise the syntax for overwriting hyperparameters on the CLI) may be specialized if the argument is a function. See the examples in the hyperparameters/ directory. For example:
Specify a linear schedule for the learning rate:
learning_rate: lin_0.012486195510232303
Specify a different activation function for the network:
policy_kwargs: "dict(activation_fn=nn.ReLU)"
For a custom policy:
policy: my_package.MyCustomPolicy # for instance stable_baselines3.ppo.MlpPolicy
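As a rough illustration of the lin_ prefix, the sketch below turns such a value into a callable schedule. It assumes the Stable-Baselines3 convention that a schedule receives the remaining training progress, going from 1.0 at the start to 0.0 at the end. The helper name parse_learning_rate is hypothetical and only mirrors the idea, not necessarily the zoo's actual parsing code.

```python
from typing import Callable, Union


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is assumed to go from 1.0 (start) to 0.0 (end)
        return progress_remaining * initial_value

    return schedule


def parse_learning_rate(value: Union[float, str]) -> Callable[[float], float]:
    """Hypothetical helper: map a YAML value to a schedule callable."""
    if isinstance(value, str):
        kind, initial = value.split("_")
        if kind != "lin":
            raise ValueError(f"Unsupported schedule prefix: {kind!r}")
        return linear_schedule(float(initial))
    # A plain number is treated as a constant learning rate
    constant = float(value)
    return lambda _progress_remaining: constant


schedule = parse_learning_rate("lin_0.012486195510232303")
print(schedule(1.0), schedule(0.5), schedule(0.0))
```

With this sketch, the learning rate equals the configured value at the start of training and reaches zero at the end.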
Env Normalization
In the hyperparameter file, normalize: True means that the training environment will be wrapped in a VecNormalize wrapper.
Normalization uses the default parameters of VecNormalize, with the exception of gamma, which is set to match that of the agent. This can be overridden in the appropriate hyperparameters/algo_name.yml, e.g.
normalize: "{'norm_obs': True, 'norm_reward': False}"
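The two forms of the normalize value (a boolean, or a string holding a dict of VecNormalize arguments) could be handled along the following lines. This is a minimal sketch under stated assumptions: parse_normalize is a hypothetical helper, ast.literal_eval is used here as a safe stand-in for whatever evaluation the zoo performs, and letting an explicitly given gamma win over the agent's gamma is a choice of this sketch, not a documented behavior.

```python
import ast
from typing import Any, Dict, Optional, Tuple, Union


def parse_normalize(
    value: Union[bool, str], agent_gamma: Optional[float] = None
) -> Tuple[bool, Dict[str, Any]]:
    """Hypothetical helper: return (enabled, kwargs for VecNormalize)."""
    kwargs: Dict[str, Any] = {}
    if isinstance(value, str):
        # A string such as "{'norm_obs': True}" carries keyword arguments
        kwargs = dict(ast.literal_eval(value))
        enabled = True
    else:
        enabled = bool(value)
    if enabled and agent_gamma is not None:
        # Match the agent's discount factor unless gamma was given explicitly
        # (assumption of this sketch)
        kwargs.setdefault("gamma", agent_gamma)
    return enabled, kwargs


enabled, kwargs = parse_normalize(
    "{'norm_obs': True, 'norm_reward': False}", agent_gamma=0.98
)
print(enabled, kwargs)
```

The resulting kwargs would then be passed to VecNormalize when wrapping the training environment.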
Env Wrappers
You can specify in the hyperparameter config one or more wrappers to use around the environment:
For one wrapper:
env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
For multiple wrappers, specify a list:
env_wrapper:
  - rl_zoo3.wrappers.TruncatedOnSuccessWrapper:
      reward_offset: 1.0
  - sb3_contrib.common.wrappers.TimeFeatureWrapper
Note that you can also pass parameters to a wrapper, as with reward_offset above.
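Once the YAML is loaded, a wrapper entry is either a plain string (a dotted path) or a single-key mapping from the dotted path to its keyword arguments. The sketch below shows one way such entries could be normalized and the class resolved with importlib. The helper names parse_wrapper_specs and load_class are hypothetical, and the zoo's own implementation may differ.

```python
import importlib
from typing import Any, Dict, List, Tuple, Union

WrapperEntry = Union[str, Dict[str, Dict[str, Any]]]


def parse_wrapper_specs(
    spec: Union[WrapperEntry, List[WrapperEntry]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical helper: normalize a config value to (path, kwargs) pairs."""
    entries = spec if isinstance(spec, list) else [spec]
    parsed = []
    for entry in entries:
        if isinstance(entry, dict):
            if len(entry) != 1:
                raise ValueError("Expected exactly one wrapper per list item")
            ((path, kwargs),) = entry.items()
            parsed.append((path, dict(kwargs or {})))
        else:
            parsed.append((entry, {}))
    return parsed


def load_class(path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName'."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# What the multi-wrapper YAML example looks like after loading
spec = [
    {"rl_zoo3.wrappers.TruncatedOnSuccessWrapper": {"reward_offset": 1.0}},
    "sb3_contrib.common.wrappers.TimeFeatureWrapper",
]
parsed = parse_wrapper_specs(spec)
print(parsed)
```

Each wrapper would then be applied in order, roughly as env = load_class(path)(env, **kwargs), provided the corresponding package is installed.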
By default, the environment is wrapped with a Monitor wrapper to record episode statistics. You can pass arguments to it using the monitor_kwargs parameter to log additional data. That data must be present in the info dictionary at the last step of each episode.
For instance, for recording success with goal envs (e.g. FetchReach-v1):
monitor_kwargs: dict(info_keywords=('is_success',))
or recording the final x position with Ant-v3:
monitor_kwargs: dict(info_keywords=('x_position',))
Note: for known GoalEnv environments like FetchReach, info_keywords=('is_success',) is actually the default.
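To make the requirement on the info dictionary concrete, the following sketch mimics what happens at the end of an episode: each requested keyword is looked up in the final info dictionary and stored with the episode statistics. The function extract_episode_info is a hypothetical stand-in for illustration, not the Monitor wrapper's actual code.

```python
from typing import Any, Dict, Tuple


def extract_episode_info(
    final_info: Dict[str, Any], info_keywords: Tuple[str, ...]
) -> Dict[str, Any]:
    """Hypothetical helper: pick the logged keys from the last-step info."""
    missing = [key for key in info_keywords if key not in final_info]
    if missing:
        # The data must be present at the last step of the episode
        raise KeyError(f"Missing keys in final info dict: {missing}")
    return {key: final_info[key] for key in info_keywords}


# Example info dict as it might look at the last step of a goal-env episode
final_info = {"is_success": 1.0, "TimeLimit.truncated": False}
print(extract_episode_info(final_info, ("is_success",)))
```

Note the trailing comma in ('is_success',): without it, Python would read the value as a plain string rather than a one-element tuple.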
VecEnvWrapper
You can specify which VecEnvWrapper to use in the config, the same way as for env wrappers (see above), using the vec_env_wrapper key. For instance:
vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor
Note: VecNormalize is supported separately using the normalize keyword, and VecFrameStack has a dedicated keyword, frame_stack.
Callbacks
Following the same syntax as env wrappers, you can also add custom callbacks to use during training.
callback:
  - rl_zoo3.callbacks.ParallelTrainCallback:
      gradient_steps: 256