Wrappers

class rl_zoo3.wrappers.ActionNoiseWrapper(env, noise_std=0.1)[source]

Add Gaussian noise to the action (without telling the agent), to test the robustness of the learned controller.

Parameters:
  • env

  • noise_std – Standard deviation of the noise

step(action)[source]

Calls the step() of the wrapped env; can be overridden to change the returned data.

Parameters:

action (ndarray) –

Return type:

Tuple[ObsType, SupportsFloat, bool, bool, Dict[str, Any]]
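
A minimal usage sketch (Pendulum-v1 is only an example environment; any continuous-action env works):

    import gymnasium as gym

    from rl_zoo3.wrappers import ActionNoiseWrapper

    env = gym.make("Pendulum-v1")
    env = ActionNoiseWrapper(env, noise_std=0.1)

    obs, info = env.reset()
    # The executed action differs from the one passed in by Gaussian noise (std 0.1).
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())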

class rl_zoo3.wrappers.ActionSmoothingWrapper(env, smoothing_coef=0.0)[source]

Smooth the action using an exponential moving average.

Parameters:
  • env

  • smoothing_coef – Smoothing coefficient (0: no smoothing, 1: very smooth)

reset(seed=None, options=None)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

Parameters:
  • seed (int | None) –

  • options (dict | None) –

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, Dict]

step(action)[source]

Calls the step() of the wrapped env; can be overridden to change the returned data.

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, float, bool, bool, Dict]
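
The smoothed action presumably follows an exponential moving average in which smoothing_coef weights the previous action; a minimal sketch of that update (an illustration, not necessarily the exact implementation):

    import numpy as np

    def ema_action(prev_action, action, smoothing_coef=0.0):
        # smoothing_coef = 0 -> return the raw action (no smoothing)
        # smoothing_coef = 1 -> keep the previous action (very smooth)
        if prev_action is None:
            return np.asarray(action, dtype=np.float32)
        return smoothing_coef * prev_action + (1.0 - smoothing_coef) * np.asarray(action, dtype=np.float32)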

class rl_zoo3.wrappers.DelayedRewardWrapper(env, delay=10)[source]

Delay the reward by delay steps; this makes the task harder but more realistic. The reward is accumulated during those steps.

Parameters:
  • env

  • delay – Number of steps the reward should be delayed.

reset(seed=None, options=None)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

Parameters:
  • seed (int | None) –

  • options (dict | None) –

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, Dict]

step(action)[source]

Calls the step() of the wrapped env; can be overridden to change the returned data.

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, float, bool, bool, Dict]
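
A sketch of the accumulation idea (a hypothetical helper, not the exact rl_zoo3 code): rewards are summed and only released every delay steps or when the episode ends.

    class RewardDelayer:
        """Hypothetical illustration of the delayed-reward bookkeeping."""

        def __init__(self, delay=10):
            self.delay = delay
            self.accumulated = 0.0
            self.t = 0

        def __call__(self, reward, episode_over):
            self.accumulated += reward
            self.t += 1
            if self.t % self.delay == 0 or episode_over:
                released, self.accumulated = self.accumulated, 0.0
                return released
            return 0.0  # reward withheld until the delay elapses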

class rl_zoo3.wrappers.FrameSkip(env, skip=4)[source]

Return only every skip-th frame (frame skipping).

Parameters:
  • env – the environment

  • skip – Number of frames to skip (the same action is repeated and the rewards are summed over the skipped frames)

step(action)[source]

Step the environment with the given action. The action is repeated and the rewards are summed over the skipped frames.

Parameters:

action – the action

Returns:

observation, reward, terminated, truncated, information

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, float, bool, bool, Dict]
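
A simplified sketch of the loop described above (hypothetical helper): the same action is applied skip times and the rewards are summed, stopping early if the episode ends.

    def skip_step(env, action, skip=4):
        # Repeat the action `skip` times, summing rewards; stop early if the episode ends.
        total_reward = 0.0
        for _ in range(skip):
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += float(reward)
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info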

class rl_zoo3.wrappers.HistoryWrapper(env, horizon=2)[source]

Stack past observations and actions to give a history to the agent.

Parameters:
  • env

  • horizon – Number of steps to keep in the history.

reset(seed=None, options=None)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

Parameters:
  • seed (int | None) –

  • options (dict | None) –

Return type:

Tuple[ndarray, Dict]

step(action)[source]

Calls the step() of the wrapped env; can be overridden to change the returned data.

Return type:

Tuple[ndarray, SupportsFloat, bool, bool, Dict]
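
A minimal usage sketch (Pendulum-v1 is only an example environment); the wrapped observation is an ndarray that also encodes the last horizon observations and actions, so its shape differs from the base environment's.

    import gymnasium as gym

    from rl_zoo3.wrappers import HistoryWrapper

    env = gym.make("Pendulum-v1")
    env = HistoryWrapper(env, horizon=2)

    obs, info = env.reset()
    print(env.observation_space.shape, obs.shape)  # larger than the base observation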

class rl_zoo3.wrappers.HistoryWrapperObsDict(env, horizon=2)[source]

History Wrapper for dict observation.

Parameters:
  • env

  • horizon – Number of steps to keep in the history.

reset(seed=None, options=None)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

Parameters:
  • seed (int | None) –

  • options (dict | None) –

Return type:

Tuple[Dict[str, ndarray], Dict]

step(action)[source]

Calls the step() of the wrapped env; can be overridden to change the returned data.

Return type:

Tuple[Dict[str, ndarray], SupportsFloat, bool, bool, Dict]
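
Construction mirrors HistoryWrapper but for dictionary observations; dict_obs_env below is a placeholder for any environment with dict observations (e.g. a GoalEnv), not a concrete id.

    from rl_zoo3.wrappers import HistoryWrapperObsDict

    env = HistoryWrapperObsDict(dict_obs_env, horizon=2)  # dict_obs_env: placeholder

    obs, info = env.reset()  # obs is still a dict of ndarrays, now carrying the stacked history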

class rl_zoo3.wrappers.MaskVelocityWrapper(env)[source]

Gym environment observation wrapper used to mask velocity terms in observations. The intention is to make the MDP partially observable. Adapted from https://github.com/LiuWenlin595/FinalProject.

Parameters:

env – Gym environment

observation(observation)[source]

Returns a modified observation.

Parameters:

observation (ndarray) – The env observation

Returns:

The modified observation

Return type:

ndarray
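
A minimal usage sketch; CartPole-v1 is used as an example under the assumption that it is one of the environments for which the wrapper knows the velocity indices.

    import gymnasium as gym

    from rl_zoo3.wrappers import MaskVelocityWrapper

    env = gym.make("CartPole-v1")
    env = MaskVelocityWrapper(env)

    obs, info = env.reset()  # velocity terms in `obs` are masked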

class rl_zoo3.wrappers.TruncatedOnSuccessWrapper(env, reward_offset=0.0, n_successes=1)[source]

Truncate the episode on success and offset the reward. Useful for GoalEnv.

Parameters:
  • env

  • reward_offset – Offset added to the reward.

  • n_successes – Number of successes before the episode is truncated.

reset(seed=None, options=None)[source]

Calls the reset() of the wrapped env; can be overridden to change the returned data.

Parameters:
  • seed (int | None) –

  • options (dict | None) –

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, Dict]

step(action)[source]

Calls the step() of the wrapped env; can be overridden to change the returned data.

Return type:

Tuple[Tuple | Dict[str, Any] | ndarray | int, float, bool, bool, Dict]
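
A minimal usage sketch; goal_env below is a placeholder for a GoalEnv-style environment that reports success, not a concrete id.

    from rl_zoo3.wrappers import TruncatedOnSuccessWrapper

    env = TruncatedOnSuccessWrapper(goal_env, reward_offset=0.0, n_successes=1)  # goal_env: placeholder

    obs, info = env.reset()
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    # Once the env has reported `n_successes` successes, the episode is truncated.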