Wrappers

class rl_zoo3.wrappers.ActionNoiseWrapper(env, noise_std=0.1)[source]

Add Gaussian noise to the action (without telling the agent), to test the robustness of the control.

Parameters:
  • env (Env) – the environment

  • noise_std (float) – Standard deviation of the noise
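
Example usage (a minimal sketch following the classic Gym step API documented below; Pendulum-v1 is only an illustrative continuous-action environment):

    import gym

    from rl_zoo3.wrappers import ActionNoiseWrapper

    env = ActionNoiseWrapper(gym.make("Pendulum-v1"), noise_std=0.1)
    obs = env.reset()
    # Gaussian noise is added to the action inside step(), so the agent
    # is never told about the perturbation.
    obs, reward, done, info = env.step(env.action_space.sample())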

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class rl_zoo3.wrappers.ActionSmoothingWrapper(env, smoothing_coef=0.0)[source]

Smooth the action using exponential moving average.

Parameters:
  • env (Env) – the environment

  • smoothing_coef (float) – Smoothing coefficient (0: no smoothing, 1: very smooth)
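
The smoothing is a standard exponential moving average. A minimal sketch of the update rule (illustrative names, not the actual rl_zoo3 internals):

    import numpy as np

    def smooth_action(previous_action: np.ndarray, action: np.ndarray,
                      smoothing_coef: float) -> np.ndarray:
        # smoothing_coef = 0 passes the action through unchanged;
        # smoothing_coef = 1 keeps the previous action forever.
        return smoothing_coef * previous_action + (1.0 - smoothing_coef) * action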

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class rl_zoo3.wrappers.DelayedRewardWrapper(env, delay=10)[source]

Delay the reward by delay steps: the reward is accumulated during those steps, which makes the task harder but more realistic.

Parameters:
  • env (Env) – the environment

  • delay (int) – Number of steps the reward should be delayed.
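
A sketch of the bookkeeping this implies (hypothetical names; releasing the remainder at episode end is an assumption):

    class DelayedRewardSketch:
        """Illustrative bookkeeping for delaying rewards (not rl_zoo3 code)."""

        def __init__(self, delay: int = 10):
            self.delay = delay
            self.accumulated_reward = 0.0
            self.current_step = 0

        def delay_reward(self, reward: float, done: bool) -> float:
            self.accumulated_reward += reward
            self.current_step += 1
            # Release the accumulated reward every `delay` steps
            # (and, by assumption, when the episode ends).
            if self.current_step % self.delay == 0 or done:
                delayed, self.accumulated_reward = self.accumulated_reward, 0.0
                return delayed
            return 0.0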

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class rl_zoo3.wrappers.DoneOnSuccessWrapper(env, reward_offset=0.0, n_successes=1)[source]

Reset the episode on success and offset the reward. Useful for GoalEnv.

Parameters:
  • env (Env) – the environment

  • reward_offset (float) – Offset added to the reward at each step

  • n_successes (int) – Number of consecutive successes before the episode is considered done
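
A sketch of the termination logic (hypothetical names; reading the success flag from info["is_success"] is an assumption about the environment's convention):

    def update_success(done: bool, info: dict, current_successes: int,
                       n_successes: int = 1) -> tuple:
        """Illustrative success counting (not the rl_zoo3 source)."""
        if info.get("is_success", False):
            current_successes += 1
        else:
            current_successes = 0  # successes are assumed to be consecutive
        # The episode ends on the environment's own `done` or after
        # n_successes consecutive successes.
        return done or current_successes >= n_successes, current_successes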

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class rl_zoo3.wrappers.FrameSkip(env, skip=4)[source]

Return only every skip-th frame (frame skipping).

Parameters:
  • env (Env) – the environment

  • skip (int) – number of frames to skip (the same action is repeated for skip steps)
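
A minimal sketch of the repeat-and-sum loop (illustrative, assuming the 4-tuple Gym step API used throughout this page):

    def frame_skip_step(env, action, skip: int = 4):
        """Repeat the action for `skip` steps and sum the rewards (sketch)."""
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(skip):
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break  # stop early if the episode ends mid-skip
        return obs, total_reward, done, info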

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)[source]

Step the environment with the given action: repeat the action and sum the rewards over the skipped frames.

Parameters:

action (ndarray) – the action

Returns:

observation, reward, done, info

class rl_zoo3.wrappers.HistoryWrapper(env, horizon=2)[source]

Stack past observations and actions to give a history to the agent.

Parameters:
  • env (Env) – the environment

  • horizon (int) – Number of steps to keep in the history.
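
A sketch of the observation the agent ends up seeing (the flattening and concatenation order are assumptions, not the exact rl_zoo3 layout):

    import numpy as np

    def history_observation(past_obs: np.ndarray, past_actions: np.ndarray) -> np.ndarray:
        # past_obs has shape (horizon, obs_dim) and past_actions has shape
        # (horizon, action_dim); the agent receives one flat vector holding both.
        return np.concatenate((past_obs.flatten(), past_actions.flatten()))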

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class rl_zoo3.wrappers.HistoryWrapperObsDict(env, horizon=2)[source]

History Wrapper for dict observation.

Parameters:
  • env (Env) – the environment

  • horizon (int) – Number of steps to keep in the history.
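
Example usage (BitFlippingEnv from Stable-Baselines3 is used purely as an illustrative dict-observation environment):

    from stable_baselines3.common.envs import BitFlippingEnv

    from rl_zoo3.wrappers import HistoryWrapperObsDict

    # BitFlippingEnv is a simple dict-observation (GoalEnv-style) environment
    # shipped with Stable-Baselines3; applying the history to the "observation"
    # entry of the dict is an assumption here.
    env = HistoryWrapperObsDict(BitFlippingEnv(n_bits=4), horizon=2)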

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class rl_zoo3.wrappers.MaskVelocityWrapper(env)[source]

Gym environment observation wrapper used to mask velocity terms in observations. The intention is to make the MDP partially observable. Adapted from https://github.com/LiuWenlin595/FinalProject.

Parameters:

env (Env) – Gym environment
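
Example usage (assuming CartPole-v1 is among the environments with known velocity indices):

    import gym

    from rl_zoo3.wrappers import MaskVelocityWrapper

    # With velocities zeroed out, the agent must infer them from the history
    # of positions (e.g. by combining with HistoryWrapper).
    env = MaskVelocityWrapper(gym.make("CartPole-v1"))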