Wrappers
- class rl_zoo3.wrappers.ActionNoiseWrapper(env, noise_std=0.1)
Add Gaussian noise to the action (without telling the agent) to test the robustness of the control.
- Parameters:
env (Env) –
noise_std (float) – Standard deviation of the noise
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
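A minimal usage sketch (illustrative: it assumes the classic Gym API described above and the Pendulum-v1 environment, with a random policy standing in for a trained agent):

```python
import gym

from rl_zoo3.wrappers import ActionNoiseWrapper

# Wrap a continuous-control task so that every action is perturbed
# with Gaussian noise before being executed.
env = ActionNoiseWrapper(gym.make("Pendulum-v1"), noise_std=0.3)

obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = env.action_space.sample()  # stand-in for a trained policy
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print(f"Episode reward under action noise: {episode_reward:.2f}")
```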
- class rl_zoo3.wrappers.ActionSmoothingWrapper(env, smoothing_coef=0.0)
Smooth the action using an exponential moving average.
- Parameters:
env (Env) –
smoothing_coef (float) – Smoothing coefficient (0: no smoothing, 1: very smooth)
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
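The smoothing is an exponential moving average of successive actions; a sketch of the idea (Pendulum-v1 is an illustrative choice, and the exact update lives in the wrapper's source):

```python
import gym

from rl_zoo3.wrappers import ActionSmoothingWrapper

# With smoothing_coef=0.9, each executed action is conceptually
#   smoothed = 0.9 * previous_smoothed + 0.1 * new_action
# i.e. an exponential moving average of the agent's actions.
env = ActionSmoothingWrapper(gym.make("Pendulum-v1"), smoothing_coef=0.9)

obs = env.reset()
for _ in range(50):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```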
- class rl_zoo3.wrappers.DelayedRewardWrapper(env, delay=10)
Delay the reward by delay steps; this makes the task harder but more realistic. The reward is accumulated during those steps.
- Parameters:
env (Env) –
delay (int) – Number of steps the reward should be delayed.
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
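A sketch of the effect on CartPole-v1 (an illustrative environment whose per-step reward of 1 makes the accumulation easy to see):

```python
import gym

from rl_zoo3.wrappers import DelayedRewardWrapper

env = DelayedRewardWrapper(gym.make("CartPole-v1"), delay=10)

obs = env.reset()
for t in range(30):
    obs, reward, done, info = env.step(env.action_space.sample())
    # Reward stays 0 on most steps and arrives in accumulated lumps
    # (presumably also at episode end, for the remaining steps).
    if reward != 0.0:
        print(f"step {t}: accumulated reward paid out: {reward}")
    if done:
        break
```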
- class rl_zoo3.wrappers.DoneOnSuccessWrapper(env, reward_offset=0.0, n_successes=1)
Reset on success and offset the reward. Useful for GoalEnv.
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
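A hypothetical setup (FetchReach-v1 is an illustrative goal-based environment that must be installed separately; the keyword values are arbitrary):

```python
import gym

from rl_zoo3.wrappers import DoneOnSuccessWrapper

# Illustrative GoalEnv; requires the robotics environments.
env = gym.make("FetchReach-v1")

# End the episode after one success and add +1 to every reward.
env = DoneOnSuccessWrapper(env, reward_offset=1.0, n_successes=1)
```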
- class rl_zoo3.wrappers.FrameSkip(env, skip=4)
Return only every skip-th frame (frame skipping).
- Parameters:
env (Env) – the environment
skip (int) – number of frames to skip
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
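A short usage sketch (CartPole-v1 is an illustrative choice; how the rewards of the skipped frames are combined is defined by the wrapper's source):

```python
import gym

from rl_zoo3.wrappers import FrameSkip

# One call to step() advances the underlying environment by 4 frames,
# repeating the same action on each of them.
env = FrameSkip(gym.make("CartPole-v1"), skip=4)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```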
- class rl_zoo3.wrappers.HistoryWrapper(env, horizon=2)
Stack past observations and actions to give a history to the agent.
- Parameters:
env (Env) –
horizon (int) – Number of steps to keep in the history.
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
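The history-augmented observation space is larger than the original one; a sketch (the exact layout of the stacked observations and actions is defined by the implementation):

```python
import gym

from rl_zoo3.wrappers import HistoryWrapper

base_env = gym.make("Pendulum-v1")
env = HistoryWrapper(base_env, horizon=2)

# The wrapped observation stacks past observations and actions,
# so its shape grows with the horizon.
print(base_env.observation_space.shape)  # (3,) for Pendulum-v1
print(env.observation_space.shape)       # larger, history-augmented

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```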
- class rl_zoo3.wrappers.HistoryWrapperObsDict(env, horizon=2)
History wrapper for dict observations.
- Parameters:
env (Env) –
horizon (int) – Number of steps to keep in the history.
- reset()
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
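A hypothetical sketch for dict observations (assumes a goal-based environment with an "observation" key, e.g. the robotics tasks, installed separately):

```python
import gym

from rl_zoo3.wrappers import HistoryWrapperObsDict

# Illustrative Dict-observation environment (requires the robotics envs).
env = gym.make("FetchPush-v1")

# Keeps a 2-step history of observations and actions; presumably only
# the "observation" entry of the dict is history-augmented.
env = HistoryWrapperObsDict(env, horizon=2)
```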
- class rl_zoo3.wrappers.MaskVelocityWrapper(env)
Gym environment observation wrapper used to mask velocity terms in observations. The intention is to make the MDP partially observable. Adapted from https://github.com/LiuWenlin595/FinalProject.
- Parameters:
env (Env) – Gym environment
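A minimal sketch on CartPole-v1, whose observation is [position, velocity, angle, angular velocity] (note: the wrapper presumably supports only a fixed set of environments, per its implementation):

```python
import gym

from rl_zoo3.wrappers import MaskVelocityWrapper

env = MaskVelocityWrapper(gym.make("CartPole-v1"))

obs = env.reset()
# The velocity terms (indices 1 and 3 for CartPole) should now be
# masked out, making the task partially observable.
print(obs)
```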