class rl_zoo3.callbacks.ParallelTrainCallback(gradient_steps=100, verbose=0, sleep_time=0.0)[source]

Callback to explore (collect experience) and train (do gradient steps) at the same time using two separate threads. Normally used with off-policy algorithms and train_freq=(1, "episode").

TODO:

  • blocking mode: wait for the model to finish updating the policy before collecting new experience at the end of a rollout

  • force sync mode: stop training to update to the latest policy for collecting new experience

  • gradient_steps (int) – Number of gradient steps to do before sending the new policy

  • verbose (int) – Verbosity level

  • sleep_time (float) – Time to sleep in the thread collecting experience, used to limit its fps (the default 0.0 applies no limit).
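The two-thread idea described above can be illustrated with a minimal standard-library sketch. This is not the rl_zoo3 implementation: `run_parallel`, the list used as a replay buffer, and the integer "policy version" are hypothetical stand-ins, chosen only to show how `gradient_steps` and `sleep_time` would interact when collection and training run concurrently.

```python
import threading
import time


def run_parallel(n_steps=20, gradient_steps=4, sleep_time=0.0):
    """Toy model of exploring and training at the same time (illustrative only)."""
    replay_buffer = []  # hypothetical stand-in for an off-policy replay buffer
    lock = threading.Lock()
    stop = threading.Event()
    stats = {"policy_version": 0, "total_gradient_steps": 0}

    def train():
        # Do `gradient_steps` updates, then "send" the new policy to the collector.
        while True:
            for _ in range(gradient_steps):
                with lock:
                    _ = len(replay_buffer)  # pretend to sample a batch
                stats["total_gradient_steps"] += 1
            with lock:
                stats["policy_version"] += 1
            if stop.is_set():
                break

    def collect():
        # Collect transitions with whatever policy version is currently available.
        for step in range(n_steps):
            with lock:
                replay_buffer.append((step, stats["policy_version"]))
            if sleep_time > 0:
                time.sleep(sleep_time)  # throttle the collection thread
        stop.set()

    trainer = threading.Thread(target=train)
    collector = threading.Thread(target=collect)
    trainer.start()
    collector.start()
    collector.join()
    trainer.join()
    return replay_buffer, stats


if __name__ == "__main__":
    buffer, stats = run_parallel(n_steps=20, gradient_steps=4, sleep_time=0.001)
    print(len(buffer), stats)
```

In this toy version the collector never waits for the trainer, so transitions may be gathered with a stale policy version. That is the behaviour the "blocking mode" and "force sync mode" TODO items would address.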

class rl_zoo3.callbacks.RawStatisticsCallback(verbose=0)[source]

Callback used for logging raw episode data (return and episode length).

class rl_zoo3.callbacks.SaveVecNormalizeCallback(save_freq, save_path, name_prefix=None, verbose=0)[source]

Callback for saving a VecNormalize wrapper every save_freq steps

  • save_freq (int) – Number of steps between two consecutive saves of the VecNormalize wrapper

  • save_path (str) – Path to the folder where VecNormalize will be saved, as vecnormalize.pkl

  • name_prefix (str | None) – Common prefix for the saved VecNormalize files. If None (default), only one file is kept.

  • verbose (int) – Verbosity level
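The interplay between save_freq, save_path and name_prefix can be sketched in plain Python. The helpers `should_save` and `vecnormalize_path` are hypothetical, and the `<prefix>_<steps>_steps.pkl` pattern is an assumption made for illustration; only the vecnormalize.pkl default and the "single file when name_prefix is None" behaviour come from the descriptions above.

```python
import os
from typing import Optional


def should_save(n_calls: int, save_freq: int) -> bool:
    """Return True on every `save_freq`-th call (hypothetical helper)."""
    return n_calls % save_freq == 0


def vecnormalize_path(save_path: str, num_timesteps: int, name_prefix: Optional[str] = None) -> str:
    """Build the target file path (the prefixed naming pattern is an assumption)."""
    if name_prefix is not None:
        # One file per save, distinguished by the current timestep count.
        return os.path.join(save_path, f"{name_prefix}_{num_timesteps}_steps.pkl")
    # No prefix: the same file is overwritten, so only one copy is kept.
    return os.path.join(save_path, "vecnormalize.pkl")


if __name__ == "__main__":
    for n_calls in (500, 1000, 2000):
        if should_save(n_calls, save_freq=1000):
            print(vecnormalize_path("logs", n_calls, name_prefix="rl_model"))
```

With a prefix, each save produces a distinct file, which is useful when the normalization statistics need to match a model checkpoint saved at the same step.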

class rl_zoo3.callbacks.TrialEvalCallback(eval_env, trial, n_eval_episodes=5, eval_freq=10000, deterministic=True, verbose=0, best_model_save_path=None, log_path=None)[source]

Callback used for evaluating and reporting a trial.

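  • eval_env (VecEnv) – Environment used for evaluation

  • trial (Trial) – Optuna trial to which the evaluation results are reported

  • n_eval_episodes (int) – Number of episodes to run for each evaluation

  • eval_freq (int) – Evaluate every eval_freq calls of the callback

  • deterministic (bool) – Whether to use deterministic actions during evaluation

  • verbose (int) – Verbosity level

  • best_model_save_path (str | None) – Path to the folder where the best model will be saved, if not None

  • log_path (str | None) – Path to the folder where the evaluation results will be saved, if not None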
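As a rough sketch of what "evaluating and reporting a trial" could look like, the loop below evaluates every eval_freq steps, reports the result to the trial, and stops early if the trial asks to be pruned. It is an assumption-laden illustration, not the rl_zoo3 code: `StubTrial` is a hypothetical stand-in that mimics only the `report(value, step)` and `should_prune()` methods of an Optuna trial, and `run_trial` takes precomputed mean rewards instead of running a real evaluation.

```python
class StubTrial:
    """Hypothetical stand-in for an Optuna trial (only report/should_prune)."""

    def __init__(self, prune_below):
        self.prune_below = prune_below
        self.reports = []

    def report(self, value, step):
        self.reports.append((step, value))

    def should_prune(self):
        # Toy pruning rule: prune when the latest reported value is too low.
        return self.reports[-1][1] < self.prune_below


def run_trial(trial, mean_rewards, eval_freq=10000):
    """Return (number of evaluations done, whether the trial was pruned)."""
    eval_idx = 0
    total_steps = eval_freq * len(mean_rewards)
    for n_calls in range(1, total_steps + 1):
        if n_calls % eval_freq == 0:
            reward = mean_rewards[eval_idx]  # placeholder for a real evaluation
            eval_idx += 1
            trial.report(reward, eval_idx)
            if trial.should_prune():
                return eval_idx, True  # stop training early
    return eval_idx, False


if __name__ == "__main__":
    trial = StubTrial(prune_below=0.0)
    print(run_trial(trial, [1.0, 2.0, -1.0, 5.0], eval_freq=10))
```

Reporting an intermediate value after each evaluation is what allows a pruner to abandon unpromising hyperparameter configurations before the full training budget is spent.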