Changelog
See https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/CHANGELOG.md
Release 2.8.0 (2026-04-01)
Breaking Changes
Upgraded to SB3 >= 2.8.0
Removed support for Python 3.9, please upgrade to Python >= 3.10
Set strict=True for every call to zip(...)
New Features
Added official support for Python 3.13
Allow to specify env_kwargs in the hyperparam config
Allow to use default hyperparameters for any environment
Save training command in Weights & Biases (Wandb)
Save training command and default hyperparameters as study attributes
Bug fixes
Documentation
Switched to Markdown documentation (using MyST parser)
Other
Fixed unused variables in plot_from_file.py
Release 2.7.0 (2025-07-25)
Breaking Changes
Upgraded to SB3 >= 2.7.0
linear_schedule now returns a SimpleLinearSchedule object for better portability
Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters
Renamed CarRacing-v2 to CarRacing-v3 in hyperparameters
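To illustrate the idea of a schedule object (as opposed to a closure), here is a hedged sketch. LinearScheduleSketch is a hypothetical stand-in written for this example and is not SB3's SimpleLinearSchedule; it only assumes the SB3 convention that a schedule is called with progress_remaining, which goes from 1.0 at the start of training to 0.0 at the end:

```python
class LinearScheduleSketch:
    """Hypothetical stand-in: the value decays linearly as training progresses."""

    def __init__(self, initial_value: float) -> None:
        # The state lives in a plain attribute instead of being hidden in a closure.
        self.initial_value = float(initial_value)

    def __call__(self, progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start) to 0.0 (end of training).
        return progress_remaining * self.initial_value


schedule = LinearScheduleSketch(3e-4)
print(schedule(1.0), schedule(0.5), schedule(0.0))
```

One plausible reading of "better portability" is that an object with explicit attributes is easier to inspect and serialize than a nested function, but that interpretation is an assumption.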
New Features
Added Gymnasium v1.2 support
Bug fixes
Docker GPU images are now working again
Use ConstantSchedule and SimpleLinearSchedule instead of constant_fn and linear_schedule
Fixed CarRacing-v3 hyperparameters for newer Gymnasium version
Release 2.6.0 (2025-03-24)
Breaking Changes
Upgraded to SB3 >= 2.6.0
Refactored hyperparameter optimization. The Optuna Journal storage backend is now supported (recommended default) and you can easily load tuned hyperparameters via the new --trial-id argument of train.py.
For example, optimize using the journal storage:
python train.py --algo ppo --env Pendulum-v1 -n 40000 --study-name demo --storage logs/demo.log --sampler tpe --n-evaluations 2 --optimize --no-optim-plots
Visualize live using optuna-dashboard:
optuna-dashboard logs/demo.log
Load hyperparameters from trial number 21 and train an agent with it:
python train.py --algo ppo --env Pendulum-v1 --study-name demo --storage logs/demo.log --trial-id 21
New Features
Save the exact command line used to launch a training
Added support for special vectorized env (e.g. Brax, IsaacSim) by allowing to override the VecEnv class used to instantiate the env in the ExperimentManager
Allow to disable auto-logging by passing --log-interval -2 (useful when logging things manually)
Added Gymnasium v1.1 support
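Saving the exact command line can be sketched as follows. This is not the zoo's implementation: format_command and the command.txt file name are hypothetical, and the argv list is an illustrative value (a real script would pass sys.argv):

```python
import shlex
import tempfile
from pathlib import Path


def format_command(argv: list[str]) -> str:
    """Join argv back into a shell-safe, copy-pastable command string."""
    return shlex.join(argv)


# Illustrative argv; a real script would use sys.argv here.
example_argv = ["train.py", "--algo", "ppo", "--env", "Pendulum-v1", "--log-interval", "-2"]
command = format_command(example_argv)

# command.txt is a hypothetical file name chosen for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    command_file = Path(tmp_dir) / "command.txt"
    command_file.write_text(command + "\n")
    saved = command_file.read_text().strip()

print(saved)
```

Using shlex.join rather than " ".join keeps arguments containing spaces or quotes reproducible when the saved command is pasted back into a shell.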
Bug fixes
Fixed use of old HF api in get_hf_trained_models()
Documentation
Other
scripts/parse_study.py is now deprecated because of the new hyperparameter optimization scripts
Release 2.5.0 (2025-01-27)
Breaking Changes
Upgraded to Pytorch >= 2.3.0
Upgraded to SB3 >= 2.5.0
New Features
Added support for Numpy v2
Added support for specifying callbacks and env wrapper as python object in python config files (instead of string)
Bug fixes
Documentation
Other
Updated Dockerfile
Release 2.4.0 (2024-11-18)
New algorithm: CrossQ, Gymnasium v1.0 support, and better defaults for SAC/TQC on Swimmer-v4 env
Breaking Changes
Updated default hyperparameters for TQC/SAC for Swimmer-v4 (decrease gamma for more consistent results) (@JacobHA) W&B report
Upgraded to SB3 >= 2.4.0
Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters
New Features
Added CrossQ hyperparameters for SB3-contrib (@danielpalen)
Added Gymnasium v1.0 support
Bug fixes
Replaced deprecated huggingface_hub.Repository when pushing to Hugging Face Hub by the recommended HfApi (see https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http) (@cochaviz)
Documentation
Other
Updated PyTorch version to 2.4.1 in the CI
Switched to uv to download packages faster on GitHub CI
Release 2.3.0 (2024-03-31)
Breaking Changes
Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
Upgraded to SB3 >= 2.3.0
Other
Added test dependencies to setup.py (@power-edge)
Simplify dependencies of requirements.txt (remove duplicates from setup.py)
Release 2.2.1 (2023-11-17)
Breaking Changes
Removed gym dependency, the package is still required for some pretrained agents.
Upgraded to SB3 >= 2.2.1
Upgraded to Huggingface-SB3 >= 3.0
Upgraded to pytablewriter >= 1.0
New Features
Added --eval-env-kwargs to train.py (@Quentin18)
Added ppo_lstm to hyperparams_opt.py (@technocrat13)
Bug fixes
Upgraded to pybullet_envs_gymnasium>=0.4.0
Removed old hacks (for instance limiting offpolicy algorithms to one env at test time)
Documentation
Other
Updated docker image, removed support for X server
Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
Switched to ruff for sorting imports
Updated tests to use shlex.split()
Fixed rl_zoo3/hyperparams_opt.py type hints
Fixed rl_zoo3/exp_manager.py type hints
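As a hedged illustration of why a test suite might prefer shlex.split() over str.split() when turning a command string into an argument list (the command below is an illustrative value, not copied from the zoo's tests):

```python
import shlex

# Illustrative command string with a quoted argument.
cmd = "python train.py --algo ppo --env Pendulum-v1 --eval-env-kwargs 'g:9.8'"

# shlex.split follows shell quoting rules, so the quotes are removed.
args = shlex.split(cmd)

# A naive whitespace split keeps the quote characters in the last argument.
naive_args = cmd.split()

print(args[-1])
print(naive_args[-1])
```

The resulting list can be passed to subprocess functions without shell=True, which avoids a second round of shell parsing.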
Release 2.1.0 (2023-08-17)
Breaking Changes
Dropped python 3.7 support
SB3 now requires PyTorch 1.13+
Upgraded to SB3 >= 2.1.0
Upgraded to Huggingface-SB3 >= 2.3
Upgraded to Optuna >= 3.0
Upgraded to cloudpickle >= 2.2.1
New Features
Added python 3.11 support
Bug fixes
Documentation
Other
Release 2.0.0 (2023-06-22)
Gymnasium support
Warning: Stable-Baselines3 (SB3) v2.0.0 will be the last one supporting python 3.7
Breaking Changes
Fixed bug in HistoryWrapper, now returns the correct obs space limits
Upgraded to SB3 >= 2.0.0
Upgraded to Huggingface-SB3 >= 2.2.5
Upgraded to Gym API 0.26+, RL Zoo3 doesn’t work anymore with Gym 0.21
New Features
Added Gymnasium support
Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
Bug fixes
Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
Fixed record_video steps (before it was stepping in a closed env)
Release 1.8.0 (2023-04-07)
New Documentation, Multi-Env HerReplayBuffer
Warning: Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
Breaking Changes
Upgraded to SB3 >= 1.8.0
Upgraded to new HerReplayBuffer implementation that supports multiple envs
Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.
New Features
Tuned hyperparameters for RecurrentPPO on Swimmer
Documentation is now built using Sphinx and hosted on Read the Docs
Added hyperparameters pre-trained agents for PPO on 11 MiniGrid envs
Bug fixes
Set highway-env version to 1.5 and setuptools to v65.5 for the CI
Removed use_auth_token for push to hub util
Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see https://github.com/openai/gym/pull/1304)
Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)
Documentation
Other
Added support for ruff (fast alternative to flake8) in the Makefile
Removed Gitlab CI file
Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
Switched to ruff and pyproject.toml
Removed online_sampling and max_episode_length argument when using HerReplayBuffer
Release 1.7.0 (2023-01-10)
SB3 v1.7.0, added support for python config files
Breaking Changes
--yaml-file argument was renamed to -conf (--conf-file) as python files are now supported too
Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))
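The net_arch change above amounts to dropping the outer list. A minimal sketch of a migration helper for custom configs follows; migrate_net_arch is a hypothetical name introduced for this example and is not part of the zoo or SB3, and the layer sizes are illustrative:

```python
from typing import Any


def migrate_net_arch(net_arch: Any) -> Any:
    """Unwrap the old [dict(pi=..., vf=...)] form into dict(pi=..., vf=...)."""
    if isinstance(net_arch, list) and len(net_arch) == 1 and isinstance(net_arch[0], dict):
        return net_arch[0]
    # Anything else (already a dict, or a plain list of layer sizes) is left untouched.
    return net_arch


old_style = [dict(pi=[64, 64], vf=[64, 64])]  # illustrative layer sizes
new_style = migrate_net_arch(old_style)
print(new_style)
```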
New Features
Specifying custom policies in yaml file is now supported (@Rick-v-E)
Added monitor_kwargs parameter
Handle the env_kwargs of render:True under the hood for panda-gym v1 envs in enjoy replay to match visualization behavior of other envs
Added support for python config file
Tuned hyperparameters for PPO on Swimmer
Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
Added a sb3 version tag to the wandb run
Bug fixes
Allow python -m rl_zoo3.cli to be called directly
Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv
Documentation
Other
scripts/plot_train.py plots models such that newer models appear on top of older ones.
Added additional type checking using mypy
Standardized the use of from gym import spaces
Release 1.6.3 (2022-10-13)
Breaking Changes
New Features
Bug fixes
python3 -m rl_zoo3.train now works as expected
Documentation
Added instructions and examples on passing arguments in an interactive session (@richter43)
Other
Used issue forms instead of issue templates
Release 1.6.2.post2 (2022-10-10)
Breaking Changes
RL Zoo is now a python package
low pass filter was removed
Upgraded to Stable-Baselines3 (SB3) >= 1.6.2
Upgraded to sb3-contrib >= 1.6.2
Now uses the built-in SB3 ProgressBarCallback instead of TQDMCallback
New Features
RL Zoo cli: rl_zoo3 train and rl_zoo3 enjoy
Bug fixes
Documentation
Other
Release 1.6.1 (2022-09-30)
Progress bar and custom yaml file
Breaking Changes
Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
Upgraded to sb3-contrib >= 1.6.1
New Features
Added --yaml-file argument option for train.py to read hyperparameters from custom yaml files (@JohannesUl)
Bug fixes
Added custom_object parameter on record_video.py (@Affonso-Gui)
Changed optimize_memory_usage to False for DQN/QR-DQN on record_video.py (@Affonso-Gui)
In ExperimentManager _maybe_normalize set training to False for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani).
Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
Added progress bar via the -P argument using tqdm and rich
Documentation
Other
Release 1.6.0 (2022-08-05)
RecurrentPPO (ppo_lstm) and Huggingface integration
Breaking Changes
Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
Updated default --eval-freq from 10k to 25k steps
Update default horizon to 2 for the HistoryWrapper
Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
Upgrade to sb3-contrib >= 1.6.0
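The "1 evaluation per 100k time steps" rule listed above is simple integer arithmetic. The sketch below is not the zoo's code: derive_n_evaluations is a hypothetical name, and the floor of one evaluation for short runs is an assumption added for this example:

```python
def derive_n_evaluations(n_timesteps: int, steps_per_eval: int = 100_000) -> int:
    """Number of intermediate pruning evaluations for a given training budget."""
    # max(1, ...) is an assumption so that short runs are still evaluated once.
    return max(1, n_timesteps // steps_per_eval)


for budget in (50_000, 250_000, 1_000_000):
    print(budget, derive_n_evaluations(budget))
```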
New Features
Support setting PyTorch’s device with the --device flag (@gregwar)
Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
Added vec_env_wrapper support in the config (works the same as env_wrapper)
Added Huggingface hub integration
Added RecurrentPPO support (aka ppo_lstm)
Added autodownload for “official” sb3 models from the hub
Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
Added MsPacman models
Bug fixes
Fix Reacher-v3 name in PPO hyperparameter file
Pinned ale-py==0.7.4 until new SB3 version is released
Fix enjoy / record videos with LSTM policy
Fix bug with environments that have a slash in their name (@ernestum)
Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games, if you want to save RAM, you need to deactivate handle_timeout_termination in the replay_buffer_kwargs
Documentation
Other
When pruner is set to "none", use NopPruner instead of diverted MedianPruner (@qgallouedec)
Release 1.5.0 (2022-03-25)
Support for Weights and Biases experiment tracking
Breaking Changes
Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
Upgrade to sb3-contrib >= 1.5.0
Upgraded to gym 0.21
New Features
Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
Support experiment tracking via Weights and Biases via the --track flag (@vwxyzjn)
Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see https://github.com/DLR-RM/rl-baselines3-zoo/pull/216)
Bug fixes
Policies saved during optimization with distributed Optuna load on new systems (@jkterry)
Fixed script for recording video that was not up to date with the enjoy script
Documentation
Other
Release 1.4.0 (2022-01-19)
Breaking Changes
Dropped python 3.6 support
Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
Upgrade to sb3-contrib >= 1.4.0
New Features
Added mujoco hyperparameters
Added MuJoCo pre-trained agents
Added script to parse best hyperparameters of an optuna study
Added TRPO support
Added ARS support and pre-trained agents
Bug fixes
Documentation
Replace front image
Other
Release 1.3.0 (2021-10-23)
rliable plots and bug fixes
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend that you upgrade to Python >= 3.7.
Breaking Changes
Upgrade to panda-gym 1.1.1
Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
Upgrade to sb3-contrib >= 1.3.0
New Features
Added support for using rliable for performance comparison
Bug fixes
Fix training with Dict obs and channel last images
Documentation
Other
Updated docker image
constrained gym version: gym>=0.17,<0.20
Better hyperparameters for A2C/PPO on Pendulum
Release 1.2.0 (2021-09-08)
Breaking Changes
Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
Upgrade to sb3-contrib >= 1.2.0
New Features
Added support for Python 3.10
Bug fixes
Fix --load-last-checkpoint (@SammyRamone)
Fix TypeError for gym.Env class entry points in ExperimentManager (@schuderer)
Fix usage of callbacks during hyperparameter optimization (@SammyRamone)
Documentation
Other
Added python 3.9 to Github CI
Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)
Release 1.1.0 (2021-07-01)
Breaking Changes
Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
Upgrade to sb3-contrib >= 1.1.0
Add timeout handling (cf SB3 doc)
HER is now a replay buffer class and no more an algorithm
Removed PlotNoiseRatioCallback
Removed PlotActionWrapper
Changed 'lr' key in Optuna param dict to 'learning_rate' so the dict can be directly passed to SB3 methods (@jkterry)
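The motivation for the 'lr' to 'learning_rate' rename can be sketched with a toy example. DummyAlgo below is a hypothetical stand-in for an algorithm constructor that accepts a learning_rate keyword; it is not an SB3 class, and the parameter values are illustrative:

```python
class DummyAlgo:
    """Hypothetical stand-in for a constructor taking a learning_rate keyword."""

    def __init__(self, learning_rate: float = 3e-4, gamma: float = 0.99) -> None:
        self.learning_rate = learning_rate
        self.gamma = gamma


old_params = {"lr": 1e-3, "gamma": 0.98}  # old key name
new_params = {"learning_rate": 1e-3, "gamma": 0.98}  # new key name

# With the old key, unpacking the dict fails because 'lr' is not a keyword.
try:
    DummyAlgo(**old_params)
    old_key_accepted = True
except TypeError:
    old_key_accepted = False

# With the new key, the sampled dict can be unpacked as-is.
model = DummyAlgo(**new_params)
print(old_key_accepted, model.learning_rate)
```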
New Features
Add support for recording videos of best models and checkpoints (@mcres)
Add support for recording videos of training experiments (@mcres)
Add support for dictionary observations
Added experimental parallel training (with utils.callbacks.ParallelTrainCallback)
Added support for using multiple envs for evaluation
Added --load-last-checkpoint option for the enjoy script
Save Optuna study object at the end of hyperparameter optimization and plot the results (plotly package required)
Allow to pass multiple folders to scripts/plot_train.py
Flag to save logs and optimal policies from each training run (@jkterry)
Bug fixes
Fixed video rendering for PyBullet envs on Linux
Fixed get_latest_run_id() so it works in Windows too (@NicolasHaeffner)
Fixed video record when using HER replay buffer
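For context on the get_latest_run_id() fix, here is a hedged sketch of an OS-independent way to find the latest run id. It is not the zoo's implementation: latest_run_id_sketch is a hypothetical name, and the "<env_name>_<N>" folder layout is an assumption made for this example:

```python
import tempfile
from pathlib import Path


def latest_run_id_sketch(log_path: Path, env_name: str) -> int:
    """Return the highest N among folders named '<env_name>_<N>', or 0 if none."""
    max_run_id = 0
    # pathlib handles path separators, so no manual splitting on "/" is needed.
    for path in log_path.glob(f"{env_name}_[0-9]*"):
        suffix = path.name[len(env_name) + 1:]
        if suffix.isdigit():
            max_run_id = max(max_run_id, int(suffix))
    return max_run_id


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    for run in ("Pendulum-v1_1", "Pendulum-v1_2", "Pendulum-v1_10"):
        (root / run).mkdir()
    latest = latest_run_id_sketch(root, "Pendulum-v1")
    empty = latest_run_id_sketch(root, "CartPole-v1")

print(latest, empty)
```

Comparing the suffixes as integers rather than strings matters here: "10" sorts before "2" lexicographically.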
Documentation
Updated README (dict obs are now supported)
Other
Added is_bullet() to ExperimentManager
Simplify close() for the enjoy script
Updated docker image to include latest black version
Updated TD3 Walker2D model (thanks @modanesh)
Fixed typo in plot title (@scottemmons)
Minimum cloudpickle version added to requirements.txt (@amy12xx)
Fixed atari-py version (ROM missing in newest release)
Updated SAC and TD3 search spaces
Cleanup eval_freq documentation and variable name changes (@jkterry)
Add clarifying print statement when printing saved hyperparameters during optimization (@jkterry)
Clarify n_evaluations help text (@jkterry)
Simplified hyperparameters files making use of defaults
Added new TQC+HER agents
Add panda-gym environments (@qgallouedec)
Release 1.0 (2021-03-17)
Breaking Changes
Upgrade to SB3 >= 1.0
Upgrade to sb3-contrib >= 1.0
New Features
Added 100+ trained agents + benchmark file
Add support for loading saved model under python 3.8+ (no retraining possible)
Added Robotics pre-trained agents (@sgillen)
Bug fixes
Bug fixes for HER handling action noise
Fixed double reset bug with HER and enjoy script
Documentation
Added doc about plotting scripts
Other
Updated HER hyperparameters
Pre-Release 0.11.1 (2021-02-27)
Breaking Changes
Removed LinearNormalActionNoise
Evaluation is now deterministic by default, except for Atari games
sb3_contrib is now required
TimeFeatureWrapper was moved to the contrib repo
Replaced old plot_train.py script with updated plot_training_success.py
Renamed n_episodes_rollout to train_freq tuple to match latest version of SB3
New Features
Added option to choose which VecEnv class to use for multiprocessing
Added hyperparameter optimization support for TQC
Added support for QR-DQN from SB3 contrib
Bug fixes
Improved detection of Atari games
Fix potential bug in plotting script when there are not enough timesteps
Fixed a bug when using HER + DQN/TQC for hyperparam optimization
Documentation
Improved documentation (@cboettig)
Other
Refactored train script, now uses an ExperimentManager class
Replaced make_env with SB3 built-in make_vec_env
Add more type hints (utils/utils.py done)
Use f-strings when possible
Changed PPO atari hyperparameters (removed vf clipping)
Changed A2C atari hyperparameters (eps value of the optimizer)
Updated benchmark script
Updated hyperparameter optim search space (commented gSDE for A2C/PPO)
Updated DQN hyperparameters for CartPole
Do not wrap channel-first image env (now natively supported by SB3)
Removed hack to log success rate
Simplify plot script
Pre-Release 0.10.0 (2020-10-28)
Breaking Changes
New Features
Added support for HER
Added low-pass filter wrappers in utils/wrappers.py
Added TQC support, implementation from sb3-contrib
Bug fixes
Fixed TimeFeatureWrapper inferring max timesteps
Fixed flatten_dict_observations in utils/utils.py for recent Gym versions (@ManifoldFR)
VecNormalize now takes gamma hyperparameter into account
Fix loading of VecNormalize when continuing training or using trained agent
Documentation
Other
Added tests for the wrappers
Updated plotting script
Release 0.8.0 (2020-08-04)
Breaking Changes
New Features
Distributed optimization (@SammyRamone)
Added --load-checkpoints to load particular checkpoints
Added --num-threads to enjoy script
Added DQN support
Added saving of command line args (@SammyRamone)
Added DDPG support
Added version
Added RMSpropTFLike support
Bug fixes
Fixed optuna warning (@SammyRamone)
Fixed --save-freq which was not taking parallel env into account
Set buffer_size to 1 when testing an Off-Policy model (e.g. SAC/DQN) to avoid memory allocation issue
Fixed seed at load time for enjoy.py
Non-deterministic eval when doing hyperparameter optimization on atari games
Use ‘maximize’ for hyperparameter optimization (@SammyRamone)
Fixed a bug where rewards were not normalized when doing hyperparameter optimization (@caburu)
Removed nminibatches from ppo.yml for MountainCar-v0 and Acrobot-v1. (@blurLake)
Fixed --save-replay-buffer to be compatible with latest SB3 version
Close environment at the end of training
Updated DQN hyperparameters on simpler gym env (due to an update in the implementation)
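The --save-freq fix listed above concerns parallel environments. A plausible sketch of the adjustment follows, assuming the checkpoint callback counts vectorized steps and each such step collects one transition per env; per_env_save_freq is a hypothetical name and the numbers are illustrative:

```python
def per_env_save_freq(save_freq: int, n_envs: int) -> int:
    """Convert a frequency in total timesteps into a frequency in vectorized steps."""
    # Each vectorized step advances n_envs timesteps; never go below 1.
    return max(save_freq // n_envs, 1)


for n_envs in (1, 4, 8):
    print(n_envs, per_env_save_freq(10_000, n_envs))
```

Without this division, a run with 4 envs and a save frequency of 10,000 would checkpoint every 40,000 timesteps instead.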
Documentation
Other
Reformat enjoy.py, test_enjoy.py, test_hyperparams_opt.py, test_train.py, train.py, callbacks.py, hyperparams_opt.py, utils.py, wrappers.py (@salmannotkhan)
Reformat record_video.py (@salmannotkhan)
Added codestyle check make lint using flake8
Reformat benchmark.py (@salmannotkhan)
Added github ci
Fixes most linter warnings
Now using black and isort for auto-formatting
Updated plots