Releases: takuseno/d3rlpy
Release v2.8.1
Release v2.8.0
New algorithms
- PRDC (thanks, @liyc-ai )
- QDT (thanks, @takuyamagata )
- TACR
Enhancement
- The health check has been updated to verify that the PyTorch version is 2.5.0 or later.
- Shimmy version has been upgraded.
- Minari version has been upgraded.
Bugfix
- Model loading error caused by mismatched optimizer data has been fixed (thanks, @hasan-yaman).
- Fix map_location to support loading models trained on GPU onto CPU (see the sketch below).
- Fix Adroit dataset support.
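For reference, a minimal sketch of loading a model file that was trained on a GPU machine onto a CPU-only machine (the file name is hypothetical):
import d3rlpy
# load a saved learnable object onto CPU even if it was trained on CUDA
# ("sac_model.d3" is a hypothetical file name)
sac = d3rlpy.load_learnable("sac_model.d3", device="cpu:0")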
Release v2.7.0
Breaking changes
Dependency
- Python 3.9 or later
- PyTorch v2.5.0 or later
OptimizerFactory
The import path of OptimizerFactory has been changed from d3rlpy.models.OptimizerFactory to d3rlpy.optimizers.OptimizerFactory.
# before
optim = d3rlpy.models.AdamFactory()
# after
optim = d3rlpy.optimizers.AdamFactory()
2-3x speed-up with CudaGraph and torch.compile
In this release, d3rlpy supports CudaGraph and torch.compile to dramatically speed up training. You can turn on this new feature simply by providing the compile_graph option:
import d3rlpy
# enable CudaGraph and torch.compile
sac = d3rlpy.algos.SACConfig(compile_graph=True).create(device="cuda:0")
Here are some benchmark results with an NVIDIA RTX 4070:
| | v2.6.2 | v2.7.0 |
|---|---|---|
| Soft Actor-Critic | 7.4 msec | 3.0 msec |
| Conservative Q-Learning | 12.5 msec | 3.8 msec |
| Decision Transformer | 8.9 msec | 3.4 msec |
Note that this feature can only be enabled when using a CUDA device.
Enhanced optimizer
Learning rate scheduler
This release adds LRSchedulerFactory, which provides a learning rate scheduler for each optimizer.
import d3rlpy
optim = d3rlpy.optimizers.AdamFactory(
    lr_scheduler=d3rlpy.optimizers.CosineAnnealingLRFactory(T_max=1000000),
)
See an example here and docs here.
Gradient clipping
A clip_grad_norm option has been added to clip gradients by global norm.
import d3rlpy
optim = d3rlpy.optimizers.AdamFactory(clip_grad_norm=0.1)
SimBa encoder
This release adds the SimBa architecture, which allows models to scale effectively. See the paper here.
See docs here.
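As a rough sketch, the SimBa encoder can be plugged into an algorithm like any other encoder factory. The factory name below (d3rlpy.models.SimBaEncoderFactory) is an assumption; please check the linked docs for the exact class and arguments.
import d3rlpy
# assumption: the SimBa encoder is exposed as d3rlpy.models.SimBaEncoderFactory
sac = d3rlpy.algos.SACConfig(
    actor_encoder_factory=d3rlpy.models.SimBaEncoderFactory(),
    critic_encoder_factory=d3rlpy.models.SimBaEncoderFactory(),
).create(device="cuda:0")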
Enhancement
- Gradients are now being tracked by loggers (thanks, @hasan-yaman)
Development
- Replace black, isort and pylint with Ruff.
- scripts/format has been removed. scripts/lint now formats code style as well.
Release v2.6.2
This is an emergency update to resolve an issue caused by the new Gymnasium version v1.0.0. Additionally, d3rlpy internally checks versions of both Gym and Gymnasium to make sure that dependencies are correct.
Release v2.6.1
Bugfix
There has been an issue in the data-parallel distributed training feature of d3rlpy: each process did not correctly synchronize parameters. In this release, this issue has been fixed and data-parallel distributed training works properly. Please check the latest example script to see how to use it.
Release v2.6.0
New Algorithm
ReBRAC has been added to d3rlpy! Please check a reproduction script here.
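As a quick sketch (the dataset and step counts below are illustrative, not the reproduction settings):
import d3rlpy
# illustrative only; see the reproduction script for the faithful settings
dataset, env = d3rlpy.datasets.get_pendulum()
rebrac = d3rlpy.algos.ReBRACConfig().create(device="cpu:0")
rebrac.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)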
Enhancement
- DeepMind Control support has been added. You can install dependencies by d3rlpy install dm_control. Please check an example script here.
- use_layer_norm option has been added to VectorEncoderFactory (see the sketch after this list).
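A minimal sketch of enabling layer normalization in the vector encoder (the hidden units and algorithm choice are illustrative):
import d3rlpy
# enable layer normalization in the MLP encoder
encoder = d3rlpy.models.VectorEncoderFactory(
    hidden_units=[256, 256],
    use_layer_norm=True,
)
sac = d3rlpy.algos.SACConfig(
    actor_encoder_factory=encoder,
    critic_encoder_factory=encoder,
).create(device="cpu:0")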
Bugfix
- Fix return-to-go calculation for Decision Transformer.
- Fix custom model documentation.
Release v2.5.0
New Algorithm
Cal-QL has been added to d3rlpy in v2.5.0! Please check a reproduction script here. To support a faithful reproduction, SparseRewardTransitionPicker has also been added and is used in the reproduction script.
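As a rough sketch (the hyperparameters below are illustrative, not the reproduction settings; please refer to the reproduction script for the faithful configuration, including the use of SparseRewardTransitionPicker):
import d3rlpy
# illustrative Cal-QL setup only
calql = d3rlpy.algos.CalQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
).create(device="cuda:0")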
Custom Algorithm Example
One of the most frequent questions is "How can I implement a custom algorithm on top of d3rlpy?". A new example script has been added to answer this question. Based on this example, you can build your own algorithm while utilizing the whole training pipeline provided by d3rlpy. Please check the script here.
Enhancement
- Exporting Decision Transformer models as TorchScript and ONNX has been implemented. You can use this feature via the save_policy method in the same way as with Q-learning algorithms (see the sketch after this list).
- Tuple observation support has been added to PyTorch/ONNX export.
- Return-to-go calculation for Q-learning algorithms has been modified and is skipped when return-to-go is not necessary.
- n_updates option has been added to the fit_online method to control the update-to-data (UTD) ratio.
- write_at_termination option has been added to ReplayBuffer.
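For example, a minimal sketch of exporting a trained Decision Transformer policy (the step counts and file names are illustrative):
import d3rlpy
dataset, env = d3rlpy.datasets.get_cartpole()
dt = d3rlpy.algos.DiscreteDecisionTransformerConfig().create(device="cpu:0")
dt.fit(dataset, n_steps=1000, n_steps_per_epoch=1000)
# export as TorchScript
dt.save_policy("policy.pt")
# export as ONNX
dt.save_policy("policy.onnx")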
Bugfix
- Action scaling has been fixed for D4RL datasets.
- Default replay buffer creation in the fit_online method has been fixed.
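Related to the fit_online fixes and the new n_updates option, here is a rough sketch of online training with an explicitly created replay buffer (the buffer size and step counts are illustrative):
import gymnasium as gym
import d3rlpy
env = gym.make("CartPole-v1")
dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")
# create the replay buffer explicitly instead of relying on the default one
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)
dqn.fit_online(
    env,
    buffer,
    n_steps=10000,
    n_updates=1,  # update-to-data (UTD) ratio introduced in v2.5.0
)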
Release v2.4.0
Tuple observations
In v2.4.0, d3rlpy supports tuple observations.
import numpy as np
import d3rlpy
observations = [np.random.random((1000, 100)), np.random.random((1000, 32))]
actions = np.random.random((1000, 4))
rewards = np.random.random((1000, 1))
terminals = np.random.randint(2, size=(1000, 1))
dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
)
You can find an example script here.
Enhancements
- logging_steps and logging_strategy options have been added to the fit and fit_online methods (thanks, @claudius-kienle); see the sketch after this list.
- Logging with WanDB has been supported (thanks, @claudius-kienle).
- Goal-conditioned envs in Minari have been supported.
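A rough sketch of the new logging options (the step counts are illustrative, and exposing the enum as d3rlpy.LoggingStrategy is an assumption; please check the docs):
import d3rlpy
dataset, env = d3rlpy.datasets.get_cartpole()
dqn = d3rlpy.algos.DQNConfig().create(device="cpu:0")
# log metrics every 500 steps instead of once per epoch
dqn.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    logging_strategy=d3rlpy.LoggingStrategy.STEPS,
    logging_steps=500,
)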
Bugfix
- Fix errors for distributed training.
- OPE documentation has been fixed.
Release v2.3.0
Distributed data parallel training
Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now, it's finally available, and it's extremely easy to use.
Example:
# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
You need to use the torchrun command to start training; it should already be installed along with PyTorch.
$ torchrun \
--nnodes=1 \
--nproc_per_node=3 \
--rdzv_id=100 \
--rdzv_backend=c10d \
--rdzv_endpoint=localhost:29400 \
train.py
In this case, 3 processes will be launched and start the training loop. DecisionTransformer-based algorithms also support this distributed training feature.
The example is also available here.
Minari support (thanks, @grahamannett !)
Minari is an OSS library that provides a standard format for offline reinforcement learning datasets. Now, d3rlpy provides easy access to this library.
You can install Minari via d3rlpy CLI.
$ d3rlpy install minari
Example:
import d3rlpy
dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
Minimize redundant computes
From this version, calculation of some algorithms are optimized to remove redundant inference. Therefore, especially algorithms with dual optimization such as SAC
and CQL
became extremely faster than the previous version.
Enhancements
- GoalConcatWrapper has been added to support goal-conditioned environments.
- return_to_go has been added to Transition and TransitionMiniBatch.
- MixedReplayBuffer has been added to sample experiences from multiple buffers with an arbitrary ratio.
- initial_temperature supports 0 at DiscreteSAC.
Bugfix
- Getting started page has been fixed.
Release v2.2.0
Algorithm
DiscreteDecisionTransformer, a Decision Transformer implementation for discrete action spaces, has finally been implemented in v2.2.0! The reproduction results with Atari 2600 are available here.
import d3rlpy
dataset, env = d3rlpy.datasets.get_cartpole()
dt = d3rlpy.algos.DiscreteDecisionTransformerConfig(
    batch_size=64,
    num_heads=1,
    learning_rate=1e-4,
    max_timestep=1000,
    num_layers=3,
    position_encoding_type=d3rlpy.PositionEncodingType.SIMPLE,
    encoder_factory=d3rlpy.models.VectorEncoderFactory([128], exclude_last_activation=True),
    observation_scaler=d3rlpy.preprocessing.StandardObservationScaler(),
    context_size=20,
    warmup_tokens=100000,
).create()

dt.fit(
    dataset,
    n_steps=100000,
    n_steps_per_epoch=1000,
    eval_env=env,
    eval_target_return=500,
)
Enhancement
- Expose action_size and action_space options for manual dataset creation (#338); see the sketch after this list.
- FrameStackTrajectorySlicer has been added.
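A minimal sketch of declaring the action space and size explicitly when building a dataset manually (the enum spelling d3rlpy.ActionSpace.CONTINUOUS is an assumption; see #338 for the exact API):
import numpy as np
import d3rlpy
# explicitly declare the action space and action size instead of
# letting d3rlpy infer them from the data
dataset = d3rlpy.dataset.MDPDataset(
    observations=np.random.random((1000, 100)),
    actions=np.random.random((1000, 4)),
    rewards=np.random.random((1000, 1)),
    terminals=np.random.randint(2, size=(1000, 1)),
    action_space=d3rlpy.ActionSpace.CONTINUOUS,
    action_size=4,
)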
Refactoring
- Type checking of numpy is enabled. Some parts of the codebase distinguish the dtypes of numpy arrays, which is now checked by mypy.