This repo contains a unified open-source implementation of Model-Based Reinforcement Learning (MBRL) methods (@TJU). MBRL-Lib provides implementations of popular MBRL algorithms as examples of how to use this library; you can find them in the MBRL-Lib folder. Currently we have implemented Dreamer, MBPO, and MuZero, and we plan to keep extending this list. We aim to present one of the most comprehensive Model-Based RL libraries so far, covering most mainstream algorithms in the area. This repo will be updated continually to include new research from TJU-RL-Lab. (Development of this repo is currently in progress.)
Reinforcement learning algorithms can be divided into two main categories. The first is Model-Free methods, which learn a value function or a policy directly by interacting with the environment, without modeling the environment. The second is Model-Based methods, which build a model of the environment from interaction and then use that model for action planning or policy learning. Model-Free algorithms have achieved great results in many experimental scenarios and game environments thanks to their simple implementation and high asymptotic performance, but their sample efficiency is extremely low, so they require a large number of interactions with the environment. In contrast, Model-Based algorithms model the environment and fit its dynamics, which yields much higher sample efficiency.
The model of the environment is a representation that explicitly contains knowledge about the environment or the task, and it generally consists of two components: a transition (dynamics) model and a reward model. Once such a model is learned, it can be integrated into the interaction with the environment and the learning of the policy, as shown in the figure above. Following the guidance of Igor Mordatch (Google) and Jessica Hamrick (DeepMind) in their ICML 2020 tutorial, the three core directions for future work in MBRL are faster planning, higher tolerance to model error, and scalability to harder problems.
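To make these two components concrete, here is a minimal PyTorch sketch of a learned model with a transition head and a reward head. The class and attribute names (`DynamicsModel`, `transition_head`, `reward_head`) are illustrative only and do not correspond to the interfaces used by the algorithms in this repo.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Illustrative world model with a transition head and a reward head."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Transition model: predicts the next state from (s, a).
        self.transition_head = nn.Linear(hidden_dim, state_dim)
        # Reward model: predicts the scalar reward for (s, a).
        self.reward_head = nn.Linear(hidden_dim, 1)

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        next_state = self.transition_head(h)
        reward = self.reward_head(h).squeeze(-1)
        return next_state, reward

# Such a model is typically trained by regression on transitions (s, a, r, s')
# collected from the environment:
#   loss = MSE(pred_next_state, s') + MSE(pred_reward, r)
```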
There are many different ways to classify the mainstream algorithms in the modern Model-Based RL area. For example, we can simply divide Model-Based RL into two categories: Learn the Model and Given the Model.

- **Learn the Model** mainly focuses on how to build the environment model.
- **Given the Model** cares about how to utilize the learned model.
From the perspective of action execution, we can also divide Model-Based RL into two categories: policy learning and planning.

- **Policy learning** mainly focuses on explicitly training a parametric policy network.
- **Planning** cares about how to optimize the action sequence implicitly (see the sketch after this list).
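To make the planning view concrete, the sketch below optimizes an action sequence implicitly by random shooting against a learned model: sample many candidate sequences, evaluate them with the model, and execute only the first action of the best one. It reuses the illustrative `DynamicsModel` from above and is not the planner of any specific algorithm in this repo.

```python
import torch

@torch.no_grad()
def plan_random_shooting(model, state, action_dim, horizon=10, num_candidates=500):
    """Pick the first action of the best sampled action sequence (no explicit policy)."""
    # state: (state_dim,) -> replicate once per candidate sequence.
    states = state.unsqueeze(0).repeat(num_candidates, 1)
    # Candidate action sequences, e.g. uniform in [-1, 1].
    actions = torch.rand(num_candidates, horizon, action_dim) * 2 - 1
    returns = torch.zeros(num_candidates)
    for t in range(horizon):
        states, rewards = model(states, actions[:, t])
        returns += rewards
    best = returns.argmax()
    # Execute only the first action, then replan at the next step (MPC-style).
    return actions[best, 0]
```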
There are many other classifications, and we list some of them here. From the perspective of the dynamics model, we can divide dynamics models into three categories: forward model, reverse/backward model, and inverse model. From the perspective of the estimation method, the methods can be categorized as parametric and non-parametric, or exact and approximate. From the perspective of planning updating, the methods can be categorized as value update and policy update.
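The three kinds of dynamics models differ only in what they condition on and what they predict. The illustrative signatures below (not actual interfaces in this repo) make the distinction explicit.

```python
# Illustrative signatures only; the concrete networks are up to each algorithm.

def forward_model(state, action):
    """p(s_{t+1} | s_t, a_t): predict the next state from the current state and action."""
    ...

def backward_model(next_state, action):
    """p(s_t | s_{t+1}, a_t): predict the previous state, used e.g. for backward rollouts (as in BMPO)."""
    ...

def inverse_model(state, next_state):
    """p(a_t | s_t, s_{t+1}): infer the action connecting two consecutive states."""
    ...
```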
What's more, these perspectives for dividing the algorithms are orthogonal, which means some algorithms can be grouped into different categories depending on the perspective. Even so, we have to admit that it is quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well represented by a tree structure. We therefore think it more appropriate to assign each algorithm to a specific topic rather than to force a single classification. Moreover, we will publish a series of related blogs to explain more Model-Based RL algorithms. For a more detailed tutorial on this taxonomy, we refer the reader to our ZhiHu blog series.
Topic | Method | Included | README Prepared | Publication | Link |
---|---|---|---|---|---|
Analytical Gradient | Dreamer | ✅ | ✅ | ICLR 2020 | Dream to control: Learning behaviors by latent imagination |
Dyna-style | MBPO | ✅ | ✅ | NeurIPS 2019 | When to Trust Your Model: Model-Based Policy Optimization |
Dyna-style | BMPO | ✅ | ✅ | ICML 2020 | Bidirectional Model-based Policy Optimization |
Dynamics Decomposition | ED2 | ✅ | ✅ | In progress | ED2: An Environment Dynamics Decomposition Framework for World Model Construction |
Planning | PlaNet | ✅ | ✅ | ICML 2019 | Learning Latent Dynamics for Planning from Pixels |
Planning | MuZero | ✅ | ✅ | Nature 2020 | Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model |
Planning | Sampled MuZero | ❌ | ❌ | ICML 2021 | Learning and Planning in Complex Action Spaces |
Dynamics generalization | CaDM | ✅ | ✅ | ICML 2020 | Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning |
Dynamics generalization | TMCL | ❌ | ❌ | NeurIPS 2020 | Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning |
- **Vanilla Dreamer**: Dreamer is a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination, learning behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model (see the sketch below).
- **ED2 Dreamer**: Environment Dynamics Decomposition (ED2) is a novel world model construction framework that models the environment in a decomposing manner; here it is combined with Dreamer.
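As a rough illustration of the latent-imagination idea, the sketch below rolls the actor forward inside a differentiable latent model and backpropagates the imagined return into the actor. It is heavily simplified (Dreamer itself uses an RSSM world model and lambda-returns), and `world_model`, `actor`, and `value_fn` are placeholder modules, not this repo's API.

```python
import torch

def imagined_actor_loss(world_model, actor, value_fn, start_latents,
                        horizon=15, gamma=0.99):
    """Roll the actor forward in latent space and backprop the imagined return."""
    latent = start_latents
    imagined_return = 0.0
    discount = 1.0
    for _ in range(horizon):
        action = actor(latent)                        # differentiable policy output
        latent, reward = world_model(latent, action)  # differentiable latent step
        imagined_return = imagined_return + discount * reward
        discount *= gamma
    # Bootstrap with the learned value of the final imagined latent state.
    imagined_return = imagined_return + discount * value_fn(latent).squeeze(-1)
    # Maximizing the imagined return == minimizing its negative mean.
    return -imagined_return.mean()
```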
- **Vanilla MBPO**: Model-Based Policy Optimization (MBPO) uses short model-generated rollouts branched from real data, which retains the benefits of more complicated model-based algorithms without the usual pitfalls (see the sketch below).
- **ED2 MBPO**: Environment Dynamics Decomposition (ED2) is a novel world model construction framework that models the environment in a decomposing manner; here it is combined with MBPO.
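The branched-rollout idea can be summarized by the following sketch: short model rollouts start from real states stored in the environment replay buffer and fill a separate model buffer that is then used for policy optimization (e.g., with an off-policy learner such as SAC). The names `env_buffer`, `model_buffer`, and `rollout_length` are placeholders, not this repo's API.

```python
import random

def branched_rollouts(model, policy, env_buffer, model_buffer,
                      num_rollouts=400, rollout_length=5):
    """MBPO-style short rollouts: branch from real states, unroll the learned model."""
    for _ in range(num_rollouts):
        state = random.choice(env_buffer)            # branch from a real state
        for _ in range(rollout_length):
            action = policy(state)
            next_state, reward = model(state, action)
            model_buffer.append((state, action, reward, next_state))
            state = next_state
    # The policy is then trained mostly on data from model_buffer.
```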
- **MuZero**: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. MuZero is a series of work produced by DeepMind.
- **BMPO**: Bidirectional Model-based Policy Optimization. BMPO learns a backward dynamics model to reduce the reliance on the accuracy of forward model predictions, and uses both the forward and backward models to generate short branched rollouts for policy optimization.
- **PlaNet**: Learning Latent Dynamics for Planning from Pixels. Deep Planning Network (PlaNet) is a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space.
- **CaDM**: Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning. CaDM is a state-of-the-art model-based RL method for dynamics generalization.
The algorithms in this repo are all implemented in Python 3.5 (and above). TensorFlow 1.x and PyTorch are the main deep learning frameworks we adopt in this repo; the choice differs between algorithms.
First of all, we recommend installing Anaconda or venv for convenient management of different Python environments.
In this repo, the following RL environments may be needed:
- OpenAI Gym (e.g., MuJoCo, Robotics)
- MinAtar
- xxxx
Note that each algorithm may use only one or several of the environments listed above. Please refer to the page of the specific algorithm for its concrete requirements.
To clone this repo:
```bash
git clone http://rl.beiyang.ren/tju_rl/MBRL-Lib.git
```
Note that this repo is a collection of multiple research branches (according to the taxonomy). Environments and code frameworks may differ among branches, so please follow the installation guidance provided in the specific branch you are interested in.
First, follow the vanilla Dreamer and MBPO methods and run the baselines (introduced in Dreamer/README.md and MBPO/README.md). Then run ED2-Dreamer or ED2-MBPO (following the detailed instructions in ED2-Dreamer/README.md or ED2-MBPO/README.md). In addition, we provide code for MuZero, one of the most classic baselines among planning methods.
If you have useful suggestions, don't hesitate to raise an issue or submit a pull request!
- Update the license
- Update the README files for each branch
- Check the validity of the code to release
If you use our repo in your work, we ask that you cite our paper.
Here is an example BibTeX:
```
@article{wang2021ed2,
  title={ED2: An Environment Dynamics Decomposition Framework for World Model Construction},
  author={Wang, Cong and Yang, Tianpei and Hao, Jianye and Zheng, Yan and Tang, Hongyao and Barez, Fazl and Liu, Jinyi and Peng, Jiajie and Piao, Haiyin and Sun, Zhixiao},
  journal={arXiv preprint arXiv:2112.02817},
  year={2021}
}
```
[To change]
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
[To add some acknowledgement]