This library contains a collection of Reinforcement Learning robotic environments that use the Gymnasium API. The environments run with the MuJoCo physics engine and the maintained MuJoCo Python bindings. This repository is based on the original Gymnasium-Robotics API and further expands it to host more simulated environments based on the needs of the R3L project.
It is recommended to use a Python environment with Python >= 3.8 (support for versions < 3.8 has been dropped, and newer environments, such as FetchObstaclePickAndPlace, are not supported in older Python versions).
To install the Gymnasium-Robotics-R3L library into your custom Python environment, follow the steps below:
- Clone this repository by running the following command in the terminal:
git clone https://github.com/ChristosPeridis/Gymnasium-Robotics-R3L.git
- Navigate to the project's folder:
cd Gymnasium-Robotics-R3L
- Install the Gymnasium-Robotics-R3L API with its dependencies by running:
pip install -e .
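To verify the installation, a quick smoke test can be run. The sketch below assumes the package is importable as gymnasium_robotics and that importing it registers the environments (including the FetchReach-v2 id used later in this README):

```python
# Minimal installation smoke test. Assumes the package is importable as
# gymnasium_robotics and that importing it registers the environments.
import gymnasium as gym
import gymnasium_robotics  # noqa: F401

env = gym.make("FetchReach-v2")
obs, info = env.reset(seed=0)
print(sorted(obs.keys()))  # expected: ['achieved_goal', 'desired_goal', 'observation']
env.close()
```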
Gymnasium-Robotics-R3L includes the following groups of environments:
- Fetch - A collection of environments with a 7-DoF robot arm that has to perform manipulation tasks such as Reach, Push, Slide, Pick and Place, or Obstacle Pick and Place (a short usage sketch follows this list).
- Shadow Dexterous Hand - A collection of environments with a 24-DoF anthropomorphic robotic hand that has to perform object manipulation tasks with a cube, egg-object, or pen. There are variations of these environments that also include data from 92 touch sensors in the observation space.
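As a quick sketch of how these environment groups are used, the snippet below creates one environment from each family. The ids (FetchPickAndPlace-v2, HandManipulateBlock-v1) follow the upstream Gymnasium-Robotics naming and are assumptions here; check the ids actually registered by this fork.

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  # importing registers the environments

# The environment ids below follow the upstream Gymnasium-Robotics naming and
# are assumptions; check which ids are registered by this fork.
fetch_env = gym.make("FetchPickAndPlace-v2")
hand_env = gym.make("HandManipulateBlock-v1")

print(fetch_env.action_space)       # 4-dimensional continuous gripper control
print(hand_env.action_space)        # 20-dimensional actuated hand control
print(fetch_env.observation_space)  # Dict with observation/achieved_goal/desired_goal
```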
The D4RL environments are now available. These environments have been refactored and may not have the same action/observation spaces as the originals, so please read their documentation:
- Maze Environments - An agent has to navigate through a maze to reach a certain goal position. Two different agents can be used: a 2-DoF force-controlled ball, or the classic Ant agent from the Gymnasium MuJoCo environments. The environment can be initialized with a variety of maze shapes with increasing levels of difficulty (a sketch of passing a custom maze layout follows this list).
- Adroit Arm - A collection of environments that use the Shadow Dexterous Hand with additional degrees of freedom for the arm movement. The different tasks involve hammering a nail, opening a door, twirling a pen, or picking up and moving a ball.
- Franka Kitchen - Multitask environment in which a 9-DoF Franka robot is placed in a kitchen containing several common household items. The goal of each task is to interact with the items in order to reach a desired goal configuration.
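Maze layouts can also be defined programmatically. The following is a minimal sketch under the assumption that this fork keeps the upstream Gymnasium-Robotics maze API (the PointMaze_UMaze-v3 id and the maze_map keyword); see the maze documentation for the supported layout symbols.

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  # importing registers the environments

# A custom maze layout: 1 = wall, 0 = free cell, "g" = goal cell, "r" = reset cell.
# The environment id and the maze_map keyword follow the upstream
# Gymnasium-Robotics maze API and are assumptions for this fork.
example_map = [
    [1, 1, 1, 1, 1],
    [1, "r", 0, 0, 1],
    [1, 1, 1, 0, 1],
    [1, "g", 0, 0, 1],
    [1, 1, 1, 1, 1],
]

env = gym.make("PointMaze_UMaze-v3", maze_map=example_map)
obs, info = env.reset(seed=0)
print(obs["observation"], obs["desired_goal"])
```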
WIP: generate new D4RL environment datasets with Minari.
The robotic environments use an extension of the core Gymnasium API by inheriting from the GoalEnv class. The new API forces the environments to have a dictionary observation space that contains 3 keys:
- observation - The actual observation of the environment.
- desired_goal - The goal that the agent has to achieve.
- achieved_goal - The goal that the agent has currently achieved instead. The objective of the environments is for this value to be close to desired_goal.
This API also exposes the reward function, as well as the terminated and truncated signals, so that their values can be re-computed with different goals. This functionality is useful for algorithms that use Hindsight Experience Replay (HER).
The following example demonstrates how the exposed reward, terminated, and truncated functions can be used to re-compute the values with substituted goals. The info dictionary can be used to store additional information that may be necessary to re-compute the reward, but that is independent of the goal, e.g. state derived from the simulation.
import gymnasium as gym
env = gym.make("FetchReach-v2")
env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# The following always has to hold:
assert reward == env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
assert truncated == env.compute_truncated(obs["achieved_goal"], obs["desired_goal"], info)
assert terminated == env.compute_terminated(obs["achieved_goal"], obs["desired_goal"], info)
# However goals can also be substituted:
substitute_goal = obs["achieved_goal"].copy()
substitute_reward = env.compute_reward(obs["achieved_goal"], substitute_goal, info)
substitute_terminated = env.compute_terminated(obs["achieved_goal"], substitute_goal, info)
substitute_truncated = env.compute_truncated(obs["achieved_goal"], substitute_goal, info)
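Building on the example above, the following minimal sketch shows how these functions are typically used for HER-style goal relabelling; the replay-buffer layout here is purely illustrative.

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  # importing registers the environments

# Collect a short episode and store the transitions (illustrative buffer layout).
env = gym.make("FetchReach-v2")
obs, _ = env.reset(seed=0)
episode = []
for _ in range(10):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, info = env.step(action)
    episode.append((obs, action, reward, next_obs, info))
    obs = next_obs

# Relabel every transition with the goal actually achieved at the end of the episode.
final_achieved_goal = episode[-1][3]["achieved_goal"]
relabelled = []
for o, a, r, n, i in episode:
    new_reward = env.compute_reward(n["achieved_goal"], final_achieved_goal, i)
    new_terminated = env.compute_terminated(n["achieved_goal"], final_achieved_goal, i)
    relabelled.append((o, a, new_reward, n, new_terminated))
```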
The GoalEnv class can also be used for custom environments.
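Below is a minimal sketch of a custom goal-conditioned environment, assuming GoalEnv is importable from gymnasium_robotics.core as in the upstream package; the 1-D reaching task and its reward/termination logic are purely illustrative.

```python
import numpy as np
from gymnasium import spaces
from gymnasium_robotics.core import GoalEnv  # import path assumed from upstream Gymnasium-Robotics


class Reach1D(GoalEnv):
    """Illustrative 1-D reaching task: move a point towards a goal position."""

    def __init__(self):
        goal_box = spaces.Box(-10.0, 10.0, shape=(1,), dtype=np.float64)
        self.observation_space = spaces.Dict(
            {
                "observation": goal_box,
                "achieved_goal": goal_box,
                "desired_goal": goal_box,
            }
        )
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float64)
        self._pos = np.zeros(1)
        self._goal = np.zeros(1)

    def _get_obs(self):
        return {
            "observation": self._pos.copy(),
            "achieved_goal": self._pos.copy(),
            "desired_goal": self._goal.copy(),
        }

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds np_random and checks the Dict observation space
        self._pos = np.zeros(1)
        self._goal = self.np_random.uniform(-5.0, 5.0, size=1)
        return self._get_obs(), {}

    def step(self, action):
        self._pos = np.clip(self._pos + action, -10.0, 10.0)
        obs = self._get_obs()
        reward = self.compute_reward(obs["achieved_goal"], obs["desired_goal"], {})
        terminated = self.compute_terminated(obs["achieved_goal"], obs["desired_goal"], {})
        truncated = self.compute_truncated(obs["achieved_goal"], obs["desired_goal"], {})
        return obs, reward, terminated, truncated, {}

    # The three methods below are the GoalEnv hooks used for goal substitution.
    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when within 0.1 of the goal, -1 otherwise.
        return 0.0 if np.linalg.norm(achieved_goal - desired_goal) <= 0.1 else -1.0

    def compute_terminated(self, achieved_goal, desired_goal, info):
        return bool(np.linalg.norm(achieved_goal - desired_goal) <= 0.1)

    def compute_truncated(self, achieved_goal, desired_goal, info):
        # Truncation is deliberately left to a TimeLimit wrapper in this sketch.
        return False
```

Truncation can then be supplied by a TimeLimit wrapper, e.g. by registering the class with a max_episode_steps value.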
Main Maintainer: Christos Peridis
Maintenance of this project is also supported by the broader Farama team: farama.org/team.
Note: The Gymnasium-Robotics-R3L API uses the latest Python bindings for MuJoCo, developed and maintained by the Google DeepMind team. This decision was made to simplify the installation process of the API and to make it more cross-platform compatible. The original MuJoCo_Py bindings have proven difficult and time-consuming to set up, and the configuration can easily go wrong: different versions of the MuJoCo simulator support different versions of MuJoCo_Py, and specific versions of MuJoCo_Py require specific Python builds compiled with specific gcc compilers. The API supports Python versions >= 3.8 and has been developed and tested with Python 3.9. We support Linux, server Linux, and Windows 11 through WSL2.