A toolkit for working with RDDL domains in Python3. Its main purpose is to wrap a RDDL domain/instance planning problem as an OpenAI Gym environment.
$ pip3 install -U rddlgym
rddlgym implements the OpenAI Gym API for RDDL problems. It uses pyrddl to parse RDDL files. Additionally, in order to simulate RDDL domains, it uses rddl2tf to compile RDDL operations and expressions to TensorFlow 1.X ops.
For further details, please refer to the documentation of the pyrddl and rddl2tf packages.
NOTE
rddl2tf (and consequently rddlgym) has been developed mainly to support continuous state-action domains. It may not currently work for discrete MDPs.
If you try to use rddlgym with your own RDDL files and encounter errors (probably due to the RDDL-to-TensorFlow compilation), please do not hesitate to open an issue or contact me.
rddlgym can be used either as a standalone CLI app or integrated with your own code to implement customized agent-environment interaction loops.
$ rddlgym --help
Usage: rddlgym [OPTIONS] COMMAND [ARGS]...

  rddlgym: A toolkit for working with RDDL domains in Python3.

Options:
  --help  Show this message and exit.

Commands:
  info   Print metadata for a `rddl` domain/instance.
  ls     List all RDDL domains and instances available.
  parse  Check RDDL file parsing.
  run    Run random policy in `rddl` domain/instance.
  show   Print `rddl` file.
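The subcommands listed in the help output can also be invoked from a script. The following is a minimal sketch, assuming the `rddlgym` executable is on your `PATH`. It only uses the `ls` subcommand shown above, and the helper names `build_command` and `run_rddlgym` are hypothetical (they are not part of rddlgym).

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(*args: str) -> List[str]:
    """Build the argument vector for a rddlgym CLI call (hypothetical helper)."""
    return ["rddlgym", *args]


def run_rddlgym(*args: str) -> Optional[str]:
    """Run the rddlgym CLI and return its stdout, or None if it is not installed."""
    if shutil.which("rddlgym") is None:
        return None
    result = subprocess.run(
        build_command(*args),
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=False,
    )
    return result.stdout


# List the available RDDL domains/instances, as `rddlgym ls` does in a shell.
listing = run_rddlgym("ls")
print(listing if listing is not None else "rddlgym is not installed")
```

Returning `None` when the executable is missing keeps the script usable on machines where rddlgym has not been installed yet.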
import rddlgym
# create RDDLGYM environment
rddl_id = "Navigation-v3" # see available RDDL domains/instances with `rddlgym ls` command
env = rddlgym.make(rddl_id, mode=rddlgym.GYM)
# you can also wrap your own RDDL files (domain + instance)
# env = rddlgym.make("/path/to/your/domain_instance.rddl", mode=rddlgym.GYM)
# define random policy
policy = lambda state, t: env.action_space.sample()
# initialize environment
state, t = env.reset()
done = False
# create a trajectory container
trajectory = rddlgym.Trajectory(env)
# sample an episode and store trajectory
while not done:
    action = policy(state, t)
    next_state, reward, done, info = env.step(action)
    trajectory.add_transition(t, state, action, reward, next_state, info, done)
    state = next_state
    t = env.timestep
print(f"Total Reward = {trajectory.total_reward}")
print(f"Episode length = {len(trajectory)}")
filepath = f"/tmp/rddlgym/{rddl_id}/data.csv"
df = trajectory.save(filepath) # dump episode data as csv file
print(df) # display dataframe
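The interaction loop above can be factored into a reusable function that accepts any policy with the `(state, t)` signature. The sketch below is illustrative only: `StubEnv` is a hypothetical stand-in that mimics the interface used in the example (`reset()` returning `(state, t)`, `step()` returning a 4-tuple, and a `timestep` attribute) so the code runs without rddlgym or TensorFlow installed. The `rollout` helper is likewise not part of rddlgym.

```python
from typing import Any, Callable, Dict, List, Tuple


class StubEnv:
    """Hypothetical environment mimicking the interface used above (NOT rddlgym)."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.timestep = 0
        self._state = 0.0

    def reset(self) -> Tuple[float, int]:
        self.timestep = 0
        self._state = 0.0
        return self._state, self.timestep

    def step(self, action: float) -> Tuple[float, float, bool, Dict[str, Any]]:
        self._state += action
        self.timestep += 1
        reward = -abs(1.0 - self._state)  # distance to an arbitrary goal at 1.0
        done = self.timestep >= self.horizon
        return self._state, reward, done, {}


def rollout(env: Any, policy: Callable[[Any, int], Any]) -> List[tuple]:
    """Sample one episode and return its transitions as a list of tuples."""
    state, t = env.reset()
    done = False
    transitions = []
    while not done:
        action = policy(state, t)
        next_state, reward, done, info = env.step(action)
        transitions.append((t, state, action, reward, next_state, done))
        state = next_state
        t = env.timestep
    return transitions


# A constant policy; any callable taking (state, t) works here.
env = StubEnv(horizon=5)
episode = rollout(env, lambda state, t: 0.5)
total_reward = sum(transition[3] for transition in episode)

print(f"Episode length = {len(episode)}")
print(f"Total Reward = {total_reward}")
```

With an environment created by `rddlgym.make(...)`, the same `rollout` function should apply as long as the environment exposes the interface shown in the example above.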
Copyright (c) 2018-2020 Thiago Pereira Bueno All Rights Reserved.
rddlgym is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
rddlgym is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with rddlgym. If not, see http://www.gnu.org/licenses/.