Add RDDLDomain to hub as presented in ICAPS 2024 tutorial

airbus · Oct 21, 2024 · dfe6134
1 parent 88cce97
commit dfe6134
Showing 10 changed files with 836 additions and 135 deletions.
diff --git a/binder/environment.yml b/binder/environment.yml
@@ -24,3 +24,4 @@ dependencies:
     - PyOpenGL
     - xarray
     - gymnasium[classic-control]==0.28.1
+    - rddlrepository
diff --git a/notebooks/16_rddl_tuto.ipynb b/notebooks/16_rddl_tuto.ipynb
@@ -0,0 +1,368 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "source": [
+    "# Using RDDL domains within scikit-decide"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, we demonstrate how to use the scikit-decide RDDL wrapper domain and solve it with scikit-decide solvers (including third-party ones). This domain is built upon the RDDL environment from the excellent pyrddlgym-project GitHub project."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Concerning the Python kernel to use for this notebook:\n",
+    "- If running locally, be sure to use an environment with scikit-decide[all] and rddlrepository (RDDL benchmarks).\n",
+    "- If running on Colab, the next cell installs them for you.\n",
+    "- If running on Binder, the environment should be ready."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# On Colab: install the library\n",
+    "on_colab = \"google.colab\" in str(get_ipython())\n",
+    "if on_colab:\n",
+    "    import glob\n",
+    "    import json\n",
+    "    import sys\n",
+    "\n",
+    "    using_nightly_version = True\n",
+    "\n",
+    "    if using_nightly_version:\n",
+    "        # look for nightly build download url\n",
+    "        release_curl_res = !curl -L   -H \"Accept: application/vnd.github+json\" -H \"X-GitHub-Api-Version: 2022-11-28\" https://api.github.com/repos/airbus/scikit-decide/releases/tags/nightly\n",
+    "        release_dict = json.loads(release_curl_res.s)\n",
+    "        release_download_url = sorted(\n",
+    "            release_dict[\"assets\"], key=lambda d: d[\"updated_at\"]\n",
+    "        )[-1][\"browser_download_url\"]\n",
+    "        print(release_download_url)\n",
+    "\n",
+    "        # download and unzip\n",
+    "        !wget --output-document=release.zip {release_download_url}\n",
+    "        !unzip -o release.zip\n",
+    "\n",
+    "        # get proper wheel name according to python version used\n",
+    "        wheel_pythonversion_tag = f\"cp{sys.version_info.major}{sys.version_info.minor}\"\n",
+    "        wheel_path = glob.glob(\n",
+    "            f\"dist/scikit_decide*{wheel_pythonversion_tag}*manylinux*.whl\"\n",
+    "        )[0]\n",
+    "\n",
+    "        skdecide_pip_spec = f\"{wheel_path}[all]\"\n",
+    "    else:\n",
+    "        skdecide_pip_spec = \"scikit-decide[all]\"\n",
+    "\n",
+    "    # uninstall google protobuf conflicting with ray and sb3\n",
+    "    ! pip uninstall -y protobuf\n",
+    "\n",
+    "    # install scikit-decide with all extras\n",
+    "    !pip install {skdecide_pip_spec}\n",
+    "\n",
+    "    # install rddl repository\n",
+    "    !pip install rddlrepository"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import shutil\n",
+    "\n",
+    "from ray.rllib.algorithms.ppo import PPO as RLLIB_PPO\n",
+    "from rddlrepository.archive.competitions.IPPC2023.MountainCar.MountainCarViz import (\n",
+    "    MountainCarVisualizer,\n",
+    ")\n",
+    "from rddlrepository.core.manager import RDDLRepoManager\n",
+    "from stable_baselines3 import PPO as SB3_PPO\n",
+    "\n",
+    "from skdecide.hub.domain.rddl import RDDLDomain, RDDLDomainSimplifiedSpaces\n",
+    "from skdecide.hub.solver.cgp import CGP\n",
+    "from skdecide.hub.solver.ray_rllib import RayRLlib\n",
+    "from skdecide.hub.solver.stable_baselines import StableBaseline\n",
+    "from skdecide.utils import rollout"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Domain creation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To create our scikit-decide RDDL-bridge domain, we must first search for an RDDL domain and instance.\n",
+    "The pyrddlgym-project provides the [rddlrepository](https://github.com/pyrddlgym-project/rddlrepository) library of RDDL benchmarks from past IPPC competitions and third-party contributors. Below, we list the problems available in our pip installation of the library."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "manager = RDDLRepoManager(rebuild=True)\n",
+    "print(sorted(manager.list_problems()))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's create a scikit-decide `RDDLDomain` instance embedding the `MountainCar_ippc2023` benchmark. We render it using scikit-decide.\n",
+    "Note that here we use some options to display within the notebook and store movies when rolling out:\n",
+    "- `display_with_pygame`: True by default (as in pyRDDLGym), here set to False to prevent a pygame window from popping up\n",
+    "- `display_within_jupyter`: useful to display within a Jupyter notebook\n",
+    "- `visualizer`: we use a visualizer dedicated to the chosen benchmark\n",
+    "- `movie_name`: if set, a movie will be created at the end of a rollout"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "problem_info = manager.get_problem(\"MountainCar_ippc2023\")\n",
+    "\n",
+    "\n",
+    "domain_factory = lambda alg_name=None: RDDLDomain(\n",
+    "    rddl_domain=problem_info.get_domain(),\n",
+    "    rddl_instance=problem_info.get_instance(1),\n",
+    "    display_with_pygame=False,\n",
+    "    display_within_jupyter=True,\n",
+    "    visualizer=MountainCarVisualizer,\n",
+    "    movie_name=(\"MountainCar_ippc2023-\" + alg_name) if alg_name is not None else None,\n",
+    ")\n",
+    "domain = domain_factory()\n",
+    "domain.reset()\n",
+    "img = domain.render()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Solving the domain with scikit-decide (potentially bridged) solvers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now comes the fun part: solving the domain with scikit-decide solvers, some of which (especially the reinforcement learning ones) are bridged to state-of-the-art existing libraries such as RLlib and SB3. You will see that once the domain is defined, solving it takes very few lines of code."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Solving MountainCar_ippc2023 with RLlib's PPO algorithm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The code below creates a scikit-decide `RayRLlib` solver, calls its `solve()` method, and finally rolls out the optimized policy using scikit-decide's `rollout` utility function. This function renders the solution, and the domain generates a movie in the `rddl_movies` folder when the rollout episode reaches its termination condition."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "solver_factory = lambda: RayRLlib(\n",
+    "    domain_factory=domain_factory, algo_class=RLLIB_PPO, train_iterations=10\n",
+    ")\n",
+    "\n",
+    "with solver_factory() as solver:\n",
+    "    solver.solve()\n",
+    "    rollout(\n",
+    "        domain_factory(alg_name=\"RLLIB-PPO\"),\n",
+    "        solver,\n",
+    "        max_steps=300,\n",
+    "        render=True,\n",
+    "        verbose=False,\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here is an example of executing RLlib's PPO policy trained for 100 iterations on the mountain car benchmark (the cell above uses `train_iterations=10`):\n",
+    "\n",
+    "![RLLIB PPO example solution](rddl_images/MountainCar_ippc2023-RLLIB-PPO_example.gif)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Solving MountainCar_ippc2023 with Stable-Baselines3's PPO"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once the domain is defined, very few lines of code are sufficient to test another solver whose capabilities are compatible with the domain. In the cell below, we now test Stable-Baselines3's PPO algorithm."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "solver_factory = lambda: StableBaseline(\n",
+    "    domain_factory=domain_factory,\n",
+    "    algo_class=SB3_PPO,\n",
+    "    baselines_policy=\"MultiInputPolicy\",\n",
+    "    learn_config={\"total_timesteps\": 10000},\n",
+    "    verbose=0,\n",
+    ")\n",
+    "\n",
+    "with solver_factory() as solver:\n",
+    "    solver.solve()\n",
+    "    rollout(\n",
+    "        domain_factory(alg_name=\"SB3-PPO\"),\n",
+    "        solver,\n",
+    "        max_steps=1000,\n",
+    "        render=True,\n",
+    "        verbose=False,\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Solving MountainCar_ippc2023 with CGP"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Scikit-decide provides an implementation of [Cartesian Genetic Programming](https://dl.acm.org/doi/10.1145/3205455.3205578) (CGP), a form of Genetic Programming which optimizes a function (e.g. a control policy) by learning its best representation as a directed acyclic graph of mathematical operators. One of scikit-decide's strengths is that it provides simple, high-level means to compare algorithms from different communities (RL, GP, search, planning, etc.) on the same domains in a few lines of code.\n",
+    "\n",
+    "<img src=\"rddl_images/cgp-sketch.png\" alt=\"Cartesian Genetic Programming\" width=\"700\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Since our current implementation of CGP in scikit-decide does not handle complex observation spaces such as the dictionary spaces returned by the RDDL simulator, we instead use `RDDLDomainSimplifiedSpaces`, where all actions and observations are numpy arrays thanks to the powerful `flatten` and `flatten_space` functions of `gymnasium`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We call the CGP solver on this simplified domain and we render the obtained solution after a few iterations (including the generation of the video in the `rddl_movies` folder)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "domain_factory = lambda alg_name=None: RDDLDomainSimplifiedSpaces(\n",
+    "    rddl_domain=problem_info.get_domain(),\n",
+    "    rddl_instance=problem_info.get_instance(1),\n",
+    "    display_with_pygame=False,\n",
+    "    display_within_jupyter=True,\n",
+    "    visualizer=MountainCarVisualizer,\n",
+    "    movie_name=(\"MountainCar_ippc2023-\" + alg_name) if alg_name is not None else None,\n",
+    "    max_frames=200,\n",
+    ")\n",
+    "\n",
+    "domain = domain_factory()\n",
+    "\n",
+    "if os.path.exists(\"TEMP_CGP\"):\n",
+    "    shutil.rmtree(\"TEMP_CGP\")\n",
+    "\n",
+    "solver_factory = lambda: CGP(\n",
+    "    domain_factory=domain_factory, folder_name=\"TEMP_CGP\", n_it=25, verbose=False\n",
+    ")\n",
+    "with solver_factory() as solver:\n",
+    "    solver.solve()\n",
+    "    rollout(domain_factory(\"CGP\"), solver, max_steps=200, render=True, verbose=False)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here is an example of executing the CGP policy on the mountain car benchmark:\n",
+    "\n",
+    "![CGP example solution](rddl_images/MountainCar_ippc2023-CGP_example.gif)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/notebooks/rddl_images/MountainCar_ippc2023-CGP_example.gif b/notebooks/rddl_images/MountainCar_ippc2023-CGP_example.gif
diff --git a/notebooks/rddl_images/MountainCar_ippc2023-RLLIB-PPO_example.gif b/notebooks/rddl_images/MountainCar_ippc2023-RLLIB-PPO_example.gif
diff --git a/notebooks/rddl_images/cgp-sketch.png b/notebooks/rddl_images/cgp-sketch.png
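
Note on the CGP section of the notebook: `RDDLDomainSimplifiedSpaces` is described as turning the simulator's dictionary observations into flat arrays. For readers unfamiliar with that idea, here is a minimal stdlib-only sketch of what flattening a dictionary observation means. It is not gymnasium's implementation nor the code in this commit; the helper names, the fluent names `pos` and `vel`, and the choice of sorted key order are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Union

Number = Union[bool, int, float]
Value = Union[Number, Sequence[Number]]


def flatten_observation(obs: Dict[str, Value]) -> List[float]:
    """Concatenate all entries of a dict observation into one flat list.

    Keys are visited in sorted order (an assumption of this sketch) so that
    a given entry always lands at the same position in the flat vector.
    """
    flat: List[float] = []
    for key in sorted(obs):
        value = obs[key]
        if isinstance(value, (bool, int, float)):
            flat.append(float(value))  # scalar fluent
        else:
            flat.extend(float(v) for v in value)  # vector-valued fluent
    return flat


def unflatten_observation(
    flat: Sequence[float], template: Dict[str, Value]
) -> Dict[str, Union[float, List[float]]]:
    """Inverse of flatten_observation; `template` only provides keys and sizes."""
    result: Dict[str, Union[float, List[float]]] = {}
    index = 0
    for key in sorted(template):
        value = template[key]
        if isinstance(value, (bool, int, float)):
            result[key] = flat[index]
            index += 1
        else:
            size = len(value)
            result[key] = list(flat[index : index + size])
            index += size
    return result


# Hypothetical MountainCar-like observation with two scalar fluents.
observation = {"vel": 0.0, "pos": -0.5}
flat = flatten_observation(observation)
restored = unflatten_observation(flat, observation)
```

A solver that only understands fixed-length numeric vectors can then operate on `flat`, while the wrapper converts back to the dictionary form expected by the simulator.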