Update the instructions in notebook
cr-xu committed Feb 2, 2024
1 parent a7321eb commit 92ebb0a
Showing 1 changed file with 16 additions and 9 deletions.
25 changes: 16 additions & 9 deletions tutorial.ipynb
@@ -157,7 +157,7 @@
"\n",
"</p>\n",
"\n",
"<img src=\"img/awake.png\" style=\"width:60%; margin:auto;\"/>\n",
"<img src=\"img/awake.png\" style=\"width:50%; margin:auto;\"/>\n",
"\n",
"- **Momentum**: 10-20 MeV/c\n",
"- **Electrons per bunch**: 1.2e9\n",
@@ -205,16 +205,23 @@
"In this tutorial, we apply the action by adding a delta change $\\Delta a$ to the current magnet strengths .\n",
"\n",
"<h3 style=\"color: #b51f2a\">States/Observations</h3>\n",
"The observations are the readings of ten beam position monitors (BPMs), which read the position of the beam at a particular point in the beamline. The states are also normalized to [-1,1], corresponding to $\\pm$ 100 mm in the real accelerator.\n",
"The observations are the readings of ten beam position monitors (BPMs), which read the position of the beam at a particular point in the beamline. The states are also normalized to [-1,1], corresponding to $\\pm$ 100 mm in the real accelerator.\n"
]
},
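As a rough sketch of how these normalized spaces could be declared, assuming a Gymnasium-style environment; the variable names and the number of corrector magnets are illustrative, not taken from the tutorial code:

```python
import numpy as np
from gymnasium import spaces

n_bpms = 10        # ten beam position monitors -> ten observation values
n_correctors = 10  # illustrative number of corrector magnets being varied

# Observations and actions both live in a normalized [-1, 1] box.
# An observation of 1.0 corresponds to +100 mm at a BPM; an action is a
# delta that gets added to the current magnet strengths.
observation_space = spaces.Box(low=-1.0, high=1.0, shape=(n_bpms,), dtype=np.float32)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(n_correctors,), dtype=np.float32)

def bpm_reading_in_mm(normalized_obs):
    """Convert normalized BPM readings back to millimetres."""
    return np.asarray(normalized_obs) * 100.0
```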
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 style=\"color: #b51f2a\">Formulating the RL problem</h2>\n",
"\n",
"<h3 style=\"color: #b51f2a\">Reward</h3>\n",
"The reward is the negative RMS value of the distance to the target trajectory. \n",
"\n",
"$$\n",
"r(\\bm{O}) = - \\sqrt{ \\frac{1}{10} \\sum_{i=1}^{10} (O_{i} - O^{\\text{target}}_{i})^2},\n",
"r(O) = - \\sqrt{ \\frac{1}{10} \\sum_{i=1}^{10} (O_{i} - O^{\\text{target}}_{i})^2},\n",
"$$\n",
"\n",
"where $\\bm{O}^{\\text{target}}=\\vec{0}$ for a centered orbit.\n"
"where $O^{\\text{target}}=\\vec{0}$ for a centered orbit."
]
},
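For concreteness, here is a minimal sketch of this reward computed from a normalized observation vector; `compute_reward` is an illustrative name, not necessarily the function used in the tutorial's environment.

```python
import numpy as np

def compute_reward(observation, target=None):
    """Negative RMS distance between the BPM readings and the target trajectory."""
    observation = np.asarray(observation, dtype=float)
    if target is None:
        # A centred orbit corresponds to a zero target trajectory.
        target = np.zeros_like(observation)
    return -float(np.sqrt(np.mean((observation - target) ** 2)))

# A perfectly centred beam yields the maximum reward of 0.0;
# a uniform offset of 0.1 (i.e. 10 mm) yields roughly -0.1.
print(compute_reward(np.zeros(10)))       # 0.0
print(compute_reward(np.full(10, 0.1)))   # ~ -0.1
```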
{
@@ -350,8 +357,8 @@
"- This will be performed for different evaluation tasks, just to assess how the policy performs in different lattices.\n",
"\n",
"Side note:\n",
"- The benchmark policy will not immediately find the settings for the target trajectory, because the actions are scaled down for safety reasons.\n",
"- We can then compare metrics of both policies.\n"
"- The benchmark policy will not immediately find the settings for the target trajectory, because the actions are scaled down for safety reasons so that the maximum step is within $[-1,1]$ in the normalized space.\n",
"- We can then compare the metrics of both policies.\n"
]
},
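The scaled-down actions mentioned in the side note could look roughly like the following; the scale factor is purely illustrative and the tutorial's environment may apply the scaling differently.

```python
import numpy as np

ACTION_SCALE = 0.3  # illustrative value; the actual safety scaling may differ

def apply_scaled_action(current_strengths, action):
    """Add a clipped, scaled-down delta to the current magnet strengths.

    The policy proposes actions in [-1, 1]; scaling them down bounds the size
    of any single correction step, which is why the benchmark policy needs
    several steps to reach the target trajectory.
    """
    delta = ACTION_SCALE * np.clip(action, -1.0, 1.0)
    return np.asarray(current_strengths) + delta
```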
{
@@ -425,7 +432,7 @@
"source": [
"<h3 style=\"color:#038aa1;\">Questions &#128187</h3>\n",
"\n",
"<p style=\"color:#038aa1;\">Go to <code>ppo.py</code> and change the <code>total_timesteps</code> to 100. This can be done by providing the command line argument <code>--steps [num_steps]</code> Run it in the terminal with <code>python ppo.py --steps 100</code></p>\n",
"<p style=\"color:#038aa1;\">Go to <code>ppo.py</code> and change the <code>total_timesteps</code> to 100. This can be done by providing the command line argument <code>--steps [num_steps]</code> Run it in the terminal with <code>python ppo.py --train --steps 100</code></p>\n",
"\n",
"<p style=\"color:#038aa1;\">$\\implies$ Considering the PPO agent settings: will we fill the buffer? what do you expect that happens?</p>\n",
"<p style=\"color:#038aa1;\">$\\implies$ What is the difference in episode length between the benchmark policy and PPO? </p> \n",
@@ -458,7 +465,7 @@
"source": [
"<h3 style=\"color:#038aa1;\">Questions &#128187</h3>\n",
"\n",
"<p style=\"color:#038aa1;\">Set <code>total_timesteps</code> to 50,000 this time. Run it in the terminal with <code>python ppo.py --steps 50000</code></p>\n",
"<p style=\"color:#038aa1;\">Set <code>total_timesteps</code> to 50,000 this time. Run it in the terminal with <code>python ppo.py --train --steps 50000</code></p>\n",
"\n",
"<p style=\"color:#038aa1;\">$\\implies$ What are the main differences between the untrained and trained PPO policies?</p>"
]
@@ -795,7 +802,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.10.10"
},
"vscode": {
"interpreter": {
