Update the instructions in notebook
cr-xu committed Feb 2, 2024
1 parent a7321eb commit 92ebb0a
Showing 1 changed file with 16 additions and 9 deletions.
25 changes: 16 additions & 9 deletions tutorial.ipynb
@@ -157,7 +157,7 @@
"\n",
"</p>\n",
"\n",
"<img src=\"img/awake.png\" style=\"width:60%; margin:auto;\"/>\n",
"<img src=\"img/awake.png\" style=\"width:50%; margin:auto;\"/>\n",
"\n",
"- **Momentum**: 10-20 MeV/c\n",
"- **Electrons per bunch**: 1.2e9\n",
@@ -205,16 +205,23 @@
"In this tutorial, we apply the action by adding a delta change $\\Delta a$ to the current magnet strengths .\n",
"\n",
"<h3 style=\"color: #b51f2a\">States/Observations</h3>\n",
"The observations are the readings of ten beam position monitors (BPMs), which read the position of the beam at a particular point in the beamline. The states are also normalized to [-1,1], corresponding to $\\pm$ 100 mm in the real accelerator.\n",
"The observations are the readings of ten beam position monitors (BPMs), which read the position of the beam at a particular point in the beamline. The states are also normalized to [-1,1], corresponding to $\\pm$ 100 mm in the real accelerator.\n"
]
},
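As a rough sketch of how these normalized spaces could be declared, assuming a Gymnasium-style environment; the variable names and the number of corrector magnets are illustrative, not taken from the tutorial code:

```python
import numpy as np
from gymnasium import spaces

n_bpms = 10        # ten beam position monitors -> ten observation values
n_correctors = 10  # illustrative number of corrector magnets being varied

# Observations and actions both live in a normalized [-1, 1] box.
# An observation of 1.0 corresponds to +100 mm at a BPM; an action is a
# delta that gets added to the current magnet strengths.
observation_space = spaces.Box(low=-1.0, high=1.0, shape=(n_bpms,), dtype=np.float32)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(n_correctors,), dtype=np.float32)

def bpm_reading_in_mm(normalized_obs):
    """Convert normalized BPM readings back to millimetres."""
    return np.asarray(normalized_obs) * 100.0
```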
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 style=\"color: #b51f2a\">Formulating the RL problem</h2>\n",
"\n",
"<h3 style=\"color: #b51f2a\">Reward</h3>\n",
"The reward is the negative RMS value of the distance to the target trajectory. \n",
"\n",
"$$\n",
"r(\\bm{O}) = - \\sqrt{ \\frac{1}{10} \\sum_{i=1}^{10} (O_{i} - O^{\\text{target}}_{i})^2},\n",
"r(O) = - \\sqrt{ \\frac{1}{10} \\sum_{i=1}^{10} (O_{i} - O^{\\text{target}}_{i})^2},\n",
"$$\n",
"\n",
"where $\\bm{O}^{\\text{target}}=\\vec{0}$ for a centered orbit.\n"
"where $O^{\\text{target}}=\\vec{0}$ for a centered orbit."
]
},
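For concreteness, here is a minimal sketch of this reward computed from a normalized observation vector; `compute_reward` is an illustrative name, not necessarily the function used in the tutorial's environment.

```python
import numpy as np

def compute_reward(observation, target=None):
    """Negative RMS distance between the BPM readings and the target trajectory."""
    observation = np.asarray(observation, dtype=float)
    if target is None:
        # A centred orbit corresponds to a zero target trajectory.
        target = np.zeros_like(observation)
    return -float(np.sqrt(np.mean((observation - target) ** 2)))

# A perfectly centred beam yields the maximum reward of 0.0;
# a uniform offset of 0.1 (i.e. 10 mm) yields roughly -0.1.
print(compute_reward(np.zeros(10)))       # 0.0
print(compute_reward(np.full(10, 0.1)))   # ~ -0.1
```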
{
@@ -350,8 +357,8 @@
"- This will be performed for different evaluation tasks, just to assess how the policy performs in different lattices.\n",
"\n",
"Side note:\n",
"- The benchmark policy will not immediately find the settings for the target trajectory, because the actions are scaled down for safety reasons.\n",
"- We can then compare metrics of both policies.\n"
"- The benchmark policy will not immediately find the settings for the target trajectory, because the actions are scaled down for safety reasons so that the maximum step is within $[-1,1]$ in the normalized space.\n",
"- We can then compare the metrics of both policies.\n"
]
},
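The scaled-down actions mentioned in the side note could look roughly like the following; the scale factor is purely illustrative and the tutorial's environment may apply the scaling differently.

```python
import numpy as np

ACTION_SCALE = 0.3  # illustrative value; the actual safety scaling may differ

def apply_scaled_action(current_strengths, action):
    """Add a clipped, scaled-down delta to the current magnet strengths.

    The policy proposes actions in [-1, 1]; scaling them down bounds the size
    of any single correction step, which is why the benchmark policy needs
    several steps to reach the target trajectory.
    """
    delta = ACTION_SCALE * np.clip(action, -1.0, 1.0)
    return np.asarray(current_strengths) + delta
```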
{
@@ -425,7 +432,7 @@
"source": [
"<h3 style=\"color:#038aa1;\">Questions &#128187</h3>\n",
"\n",
"<p style=\"color:#038aa1;\">Go to <code>ppo.py</code> and change the <code>total_timesteps</code> to 100. This can be done by providing the command line argument <code>--steps [num_steps]</code> Run it in the terminal with <code>python ppo.py --steps 100</code></p>\n",
"<p style=\"color:#038aa1;\">Go to <code>ppo.py</code> and change the <code>total_timesteps</code> to 100. This can be done by providing the command line argument <code>--steps [num_steps]</code> Run it in the terminal with <code>python ppo.py --train --steps 100</code></p>\n",
"\n",
"<p style=\"color:#038aa1;\">$\\implies$ Considering the PPO agent settings: will we fill the buffer? what do you expect that happens?</p>\n",
"<p style=\"color:#038aa1;\">$\\implies$ What is the difference in episode length between the benchmark policy and PPO? </p> \n",
@@ -458,7 +465,7 @@
"source": [
"<h3 style=\"color:#038aa1;\">Questions &#128187</h3>\n",
"\n",
"<p style=\"color:#038aa1;\">Set <code>total_timesteps</code> to 50,000 this time. Run it in the terminal with <code>python ppo.py --steps 50000</code></p>\n",
"<p style=\"color:#038aa1;\">Set <code>total_timesteps</code> to 50,000 this time. Run it in the terminal with <code>python ppo.py --train --steps 50000</code></p>\n",
"\n",
"<p style=\"color:#038aa1;\">$\\implies$ What are the main differences between the untrained and trained PPO policies?</p>"
]
@@ -795,7 +802,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.10.10"
},
"vscode": {
"interpreter": {
