- This is the code for the paper: On-Policy vs. Off-Policy Deep Reinforcement Learning for Resource Allocation in Open Radio Access Network
- It has been tested on Windows 10 with Python 3.8.3
- To run the code, you mainly need the following packages (a quick import check follows):
* pip install torch
* pip install gurobipy
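- A quick way to confirm both packages are importable (a minimal check, not part of the repository):
    import torch
    import gurobipy as gp
    print("PyTorch:", torch.__version__)
    print("Gurobi:", gp.gurobi.version())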
- To avoid the long training time, you can go directly to the tests folder and run (a plotting sketch follows this list):
* reward_plot.py to get the step and episode reward figures (first two figures)
* stability_plot.py to get the ACER and PPO reward figures for different NN architectures
* plot.py to get the energy and energy per latency figures (last two figures)
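- For orientation, the plotting scripts follow the usual pattern of loading the saved reward files and drawing them with matplotlib; a minimal sketch, assuming a plain-text reward file (the file name and format are illustrative, not the exact code of reward_plot.py):
    import numpy as np
    import matplotlib.pyplot as plt
    rewards = np.loadtxt("PPO_files/ppo_64_rewards.txt")   # assumed file name
    plt.plot(rewards)
    plt.xlabel("Episode")
    plt.ylabel("Reward")
    plt.title("PPO episode reward")
    plt.show()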
- To start training from scratch, you need to generate the reward files and the trained model weights by running:
* acer_32.py, then
* acer_64.py, then
* acer_256.py, then
* ppo_32.py, then
* ppo_64.py, then
* ppo_256.py, then
* reward_plot.py and stability_plot.py
As a result, you will create the folders (PPO_files, PPO_pretrained, ACER_files, ACER_pretrained),
which contain the reward files and the trained model weights, respectively (see the sketch below for the expected layout).
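- The training scripts are expected to fill those folders roughly as follows (a sketch under assumed names; only the four folder names above come from the repository):
    import os
    import numpy as np
    import torch
    import torch.nn as nn
    # Stand-ins for what a training script produces (names are assumptions):
    episode_rewards = [0.0, 1.5, 2.3]   # rewards logged during training
    policy = nn.Linear(4, 2)            # the trained policy network
    os.makedirs("PPO_files", exist_ok=True)
    os.makedirs("PPO_pretrained", exist_ok=True)
    np.savetxt("PPO_files/ppo_64_rewards.txt", np.asarray(episode_rewards))
    torch.save(policy.state_dict(), "PPO_pretrained/ppo_64.pth")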
- learn_acer.py and learn_ppo.py load the trained models to evaluate their energy and latency performance
* run main.py to perform these tests and plot the energy and energy per latency figures (a model-loading sketch follows).
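- Loading a pretrained model for these tests typically looks like the sketch below (the network class and file name are assumptions, not the exact code of learn_ppo.py):
    import torch
    import torch.nn as nn
    policy = nn.Linear(4, 2)   # stand-in for the actual actor network
    state_dict = torch.load("PPO_pretrained/ppo_64.pth", map_location="cpu")
    policy.load_state_dict(state_dict)
    policy.eval()
    with torch.no_grad():
        action_logits = policy(torch.zeros(1, 4))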
- opt.py and greedy.py implement the optimal MIP solution and the greedy solution, respectively.
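- For orientation, a gurobipy MIP has the following general shape (an illustrative toy assignment problem, not the actual formulation in opt.py):
    import gurobipy as gp
    from gurobipy import GRB
    # Toy example: assign 3 tasks to 2 servers, minimizing a made-up energy cost.
    tasks, servers = range(3), range(2)
    energy = {(t, s): (t + 1) * (s + 1) for t in tasks for s in servers}
    m = gp.Model("toy_allocation")
    x = m.addVars(tasks, servers, vtype=GRB.BINARY, name="x")
    m.addConstrs((x.sum(t, "*") == 1 for t in tasks), name="assign")   # each task goes to exactly one server
    m.setObjective(gp.quicksum(energy[t, s] * x[t, s] for t in tasks for s in servers), GRB.MINIMIZE)
    m.optimize()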
----- REFERENCES -----
* https://github.com/higgsfield/RL-Adventure-2
* https://github.com/nikhilbarhate99/PPO-PyTorch
* https://github.com/gohsyi/cluster_optimization