Reimplementation of *Soft Actor-Critic Algorithms and Applications*, plus a deterministic variant of SAC from *Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor*.

A separate branch, SAC_V, follows *Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor* (the original formulation with a separate value network).
(Note: There is no need to set the temperature (`--alpha`) if `--automatic_entropy_tuning` is True.)
`python main.py --env-name Humanoid-v2 --alpha 0.05`

`python main.py --env-name Humanoid-v2 --alpha 0.05 --tau 1 --target_update_interval 1000`

`python main.py --env-name Humanoid-v2 --policy Deterministic --tau 1 --target_update_interval 1000`
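When `--automatic_entropy_tuning` is True, the temperature is not a fixed hyperparameter but is learned alongside the actor and critics, which is why `--alpha` can be omitted. Below is a minimal sketch of that temperature update, assuming a PyTorch setup where `log_pi` holds log-probabilities of actions sampled from the current policy; the variable and function names are illustrative, not the repository's exact ones:

```python
import torch

# Target entropy heuristic from the SAC paper: -|A| (negative action dimensionality).
action_dim = 17  # e.g. Humanoid-v2; illustrative value
target_entropy = -float(action_dim)

# Optimise log(alpha) rather than alpha so the temperature stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_pi: torch.Tensor) -> torch.Tensor:
    """One gradient step on the temperature, given log-probs of sampled actions."""
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp()  # current alpha used in the actor/critic losses
```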
| Parameters | Value |
|---|---|
| **Shared** | - |
| optimizer | Adam |
| learning rate (`--lr`) | 3×10⁻⁴ |
| discount (`--gamma`) (γ) | 0.99 |
| replay buffer size (`--replay_size`) | 1×10⁶ |
| automatic entropy tuning (`--automatic_entropy_tuning`) | False |
| number of hidden layers (all networks) | 2 |
| number of hidden units per layer (`--hidden_size`) | 256 |
| number of samples per minibatch (`--batch_size`) | 256 |
| nonlinearity | ReLU |
| **SAC** | - |
| target smoothing coefficient (`--tau`) (τ) | 0.005 |
| target update interval (`--target_update_interval`) | 1 |
| gradient steps (`--updates_per_step`) | 1 |
| **SAC (Hard Update)** | - |
| target smoothing coefficient (`--tau`) (τ) | 1 |
| target update interval (`--target_update_interval`) | 1000 |
| gradient steps (except humanoids) (`--updates_per_step`) | 4 |
| gradient steps (humanoids) (`--updates_per_step`) | 1 |
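The only difference between the SAC and SAC (Hard Update) rows above is how the target critics track the online critics: a Polyak average with `--tau 0.005` applied every gradient step, versus a full copy (`--tau 1`) applied every `--target_update_interval` steps. A small sketch of both updates, assuming PyTorch `nn.Module` critics (helper names are illustrative):

```python
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float) -> None:
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)

def hard_update(target: nn.Module, source: nn.Module) -> None:
    """Full copy; equivalent to soft_update with tau = 1."""
    target.load_state_dict(source.state_dict())

# SAC:               soft_update(critic_target, critic, tau=0.005) every gradient step
# SAC (Hard Update): hard_update(critic_target, critic) every 1000 steps
```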
| Environment (`--env-name`) | Temperature (`--alpha`) |
|---|---|
| HalfCheetah-v2 | 0.2 |
| Hopper-v2 | 0.2 |
| Walker2d-v2 | 0.2 |
| Ant-v2 | 0.2 |
| Humanoid-v2 | 0.05 |
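The per-environment temperatures above scale the entropy bonus in the maximum-entropy actor objective; a larger `--alpha` rewards more stochastic policies. A hedged sketch of where the temperature enters the policy loss (tensor names are illustrative, not the repository's exact ones):

```python
import torch

def policy_loss(min_qf_pi: torch.Tensor, log_pi: torch.Tensor, alpha: float) -> torch.Tensor:
    """Maximum-entropy actor objective: maximise Q - alpha * log_pi,
    i.e. minimise alpha * log_pi - Q over actions sampled from the current policy."""
    return (alpha * log_pi - min_qf_pi).mean()

# Example with dummy tensors: Humanoid-v2 uses the smaller temperature from the table above.
loss = policy_loss(min_qf_pi=torch.randn(256, 1), log_pi=torch.randn(256, 1), alpha=0.05)
```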