Reward Function, DRL Target, and Primary Frequency Response #3

TimThacker · 2022-09-03T22:02:16Z


Hello,

I have been looking through the andes_freq.py file and trying to understand how the target of the training is determined. In my experience, RL is generally guided by the reward function, which often depends on the error (the difference between the desired value and the value resulting from the action taken).

The example provided focuses on secondary frequency control and seeks to use DRL to drive the frequency back to 60Hz. Am I correct in my understanding that this target of 60Hz is set by the following lines:

```python
if not sim_crashed and done:
    reward -= np.sum(np.abs(3000 * (freq - 1)))
else:
    reward -= np.sum(np.abs(100 * (freq - 1)))
```

Here the reward is penalized by the absolute difference between the simulated frequency and 1 pu (60 Hz), scaled by a constant weight. Is this correct, or is the desired result defined somewhere else?
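To make the role of the `1` concrete, here is a minimal standalone sketch of the quoted reward update. It assumes `freq` is a NumPy array of per-unit frequencies, as the `np.sum` suggests; the function name `frequency_penalty` and the `target` argument are hypothetical and not part of andes_gym. The function returns the increment that the original code subtracts from `reward`.

```python
import numpy as np


def frequency_penalty(freq, done, sim_crashed, target=1.0):
    """Reward increment mirroring the quoted andes_freq.py branch (hypothetical helper).

    `target` stands in for the literal 1 (1 pu = 60 Hz) in the original expression.
    """
    deviation = np.abs(np.asarray(freq, dtype=float) - target)
    if not sim_crashed and done:
        # Heavier weight at the final step of an episode that did not crash
        return -np.sum(3000 * deviation)
    # Lighter weight on every other step, or when the simulation crashed
    return -np.sum(100 * deviation)


# Frequencies closer to 1 pu give a larger (less negative) reward
print(frequency_penalty([1.0, 1.0], done=False, sim_crashed=False))
print(frequency_penalty([0.999, 0.998], done=False, sim_crashed=False))
```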

I am curious because I am looking to use andes_gym to apply the same algorithm, DDPG, in a different part of the system to improve primary frequency response. I would rather use a short-term action to drive the frequency to the post-event steady state and allow other traditional methods to return the frequency from the post-event steady state back to 60 Hz. If this section is where the target is set, I would simply change the "1" to the per-unit value of the post-event steady-state frequency.
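If the target moves away from 1 pu, the post-event steady-state value has to be estimated somehow. One textbook estimate, assuming a single-area system with aggregate governor droop R and load damping D (all in per unit on a common base), is Δf_ss = -ΔP_L / (D + 1/R). The sketch below evaluates it; the helper name and the parameter values are illustrative assumptions, not taken from andes_gym or the ANDES test cases. The resulting per-unit frequency is the kind of value that would replace the `1` in the reward expression.

```python
def post_event_target(delta_p_load, droop, damping, f_nominal_hz=60.0):
    """Steady-state frequency after a load step under droop control only.

    Single-area approximation (hypothetical helper):
        delta_f = -delta_p_load / (damping + 1/droop)
    Every quantity is in per unit on the same base.
    Returns (frequency in pu, frequency in Hz).
    """
    beta = damping + 1.0 / droop  # frequency response characteristic
    f_pu = 1.0 - delta_p_load / beta
    return f_pu, f_pu * f_nominal_hz


# Example: 0.1 pu load increase, 5 % droop, load damping D = 1
f_pu, f_hz = post_event_target(delta_p_load=0.1, droop=0.05, damping=1.0)
print(f"target ~ {f_pu:.5f} pu ({f_hz:.3f} Hz)")
```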

Furthermore, since I am focusing on primary rather than secondary frequency response, are there any aspects of the environment I should change?

Thank you very much. Andes and Andes_Gym have been extremely helpful tools!

cuihantao (Maintainer) · 2022-09-05T00:03:50Z

Hello Tim,

> From my experience RL is generally guided by the reward function, which is often dependent on the error (difference between desired value and value resulting from action taken).

That is true.

> Am I correct in my understanding that this target of 60Hz is set by the following lines:

Yes. As you explained, the reward is greater when freq is closer to 1.

> I am curious because I am looking to use andes_gym to apply the same algorithm, DDPG, in a different part of the system to improve primary frequency response. I would rather use a short-term action to drive the frequency to the post-event steady state and allow other traditional methods to return the frequency from post-event steady state back to 60Hz. Then if this section is where the target is set, I would simply change the "1" to the corresponding per unit value for post-event steady state frequency.

This is probably your research question, but in my understanding, primary frequency response is provided by droop-based controls, which act on a shorter time scale than secondary control. If you change the 1 to your desired value in the same program, you will still be performing secondary frequency control, just with a different target.

I think primary frequency control concerns the droop in turbine governors or the Pref of renewables. It may be worth discussing this with your advisor.
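The difference between the two control layers can be illustrated with a textbook single-area model: the swing equation plus a first-order governor with droop R. This is not the ANDES model; H, D, R, Tg, the load step, and the integral gain Ki are illustrative assumptions, and the integral term is only a stand-in for secondary control (AGC). Droop alone settles at an offset from 1 pu, while adding integral action on the frequency error drives the deviation back to zero.

```python
def simulate_single_area(ki, delta_p_load=0.1, h=5.0, d=1.0, r=0.05,
                         t_gov=0.5, t_end=60.0, dt=0.01):
    """Forward-Euler simulation of a per-unit single-area frequency model.

    2H  d(df)/dt  = dPm - dPL - D*df      (swing equation)
    Tg  d(dPm)/dt = -df/R + dPc - dPm     (governor with droop R)
        d(dPc)/dt = -Ki*df                (integral secondary control; Ki = 0 disables it)
    Returns the frequency deviation df (pu) at t_end after a load step at t = 0.
    """
    df = dpm = dpc = 0.0
    for _ in range(int(round(t_end / dt))):
        ddf = (dpm - delta_p_load - d * df) / (2.0 * h)
        ddpm = (-df / r + dpc - dpm) / t_gov
        ddpc = -ki * df
        df += dt * ddf
        dpm += dt * ddpm
        dpc += dt * ddpc
    return df


primary_only = simulate_single_area(ki=0.0)
with_secondary = simulate_single_area(ki=5.0)
print(f"droop only:       df = {primary_only:+.5f} pu")
print(f"droop + integral: df = {with_secondary:+.5f} pu")
```

With these values the droop-only case settles near -0.1 / (1 + 1/0.05) pu, the same offset given by the steady-state droop relation.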

Thank you for the kind words. I'm glad that you find them useful.


TimThacker (Author) · Sep 5, 2022

Hantao,

I believe I understand the example presented in the paper (secondary frequency response) and the approach used. I made the appropriate adjustments to the models used in the system, and I am applying the action to a variable other than the auxiliary power. I had previously implemented and tested this in a personal fork of the ANDES library. I wanted to confirm that I was not missing anything done specifically for secondary frequency response, such as delaying the action input until a set time into the simulation.

On another note, I noticed the following line of code and its accompanying comment:
```python
model.learn(total_timesteps=2000)  # we need to change the total steps with action numbers
```
Could you clarify what you mean by "action numbers"? My understanding is that to train the agent for a given number of episodes, we would set the value as:
Total Time Steps = Time Steps per Episode * Number of Desired Episodes
For example, to train on a 30 s simulation with a time step of 1/30 s (900 steps per episode) for 500 episodes, we would set total_timesteps=450000.
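Assuming the training script uses Stable-Baselines3 (the `model.learn(total_timesteps=...)` call matches its API), one learn() timestep is one call to the environment's `step()`, that is, one agent action. So the "time steps per episode" in the formula above is the number of actions the agent takes per episode; it equals the simulation length divided by the integration step only if the agent acts at every integration step. A minimal sketch of the arithmetic follows; the helper name and the once-per-second action interval in the second example are hypothetical.

```python
def total_timesteps(episode_seconds, action_interval_s, n_episodes):
    """Environment steps needed to train for n_episodes full episodes (hypothetical helper).

    One learn() timestep = one env.step() call = one agent action, so the
    interval here is the time between agent actions, not the integration step.
    """
    steps_per_episode = round(episode_seconds / action_interval_s)
    return steps_per_episode * n_episodes


# The example from the post: act every 1/30 s over a 30 s episode, 500 episodes
print(total_timesteps(30.0, 1 / 30, 500))
# If the agent instead acted once per second, the budget shrinks accordingly
print(total_timesteps(30.0, 1.0, 500))
```

Note that episodes ending early (for example, when the simulation crashes) make the actual number of completed episodes larger than this estimate.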

Is this correct? Thank you for your help!

cuihantao (Maintainer) · 2022-09-05T01:26:50Z

About the number of steps, Dr. Yichen Zhang (@whoiszyc) may have an answer. Thanks!
