Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset moving average #1984

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

tonynajjar
Copy link
Contributor

@tonynajjar tonynajjar commented Jan 7, 2025

Angsa Deployment Team and others added 12 commits October 22, 2024 12:02
Signed-off-by: Angsa Deployment Team <[email protected]>
Signed-off-by: Angsa Deployment Team <[email protected]>
Signed-off-by: Tony Najjar <[email protected]>
Signed-off-by: Tony Najjar <[email protected]>
Signed-off-by: Tony Najjar <[email protected]>
Signed-off-by: Tony Najjar <[email protected]>
@@ -2322,6 +2322,10 @@ std::vector<std::string> ControllerManager::get_controller_names()

void ControllerManager::read(const rclcpp::Time & time, const rclcpp::Duration & period)
{
if (periodicity_stats_.GetCount() >= 100)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the window size could be made configurable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. Could you please add a param for that? I think 100 is a good default value for resetting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to leave it to the user to define it. If not defined, let it accumulate from the beginning.

Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.25%. Comparing base (6e1059c) to head (b3ac890).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
controller_manager/src/controller_manager.cpp 76.92% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1984      +/-   ##
==========================================
- Coverage   89.26%   89.25%   -0.01%     
==========================================
  Files         130      130              
  Lines       14493    14505      +12     
  Branches     1257     1257              
==========================================
+ Hits        12937    12947      +10     
- Misses       1088     1090       +2     
  Partials      468      468              
Flag Coverage Δ
unittests 89.25% <76.92%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
controller_manager/src/controller_manager.cpp 76.20% <76.92%> (+<0.01%) ⬆️

... and 1 file with indirect coverage changes

bmagyar
bmagyar previously approved these changes Jan 7, 2025
Copy link
Member

@bmagyar bmagyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is pretty useful for diagnosing edge cases

Copy link
Member

@saikishor saikishor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to know why your read cycle happened in 0.00001615 sec (61919,5044 Hz)? This doesn't make sense at all in my opinion. Can you share some information regarding your setup. That's the reason that it is reporting the issue.

image

For the first sample to be a valid one, this change was introduced. We have tests running in the CI with around 10-20% margin as it doesn't have a RT patch

rclcpp::Time previous_time = cm->now() - period;

@saikishor
Copy link
Member

I would suggest to take a look at your setup and understand why your min and max periodicity rates are bad. In my opinion, window option is just to avoid the ERROR in diagnostics but not doesn't solve the underlying issue. Moreover, with the current approach, the user is always aware of the worst case scenario after the testing (through min and max values) and doesn't need to rush to take a look at those peaks

@tonynajjar
Copy link
Contributor Author

In my opinion, window option is just to avoid the ERROR in diagnostics but not doesn't solve the underlying issue.

I fully agree, I just did not want to go through the pain of debugging why only one of the cycles is executed super quickly. I printed out the measurements and it's really just one that is "problematic". It really doesn't affect my application at all and it will be hard to debug. I suspect it's something to do with a CPU/Load spike of my system at startup.

@saikishor
Copy link
Member

I fully agree, I just did not want to go through the pain of debugging why only one of the cycles is executed super quickly. I printed out the measurements and it's really just one that is "problematic". It really doesn't affect my application at all and it will be hard to debug. I suspect it's something to do with a CPU/Load spike of my system at startup.

Then we will have to check with #1918, after this is merged. My idea is to add this info into it, so you can diagnose it at run time.

Regarding the CPU/Load spike of my system at startup, it could cause the minimum periodicity, but the max might be coming from the first iteration

Signed-off-by: Tony Najjar <[email protected]>
@christophfroehlich
Copy link
Contributor

christophfroehlich commented Jan 9, 2025

I never used the MovingAverageStatistics, but how I understand the docstring

Note: reset() must be called manually in order to start a new measurement window.

and

Reset all calculated values. Equivalent to a new window for a moving average.

is, that if after window_size update steps there is basically no averaging but just the current value and then the number of samples for averaging increases with every update until the next window_size update steps is reached. This means that we don't have a constant horizon for the moving average but a "sawtooth" pattern for the horizon? Is this the desired behavior?
It just doesn't matter if it is way smaller than the diagnostics update period (which is 1s?)

Comment on lines +32 to +36
window_size: {
type: int,
default_value: -1,
description: "The number of samples to consider for the mean. The diagnostics will be computed over the last `window_size` samples. -1 means that all samples will be considered.",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
window_size: {
type: int,
default_value: -1,
description: "The number of samples to consider for the mean. The diagnostics will be computed over the last `window_size` samples. -1 means that all samples will be considered.",
}
window_size: {
type: double,
default_value: .inf,
description: "The window size in seconds to consider for the mean. The diagnostics will be computed over the last `window_size` seconds. Default value infinity means that all data points will be considered.",
validation: {
gt<>: 0.0,
}
}

I believe the above validation works if you leave it with default value of infinity. I personally think it is more intuitive to the user to talk in seconds window rather than in the samples. What do you guys think?

Comment on lines 71 to 78
controllers:
periodicity:
window_size: {
type: int,
default_value: -1,
description: "The number of samples to consider for the mean. The diagnostics will be computed over the last `window_size` samples. -1 means that all samples will be considered.",
}
mean_error:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controllers:
  window_size:
  periodicity:
    mean_error:

For the controllers, I believe this window_size should be a level higher, as it is being used for both periodicity and execution_time

@@ -3347,6 +3365,7 @@ void ControllerManager::controller_manager_diagnostic_callback(
const std::string periodicity_stat_name = "periodicity";
const auto cm_stats = periodicity_stats_.GetStatistics();
stat.add("update_rate", std::to_string(get_update_rate()));
stat.add(periodicity_stat_name + ".sample_count", std::to_string(cm_stats.sample_count));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is an intuitive diagnostic for the user, moreover it tend to be a constant rather than a varying value

Comment on lines +2519 to +2524
if (
loaded_controller.periodicity_statistics->GetCount() >=
params_->diagnostics.threshold.controllers.periodicity.window_size)
{
loaded_controller.periodicity_statistics->Reset();
}
Copy link
Member

@saikishor saikishor Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now this logic will not work for default value -1 as the count will always be greater than window_size. Please check my recommendation to use the convention of seconds

Suggested change
if (
loaded_controller.periodicity_statistics->GetCount() >=
params_->diagnostics.threshold.controllers.periodicity.window_size)
{
loaded_controller.periodicity_statistics->Reset();
}
if ((std::finite(params_->diagnostics.threshold.controllers.window_size) &&
loaded_controller.periodicity_statistics->GetCount() >=
static_cast<unsigned int>(params_->diagnostics.threshold.controllers.window_size * update_rate_)))
{
loaded_controller.periodicity_statistics->Reset();
}

Comment on lines +2508 to +2513
if (
loaded_controller.execution_time_statistics->GetCount() >=
params_->diagnostics.threshold.controllers.periodicity.window_size)
{
loaded_controller.execution_time_statistics->Reset();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this logic doesn't work for the default option. Check the comments below

Comment on lines +2325 to +2330
if (
periodicity_stats_.GetCount() >=
params_->diagnostics.threshold.controller_manager.periodicity.window_size)
{
periodicity_stats_.Reset();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this logic doesn't work for the default option
See the below comments for suggestions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants