-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PR]: Update documentation on temporal averaging, usage of bounds, and generation of weights #601
Conversation
Hey @pochedls, this small PR is ready for review whenever you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added "Warning:" to some comments.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #601 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 15 15
Lines 1602 1602
=========================================
Hits 1602 1602 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder – thanks for taking this on. I've made some suggestions for you to consider.
xcdat/bounds.py
Outdated
This method general assumes data with time frequencies of annual, | ||
monthly, daily, or sub-daily. It loops over the time axis coordinate | ||
variables and attempts to add bounds for each of them if they don't | ||
exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not always true, since the specified method can be "midpoint." Suggest adding text similar to the text in add_missing_bounds
where the freq
optional argument is documented in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm here...could a space be added between the freq
and daily_subfreq
arguments in the documentation (for consistency)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll copy the text from add_missing_bounds()
here.
Typically numpy
style docstrings have no spaces in between parameters. I think I added spaces after parameters that use bullet points in the description because it wasn't rendering lists correctly. I removed spaces and the docs still seem to render fine.
I'll go through docstrings and remove spaces if they used for parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder – I don't have strong feelings about this. I just noticed it wasn't consistent (but now I understand why).
xcdat/temporal.py
Outdated
Warning: If one time point spans across the time intervals that | ||
you are averaging into, then weights are not properly assigned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm debating if this should go here or a Note below. Suggest the following text.
Warning: When using weighted averages, the weights are assigned based on the timepoint value. For example, a time point of 2020-06-15
with bounds (2020-06-01, 2020-06-30)
has 30 days of weight assigned to June, 2020 (e.g., for an annual average calculation). This would be expected behavior, but it's possible that data could span across typical temporal boundaries. For example, a time point of 2020-06-01
with bounds (2020-05-16, 2020-06-15)
would have 30 days of weight, but this weight would be assigned to June, 2020, which would be incorrect (15 days of weight should be assigned to May and 15 days of weight should be assigned to June). This issue could plausibly arise when using pentad data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also make sure new lines are used consistently between docstring parameter entries (here and elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this change and added your suggestion to the Notes section.
xcdat/temporal.py
Outdated
|
||
Warning: If one time point spans across the time intervals that | ||
you are averaging into, then weights are not properly assigned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere – should this be a longer entry at the bottom of the docstring (like the example above). Maybe we could also include a brief note, like:
"Note that weights are assigned by the labelled time point. If the dataset includes timepoints that span across typical boundaries (e.g., a timepoint on 2020-06-01
with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See explanation below."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pochedls I think I addressed your suggestions. Let me know if I'm missing anything.
- Add more specific information on how temporal averaging grouping is performed - Add information on assumed time frequencies for `get_time_bounds()` - Add info about how weights are not properly assigned if one time point spans across the time interval being averaged into
Co-authored-by: Stephen Po-Chedley <[email protected]>
c17b8fd
to
99327ff
Compare
Description
Weights are determined from the difference in the bounds) -- already done in docstringsxcdat/xcdat/temporal.py
Lines 170 to 189 in fbf1db6
get_time_bounds
generally assumes data with frequencies of annual, monthly, daily, or sub-daily (I thought of this while writing this issue)Checklist
If applicable: