-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance #48
base: main
Are you sure you want to change the base?
Conversation
Thanks for the PR and your work on this! I'm especially excited for the performance updates, and I'm glad that this package is useful to folks. Just a couple of quick questions on my first read through the proposed changes: New statistical capabilities
Enhanced NaNs
That's great! From the discussion in #47 I think it would still be good to better isolate ESMF to only when it's needed (and only import it if regridding is actually used) - but this is definitely a good fix until then, and probably after as well. Also, you should be able to have PRs run through the tests now. I think the reason they've failed is the additional variable names from the new statistical options. That's probably another thing to discuss - changing the way that variable names are output is going to affect existing workflows, so maybe there's a way to adapt that through the existing structure - maybe keeping the old workflow of running one statistical calculation per Once we've converged on what's best to include, we can work on writing new tests for the new capabilities. That's my first thoughts on this - I'm about to be a bit hard to reach for a few days, but I'll keep thinking about this. |
Thanks for your valid points. We would like to clarify the following: statistical capabilities
SUPPORTED_STATISTICS_METHODS = ["weighted_mean", "weighted_sum","weighted_median", "mean", “sum”, "max", "min", "sum", "median", "count", "std"] We’ve already now added "weighted_sum","weighted_median" in addition to the other statistics. Output structure
Enhanced NaNs
The PR won’t pass the test for now until adjust the test code. So we’ve added a notebook shows the test locally. let us know what you think about that. |
Hi again, We've made some modifications to the code. Now, the default option for calculations is set to weighted_mean, which is in alignment with the current output workflow. Additionally, we've included a test_core and test_export in the notebook. After modifying the attribute names to be consistent with 'weighted_mean', the code successfully passed all tests. |
Hello, we realised that the weighted median could not be calculated for a single time image, so we changed the function accordingly. Regarding the count, it is dividing the sum of the pixel area covered by the polygon by the maximum pixel size to get an accurate count of covered pixels in float. |
Hi @ks905383 , I hope you are doing well. I understand that everyone's schedule can be quite busy, and appreciate your time. If you've had a chance to review our changes to the PR, we're happy to hear your thoughts and feedback. Thanks :) |
Hello, wanted to comment on here about the status of this branch. I'm finding myself in the position of needing to use xagg but "sum" the values of the raster within the shapefile regions (accounting for partial overlap), not calculate a weighted average. Is this feature available, at least in code somewhere? |
Thank you for providing this valuable package; we are currently integrating it into our workflow. We would like to suggest incorporating the following functionalities and improvements to enhance its speed and efficiency, particularly for larger datasets. These enhancements would also enable the package to handle NaNs values effectively and provide support for various statistics.
Overall, these proposed functionalities are aimed at augmenting the package's efficiency, allowing it to handle big datasets with greater speed and accuracy while providing a broader range of statistical capabilities.
Let us know if you would like to merge the proposed functionalities.