perf: vectorize yield uncertainty calculation #316
Conversation
force-pushed: ec7d227 → 4dc73f6 → 8fb5ea7 → 5fc7413
Codecov Report

```
@@           Coverage Diff            @@
##           master      #316   +/-  ##
========================================
  Coverage   100.00%   100.00%
========================================
  Files           23        23
  Lines         1886      1878     -8
  Branches       305       299     -6
========================================
- Hits          1886      1878     -8
```

Continue to review full report at Codecov.
Here is an example comparing three approaches:

```python
import awkward as ak
import numpy as np

def calc(op):
    print(f"{op} has shape {eval(op).shape}:")
    print(f"{eval(op)}\n")

# setup: 3 variations, 5 bin yields
v = np.asarray([[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [5, 6, 7, 8, 9]])
M = np.asarray([[1, 1.4, -0.3], [1.4, 1, 1.2], [-0.3, 1.2, 1]])

calc("v")
calc("v.T")
calc("M")
calc("v.T @ M @ v")
calc("np.diag(v.T @ M @ v)")

# loop-based calculation
res = 0
for i in range(3):
    for j in range(3):
        res += v[i] * v[j] * M[i][j]
print(f"loop-based calculation: {res}\n")

# variation with awkward and extra depth in v
# elements of v are now split e.g. by channel
v = ak.from_iter([[[1, 2, 3], [4, 5]], [[3, 4, 5], [6, 7]], [[5, 6, 7], [8, 9]]])
R = M[..., np.newaxis, np.newaxis] * v[np.newaxis, ...]
L = v[:, np.newaxis, ...] * R
print(f"awkward version with extra depth: {np.sum(ak.flatten(L, axis=1), axis=0)}")
```
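As a cross-check (not part of the PR itself), the loop-based and vectorized diagonal computations above can be verified to agree; the `np.einsum` variant shown here is an additional equivalent formulation, included as a suggestion since it computes only the diagonal without materializing the full bins × bins intermediate:

```python
import numpy as np

# same toy setup as in the example above: 3 variations, 5 bin yields
v = np.asarray([[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [5, 6, 7, 8, 9]])
M = np.asarray([[1, 1.4, -0.3], [1.4, 1, 1.2], [-0.3, 1.2, 1]])

# loop-based reference: per bin, sum_{i,j} v[i] * v[j] * M[i, j]
res = np.zeros(v.shape[1])
for i in range(3):
    for j in range(3):
        res += v[i] * v[j] * M[i, j]

# vectorized: diagonal of the bins x bins matrix product
via_diag = np.diag(v.T @ M @ v)

# einsum variant: contracts directly to the diagonal
via_einsum = np.einsum("ib,ij,jb->b", v, M, v)

assert np.allclose(res, via_diag)
assert np.allclose(res, via_einsum)
print(via_einsum)
```

All three agree per bin; for bin 0, the quadratic form of `[1, 3, 5]` with `M` gives 76.4.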
@lhenkelm I updated the comments based on your feedback to try and clarify the calculation. If you get the chance to take a look, I'd be happy to incorporate any additional feedback, and otherwise from my side I think this is ok. The code uses […]

@alexander-held this looks good and clear to me now :)

@alexander-held maybe it's worth adding an assert with a message to catch the case where/if the assumption fails? :)

@agoose77 by "the assumption", do you mean that we only sum over an array with three or fewer dimensions (after flattening the original 4D array)? I cannot think of a way in which this would break, but it also does not hurt to have it in. I'll add an assert, and we can refactor and remove it in the future.

Yes :) Although you have a 4D array right now, my thought process is that making this ==4D (before flatten) assumption explicit in an assert is safer than a comment longer term. Ideally you can remove this assert […]

I added the assert; #326 will track the task of changing this after an upstream fix. Thanks to you both for the help with this PR!
This vectorizes the previously loop-based calculations when determining yield uncertainties. Thanks a lot to @agoose77 for the invaluable help!
For a moderately complex workspace with roughly 20 samples in 50 bins and 200 (non-expanded) parameters, the runtime of the diagonal contributions goes from 0.7 s to below 0.01 s (including setting up an array that is re-used for off-diagonal terms). The off-diagonal contribution goes from 180 s to 0.4 s.
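To make the speedup quoted above concrete, here is a rough benchmark sketch (the shapes and random inputs are illustrative assumptions, not the actual cabinetry workspace) comparing a double loop against a single vectorized contraction:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_par, n_bins = 200, 50  # roughly the scale quoted above
v = rng.normal(size=(n_par, n_bins))
M = rng.normal(size=(n_par, n_par))
M = (M + M.T) / 2  # symmetrize, like a correlation matrix

def loop_version():
    # per bin: sum_{i,j} v[i] * v[j] * M[i, j]
    res = np.zeros(n_bins)
    for i in range(n_par):
        for j in range(n_par):
            res += v[i] * v[j] * M[i, j]
    return res

def vectorized_version():
    # same contraction in one einsum call
    return np.einsum("ib,ij,jb->b", v, M, v)

assert np.allclose(loop_version(), vectorized_version())
print("loop:", timeit.timeit(loop_version, number=1))
print("vectorized:", timeit.timeit(vectorized_version, number=1))
```

The two versions agree numerically; the vectorized contraction replaces `n_par * n_par` Python-level iterations with a single compiled call, which is where the bulk of the improvement comes from.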
After these changes, the bottleneck in `yield_stdev` is now the initial setup of the yields for all variations (4 s in the above example). There is probably some room to optimize this a bit further as well (example in #315 (comment)). This topic will be tracked in #325.

The vectorization of the off-diagonal contribution does not take advantage of two things that were previously used: the symmetry of the correlation matrix, and the fact that `staterror` modifiers are orthogonal in their yield effects on bins (but not on channels, see #323). Given the performance improvement, this may not be crucial anymore, but perhaps there is a way to also take advantage of that in the future.

The implementation uses `ak.flatten` to sum over one dimension due to scikit-hep/awkward#1283. The subsequent summation with a three-dimensional array works fine. There is an `assert` in the implementation to ensure we are not affected by the upstream bug; after an upstream fix this can be changed in `cabinetry`. This is tracked via #326.

resolves #315