Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: Keep the intermediate values in float64 #414

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rdbisme
Copy link
Collaborator

@rdbisme rdbisme commented Jul 2, 2022

No description provided.

@rdbisme
Copy link
Collaborator Author

rdbisme commented Jul 2, 2022

This has a some impact on performance, I'd like to have the opinion on if we should merge it or not. @stroxler?

       before           after         ratio
     [4d6b551c]       [54cea238]
     <v1.3.5^0>       <float64-intermediate>
            22.2M            22.4M     1.01  memory.Memory.peakmem_nanmedian
+     4.16±0.02μs      4.84±0.01μs     1.16  move.Time1DMove.time_move_max('float64', (1000,), 10)
+        2.12±0μs      2.83±0.03μs     1.33  move.Time1DMove.time_move_mean('float32', (1000,), 10)
+       160±0.2μs          232±2μs     1.45  move.Time1DMove.time_move_mean('float32', (100000,), 10)
+     19.6±0.01ms      25.4±0.09ms     1.30  move.Time1DMove.time_move_mean('float32', (10000000,), 10)
-     4.85±0.01μs      4.38±0.01μs     0.90  move.Time1DMove.time_move_min('float32', (1000,), 10)
+     4.42±0.01μs      5.42±0.02μs     1.23  move.Time1DMove.time_move_std('float32', (1000,), 10)
+       381±0.4μs        491±0.6μs     1.29  move.Time1DMove.time_move_std('float32', (100000,), 10)
+     41.7±0.03ms      52.7±0.03ms     1.26  move.Time1DMove.time_move_std('float32', (10000000,), 10)
+        1.88±0μs      2.72±0.02μs     1.45  move.Time1DMove.time_move_sum('float32', (1000,), 10)
+       134±0.6μs          216±2μs     1.61  move.Time1DMove.time_move_sum('float32', (100000,), 10)
+     17.0±0.01ms      23.8±0.01ms     1.40  move.Time1DMove.time_move_sum('float32', (10000000,), 10)
+     3.62±0.01μs      4.26±0.03μs     1.18  move.Time1DMove.time_move_var('float32', (1000,), 10)
+       306±0.3μs        378±0.3μs     1.23  move.Time1DMove.time_move_var('float32', (100000,), 10)
+     34.2±0.02ms      41.4±0.02ms     1.21  move.Time1DMove.time_move_var('float32', (10000000,), 10)
+     8.21±0.02ms       9.74±0.3ms     1.19  move.Time2DMove.time_move_argmax('int64', (1000, 1000), 'C', 0, 10)
-      9.22±0.4ms       7.74±0.3ms     0.84  move.Time2DMove.time_move_argmax('int64', (1000, 1000), 'F', 1, 10)
+     11.4±0.03ms       12.9±0.3ms     1.13  move.Time2DMove.time_move_argmin('float64', (1000, 1000), 'C', 0, 10)
+     8.33±0.02ms       9.79±0.3ms     1.18  move.Time2DMove.time_move_argmin('int64', (1000, 1000), 'C', 0, 10)
+     10.8±0.04ms       12.5±0.3ms     1.16  move.Time2DMove.time_move_max('float64', (1000, 1000), 'C', 0, 10)
+      10.6±0.1ms       11.9±0.5ms     1.13  move.Time2DMove.time_move_max('float64', (1000, 1000), 'F', 1, 10)
+     7.87±0.01ms       9.44±0.3ms     1.20  move.Time2DMove.time_move_max('int64', (1000, 1000), 'C', 0, 10)
+        1.60±0ms         2.18±0ms     1.36  move.Time2DMove.time_move_mean('float32', (1000, 1000), 'C', 1, 10)
+     4.05±0.04ms      6.68±0.05ms     1.65  move.Time2DMove.time_move_mean('float64', (1000, 1000), 'C', 0, 10)
+     3.87±0.01ms      6.49±0.06ms     1.68  move.Time2DMove.time_move_mean('int64', (1000, 1000), 'C', 0, 10)
+     11.1±0.04ms       12.4±0.3ms     1.12  move.Time2DMove.time_move_min('float64', (1000, 1000), 'C', 0, 10)
+     8.00±0.01ms       9.55±0.3ms     1.19  move.Time2DMove.time_move_min('int64', (1000, 1000), 'C', 0, 10)
-      8.08±0.4ms      7.26±0.03ms     0.90  move.Time2DMove.time_move_min('int64', (1000, 1000), 'F', 1, 10)
+        3.84±0ms         4.93±0ms     1.29  move.Time2DMove.time_move_std('float32', (1000, 1000), 'C', 1, 10)
+     7.41±0.03ms       9.04±0.1ms     1.22  move.Time2DMove.time_move_std('float64', (1000, 1000), 'C', 0, 10)
+      7.43±0.2ms       9.01±0.2ms     1.21  move.Time2DMove.time_move_std('int64', (1000, 1000), 'C', 0, 10)
+        1.35±0ms         2.02±0ms     1.50  move.Time2DMove.time_move_sum('float32', (1000, 1000), 'C', 1, 10)
+     3.94±0.02ms      6.57±0.06ms     1.67  move.Time2DMove.time_move_sum('float64', (1000, 1000), 'C', 0, 10)
+     3.79±0.01ms      6.54±0.04ms     1.73  move.Time2DMove.time_move_sum('int64', (1000, 1000), 'C', 0, 10)
-      3.64±0.3ms      2.88±0.03ms     0.79  move.Time2DMove.time_move_sum('int64', (1000, 1000), 'F', 1, 10)
+        3.10±0ms         3.82±0ms     1.23  move.Time2DMove.time_move_var('float32', (1000, 1000), 'C', 1, 10)
+     7.24±0.05ms       8.00±0.2ms     1.10  move.Time2DMove.time_move_var('float32', (1000, 1000), 'F', 1, 10)
+     6.22±0.07ms       8.17±0.1ms     1.31  move.Time2DMove.time_move_var('float64', (1000, 1000), 'C', 0, 10)
+     5.91±0.05ms      7.95±0.08ms     1.34  move.Time2DMove.time_move_var('int64', (1000, 1000), 'C', 0, 10)
+      5.47±0.1ms      6.21±0.06ms     1.14  move.Time2DMove.time_move_var('int64', (1000, 1000), 'F', 1, 10)
-         757±5ns          634±3ns     0.84  nonreduce_axis.Time1DNonreduceAxis.time_push('int32', (1000,))
-        881±10ns          742±4ns     0.84  nonreduce_axis.Time1DNonreduceAxis.time_push('int64', (1000,))
+     1.35±0.01ms      1.54±0.07ms     1.14  reduce.Time2DReductions.time_nanargmax('int64', (1000, 1000), 'C', 0)
-         391±3ns          354±2ns     0.91  reduce.TimeAllNan2D.time_allnan('float32', (1000, 1000), 'C', None, 'fast')
-         398±2ns          351±3ns     0.88  reduce.TimeAllNan2D.time_allnan('float64', (1000, 1000), 'C', None, 'fast')
-         598±4ns          524±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 0, 'fast')
-         595±4ns          524±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 0, 'slow')
-         598±2ns          525±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 1, 'fast')
-         597±4ns          522±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'C', 1, 'slow')
-         599±2ns          527±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 0, 'fast')
-         597±2ns          524±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 0, 'slow')
-         598±2ns          526±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 1, 'fast')
-         600±4ns          525±2ns     0.87  reduce.TimeAllNan2D.time_allnan('int32', (1000, 1000), 'F', 1, 'slow')
-         599±3ns          524±4ns     0.87  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 0, 'fast')
-         596±2ns          525±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 0, 'slow')
-         600±1ns          524±3ns     0.87  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 1, 'fast')
-         599±5ns          524±2ns     0.87  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'C', 1, 'slow')
-         601±4ns          526±2ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 0, 'fast')
-         595±4ns          525±3ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 0, 'slow')
-         598±2ns          525±4ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 1, 'fast')
-         597±6ns          524±5ns     0.88  reduce.TimeAllNan2D.time_allnan('int64', (1000, 1000), 'F', 1, 'slow')
-         388±1ns        349±0.7ns     0.90  reduce.TimeAnyNan2D.time_anynan('float32', (1000, 1000), 'C', None, 'fast')

@jkadbear
Copy link

jkadbear commented Jul 6, 2022

This commit has a little overhead for float32 but does affect float64/int64/int32 calculations. However, it seems the performance also varies for float64/int64/int32 from the results?

@jkadbear
Copy link

jkadbear commented Jul 6, 2022

Besides, this commit only changes float32 version of move_sum, move_mean, move_std and move_var. These operations contain intermediate moving window summations which are not accurate enough under float32. Other functions like move_max need not to be modified because it has no summation.

The performace test uses sequence length of 10000000. The following example compares the current move_std accuracy for such long sequence:

import numpy as np
import bottleneck as bn
x = np.random.random(10000000)
s64 = bn.move_std(x, 10)
s32 = bn.move_std(x.astype('float32'), 10)

print(s64)
print(s32)
print(np.nan_to_num(s64-s32).max())

The output on my machine is

[       nan        nan        nan ... 0.33617391 0.31365248 0.31293675]
[       nan        nan        nan ... 0.3361321  0.31345436 0.31280586]
0.003174458068280983

The max difference between float64 and float32 is about 3e-3 when data is ideal (sampled from a uniform distribution). I do not think it is a reliable result because the real world time series data would be stranger (leading to larger float errors).

@rdbisme
Copy link
Collaborator Author

rdbisme commented Nov 11, 2022

@qwhelan, maybe you have time to provide an opinion on this? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants