Catastrophic accuracy loss in large float32 array for nanmean and nanstd #462

prutschman-iv · 2024-10-15T23:01:13Z

Describe the bug
Starting somewhere between 10 million and 50 million elements, the bn.nanmean and bn.nanstd functions appear to experience a catastrophic loss of accuracy with float32 data.

To Reproduce
This code creates float32 arrays of increasing size and compares the results of the NumPy and Bottleneck versions of nanmean and nanstd:

import numpy as np
import bottleneck as bn
print(f'{np.__version__=} {bn.__version__=}')
million = 10**6
for size in (million, 10 * million, 50 * million, 100 * million):
    # Uniform samples in [0, 1): the true mean is 0.5 and the true std is about 0.2887.
    rand_data = np.random.random(size=size).astype(np.float32)
    print(f"{size}")
    # In each row, the first value is NumPy's result and the second is Bottleneck's.
    print("    mean\t", np.nanmean(rand_data), bn.nanmean(rand_data))
    print("     std\t", np.nanstd(rand_data), bn.nanstd(rand_data))

When I run it, I get:

np.__version__='1.24.0' bn.__version__='1.4.1'
1000000
    mean         0.5003439 0.5003493428230286
     std         0.28887847 0.28882330656051636
10000000
    mean         0.49992886 0.49994951486587524
     std         0.28866056 0.28725674748420715
50000000
    mean         0.5000019 0.33554431796073914
     std         0.28868446 0.30973944067955017
100000000
    mean         0.4999724 0.16777215898036957
     std         0.2886786 0.38657501339912415
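
A note on the numbers above: the two collapsed Bottleneck means are almost exactly 2**24 / size (16777216 / 50000000 ≈ 0.33554432 and 16777216 / 100000000 ≈ 0.16777216). That is the pattern a running float32 sum of values in [0, 1) would produce. At 2**24 the spacing between adjacent float32 values becomes 2, so every further addend below 1 rounds away and the sum stops growing. This is an inference from the output, not something this thread confirms about Bottleneck's internals. The sketch below emulates float32 rounding with the standard library's struct module so it runs without NumPy; the helper names f32 and add_f32 are made up for illustration.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_f32(a: float, b: float) -> float:
    """Add two values and round the result to float32 (hypothetical helper)."""
    return f32(a + b)


LIMIT = 2.0 ** 24  # 16777216.0; float32 spacing is 2 from here up to 2**25

# Once the accumulator reaches 2**24, adding any value in [0, 1) leaves it unchanged.
stalled = all(add_f32(LIMIT, f32(x)) == LIMIT for x in (0.001, 0.25, 0.5, 0.999))

# Means implied by a sum stuck at 2**24, for the two largest array sizes in the report.
mean_50m = f32(LIMIT / 50_000_000)
mean_100m = f32(LIMIT / 100_000_000)

print(f"accumulator stalls at 2**24: {stalled}")
print(f"implied mean, 50 million elements:  {mean_50m}")
print(f"implied mean, 100 million elements: {mean_100m}")
```

If this reading is right, converting the input to float64 before the call, for example bn.nanmean(rand_data.astype(np.float64)), should avoid the stall at the cost of a temporary copy.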

Versions:

Package           Version
----------------- --------------------
astropy           6.1.4
astropy-iers-data 0.2024.10.14.0.32.55
Bottleneck        1.4.1
numpy             1.24.0
packaging         24.1
pip               24.0
pyerfa            2.0.1.4
PyYAML            6.0.2
setuptools        69.2.0
wheel             0.43.0

Expected behavior
I expected the differences between numpy and Bottleneck to be zero, or at least small relative to the size of the result.
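
One likely reason NumPy holds up on the same data is that its floating-point reductions use pairwise summation. Pairwise summation keeps partial sums at similar magnitudes instead of adding every small element to one large running total. The sketch below contrasts a naive running sum with a pairwise one, both rounded to float32 after each addition. It emulates float32 with the standard library's struct module, and f32, naive_sum and pairwise_sum are illustrative names rather than NumPy or Bottleneck functions. Which scheme Bottleneck actually uses is not confirmed in this thread.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum(values):
    """Left-to-right sum, rounded to float32 after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def pairwise_sum(values, lo=0, hi=None):
    """Recursive pairwise sum, rounded to float32 after every addition."""
    if hi is None:
        hi = len(values)
    if hi - lo == 0:
        return 0.0
    if hi - lo == 1:
        return values[lo]
    mid = (lo + hi) // 2
    return f32(pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi))


n = 200_000
x = f32(0.1)        # the float32 value closest to 0.1
data = [x] * n
exact = x * n       # float64 reference; this product is exact in float64

naive_err = abs(naive_sum(data) - exact)
pairwise_err = abs(pairwise_sum(data) - exact)

print(f"naive float32 error:    {naive_err:.6f}")
print(f"pairwise float32 error: {pairwise_err:.6f}")
```

Even at this modest size the naive total should drift far more than the pairwise one. The same effect, at larger scale, would explain a gap like the one reported here.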

Additional context
I encountered this while trying to track down astropy/astropy#17185. astropy/astropy#11492 may be related, but there the accuracy loss appeared smaller.


rdbisme · 2024-10-18T22:20:26Z

This might be related: #164

rdbisme · 2024-10-18T22:21:01Z

Does this solve the problem? #414

prutschman-iv added the bug label Oct 15, 2024

neutrinoceros mentioned this issue Oct 16, 2024

Unexpected results from sigma_clipped_stats for large np.float32 input arrays astropy/astropy#17185

Closed
