Catastrophic accuracy loss in large float32 array for nanmean and nanstd #462

Open
prutschman-iv opened this issue Oct 15, 2024 · 2 comments

Describe the bug
Starting somewhere between 10 million and 50 million elements, the bn.nanmean and bn.nanstd functions appear to experience a catastrophic loss of accuracy with float32 data.
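One possible mechanism, stated as an assumption because Bottleneck's internals are not shown in this issue, is a running sum kept in the input dtype. The sketch below only demonstrates the float32 arithmetic involved: once an accumulator reaches 2**24, adding values no larger than 1 leaves it unchanged, while a float64 accumulator keeps them.

```python
import numpy as np

# Assumption for illustration: the running sum is stored as float32.
LIMIT = np.float32(2**24)      # 16777216.0
spacing = np.spacing(LIMIT)    # gap to the next representable float32

samples = (0.25, 0.5, 0.999, 1.0)

# float32 accumulator: every addition rounds back to LIMIT.
acc32 = LIMIT
for x in samples:
    acc32 = acc32 + np.float32(x)  # float32 + float32 stays float32

# float64 accumulator: the same additions are retained.
acc64 = np.float64(LIMIT)
for x in samples:
    acc64 = acc64 + np.float64(np.float32(x))

print("float32 spacing at 2**24:", spacing)
print("float32 accumulator unchanged:", acc32 == LIMIT)
print("float64 accumulator gained:", acc64 - np.float64(LIMIT))
```

With uniform data in [0, 1), a running sum would reach 2**24 after roughly 33.5 million elements, which falls inside the 10 million to 50 million range described here.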

To Reproduce
This code creates float32 arrays of increasing size and compares the results of the NumPy and Bottleneck versions of nanmean and nanstd:

import numpy as np
import bottleneck as bn
print(f'{np.__version__=} {bn.__version__=}')
million = 10**6
for size in (million, 10 * million, 50 * million, 100 * million):
    rand_data = np.random.random(size=size).astype(np.float32)
    print(f"{size}")
    print("    mean\t", np.nanmean(rand_data), bn.nanmean(rand_data))
    print("     std\t", np.nanstd(rand_data), bn.nanstd(rand_data))

When I run it, I get:

np.__version__='1.24.0' bn.__version__='1.4.1'
1000000
    mean         0.5003439 0.5003493428230286
     std         0.28887847 0.28882330656051636
10000000
    mean         0.49992886 0.49994951486587524
     std         0.28866056 0.28725674748420715
50000000
    mean         0.5000019 0.33554431796073914
     std         0.28868446 0.30973944067955017
100000000
    mean         0.4999724 0.16777215898036957
     std         0.2886786 0.38657501339912415
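A quick arithmetic check on the numbers above: multiplying the two largest Bottleneck means by their array sizes gives, to within float32 rounding, the same value, 2**24 = 16777216. That would fit a sum that stopped growing at 2**24, although this reading is an inference from the output, not something confirmed in this thread.

```python
import math

# (array size, Bottleneck mean) pairs copied from the output above.
reported = [
    (50_000_000, 0.33554431796073914),
    (100_000_000, 0.16777215898036957),
]

for size, bn_mean in reported:
    implied_sum = bn_mean * size
    # rel_tol of 1e-7 allows for the mean itself being rounded to float32.
    matches = math.isclose(implied_sum, 2**24, rel_tol=1e-7)
    print(f"{size:>11d}  implied sum = {implied_sum:.3f}  close to 2**24: {matches}")
```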

Versions:

Package           Version
----------------- --------------------
astropy           6.1.4
astropy-iers-data 0.2024.10.14.0.32.55
Bottleneck        1.4.1
numpy             1.24.0
packaging         24.1
pip               24.0
pyerfa            2.0.1.4
PyYAML            6.0.2
setuptools        69.2.0
wheel             0.43.0

Expected behavior
I expected the differences between numpy and Bottleneck to be zero, or at least small relative to the size of the result.
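To make "small relative to the size of the result" concrete, the check could be phrased as a relative error against a float64 reference. This is a minimal sketch: `relative_error` is a hypothetical helper, not part of NumPy or Bottleneck, and it is exercised here with NumPy's own `nanmean` and `nanstd` only.

```python
import numpy as np

def relative_error(candidate, reference, data):
    """Relative difference between candidate(data) and a float64 reference.

    Hypothetical helper: `candidate` sees the original float32 array,
    while `reference` sees a float64 copy of the same data.
    """
    expected = float(reference(data.astype(np.float64)))
    return abs(float(candidate(data)) - expected) / abs(expected)

rng = np.random.default_rng(0)
data = rng.random(1_000_000).astype(np.float32)

mean_err = relative_error(np.nanmean, np.nanmean, data)
std_err = relative_error(np.nanstd, np.nanstd, data)
print(f"nanmean relative error: {mean_err:.2e}")
print(f"nanstd  relative error: {std_err:.2e}")
```

Passing `bn.nanmean` or `bn.nanstd` as `candidate` on the larger arrays would express the discrepancy reported above as a single number per function.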

Additional context
I encountered this while trying to track down astropy/astropy#17185. astropy/astropy#11492 may be related, but there the accuracy loss appeared smaller.

rdbisme (Collaborator) commented Oct 18, 2024

This might be related: #164

rdbisme (Collaborator) commented Oct 18, 2024

Does this solve the problem? #414
