crazy number of allocations on file read #273

Closed
tiemvanderdeure opened this issue Jan 8, 2025 · 6 comments

@tiemvanderdeure

I'm trying to read the E-OBS weather dataset and reading in the data is extremely slow and generates millions of allocations.

For this file I am getting:

ds = NCDataset(path)
@time ds["hu"][:,:,1:20];

0.800117 seconds (8.29 M allocations: 182.748 MiB, 26.05% gc time)

There seems to be some type instability in CFtransformdata! in CommonDataModel, but nc_get_vars! alone also takes half the time.

This is on NCDatasets 0.14.6 and CommonDataModel 0.3.7

@Alexander-Barth
Member

How many allocations do you get when you run it twice (so that allocations made by the Julia compiler on the first call are excluded)?

ds = NCDataset(path)
@time ds["hu"][:,:,1:20];
@time ds["hu"][:,:,1:20];

See also #251 (comment)
and https://juliageo.org/NCDatasets.jl/dev/performance/#performance_tips
for the type instability.

@tiemvanderdeure
Author

What I posted was already without compilation:

julia> @time ds["hu"][:,:,1:20];
  4.510533 seconds (11.76 M allocations: 404.837 MiB, 5.94% gc time, 69.71% compilation time)

julia> @time ds["hu"][:,:,1:20];
  1.865391 seconds (8.29 M allocations: 182.748 MiB, 13.99% gc time)

@tiemvanderdeure
Author

tiemvanderdeure commented Jan 16, 2025

Is it possible that this has to do with how missing values are handled? Loading into a preallocated array is much faster, as the documentation suggests:

A = Array{Float32, 3}(undef, 705, 465, 20);
@time NCDatasets.load!(variable(ds,"hu"),A,:,:,1:20);

But this never calls CFtransformdata!, which is where the type instability is.

@tiemvanderdeure
Author

Okay, I think I figured it out. If I remove the @inline annotation on CommonDataModel.CFtransformdata!, this is fixed. We probably need the function barrier there. So it's a CommonDataModel problem, not an NCDatasets problem.
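
For readers unfamiliar with the term, here is a minimal sketch of a function barrier. It is not the CommonDataModel code: `attribs`, `scale_kernel!`, `transform!` and `transform_nobarrier!` are hypothetical names made up for illustration, and the scale/offset arithmetic only stands in for whatever the real transform does. The idea is that values read from an untyped container are inferred as `Any`, so a loop in the same function body is compiled for `Any` and may allocate on every iteration, whereas passing the values to a separate, non-inlined kernel lets Julia dispatch once and compile the loop for the concrete types.

```julia
# Hypothetical attribute store: values are typed `Any`, so anything read
# from it is not concretely inferred by the compiler.
attribs = Dict{String,Any}("scale_factor" => 0.01f0, "add_offset" => 0.0f0)

# Kernel behind the barrier. `@noinline` keeps it a separate call, so it is
# compiled for the concrete runtime types of `scale` and `offset`.
@noinline function scale_kernel!(out, raw, scale, offset)
    for i in eachindex(out, raw)
        out[i] = raw[i] * scale + offset
    end
    return out
end

# With barrier: one dynamic dispatch at the call to `scale_kernel!`,
# then a type-stable loop inside the kernel.
function transform!(out, raw, attribs)
    scale = attribs["scale_factor"]   # inferred as Any
    offset = attribs["add_offset"]    # inferred as Any
    return scale_kernel!(out, raw, scale, offset)
end

# Without barrier: the loop runs in a body where `scale` and `offset`
# are `Any`, so each iteration goes through dynamic dispatch.
function transform_nobarrier!(out, raw, attribs)
    scale = attribs["scale_factor"]
    offset = attribs["add_offset"]
    for i in eachindex(out, raw)
        out[i] = raw[i] * scale + offset
    end
    return out
end

raw = Int16[100, 200, 300]
with_barrier = transform!(zeros(Float32, 3), raw, attribs)
without_barrier = transform_nobarrier!(zeros(Float32, 3), raw, attribs)
```

Both variants return the same values; the difference should only show up in `@time` on large arrays, where the variant without the barrier is expected to report far more allocations. Forcing the kernel to be inlined into a type-unstable caller can undo the barrier, which matches what I saw with the @inline annotation.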

@Alexander-Barth
Member

Good catch!

@Alexander-Barth
Member

Thanks again! Fixed via JuliaGeo/CommonDataModel.jl#29
