Asynchronous NetCDF output. #137
+1 for NCDatasets. Asynchronous output writing is great; we will also want this (or at least a similar concept) for gathering high-frequency diagnostics and occasionally writing out what has been gathered. I'll open an issue later today once I hammer out the requirements for this functionality.
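The gather-often, write-occasionally idea for diagnostics could look roughly like the sketch below. It is a minimal illustration assuming an in-memory buffer that hands its contents to a writer callback every `flush_every` samples. `DiagnosticsBuffer`, `record!` and the `writer` callback are hypothetical names, not Oceananigans API, and the callback only stands in for the real disk write (which could itself be asynchronous).

```julia
# Hypothetical sketch (not Oceananigans API): gather a diagnostic every time step,
# but only hand the gathered samples to `writer` once `flush_every` have accumulated.
struct DiagnosticsBuffer{F}
    values::Vector{Float64}  # samples gathered since the last flush
    flush_every::Int         # number of samples to gather before writing
    writer::F                # stands in for the (possibly asynchronous) disk write
end

DiagnosticsBuffer(writer; flush_every=100) = DiagnosticsBuffer(Float64[], flush_every, writer)

function record!(buf::DiagnosticsBuffer, x)
    push!(buf.values, x)
    if length(buf.values) >= buf.flush_every
        buf.writer(copy(buf.values))  # pass a copy so the buffer can be reused right away
        empty!(buf.values)
    end
    return buf
end

# Usage: collect the "written" batches in memory instead of touching the disk.
written = Vector{Vector{Float64}}()
buf = DiagnosticsBuffer(batch -> push!(written, batch); flush_every=3)
for step in 1:7
    record!(buf, 0.5 * step)
end
```

After seven steps, two batches of three samples have been handed to the writer and one sample is still buffered; a real implementation would also need to flush the remainder when the simulation ends.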
It's pretty bad. I don't think the compression level functionality even does anything!

```julia
using NetCDF
using Humanize: datasize          # assumed: `datasize` comes from Humanize.jl
using Oceananigans: prettytime    # assumed: time-formatting helper from Oceananigans

N = 256^3
A = rand(N)

for cl in 0:9
    tic = time_ns()

    filename = "compress" * string(cl) * ".nc"
    varname = "rands"
    attribs = Dict("units" => "m/s")

    # Create the variable with compression level `cl`, write the data, and close the file.
    nccreate(filename, varname, "x1", collect(1:N), Dict("units" => "m"), atts=attribs, compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)

    toc = time_ns()
    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
```
I opened an issue (JuliaGeo/NetCDF.jl#87), but the package seems to be inactive, so I'm not expecting a reply. Either way, I need to figure out why CI is failing. It might be the use of the Distributed package.
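One possible confounder in the benchmark above: `rand(N)` produces uniformly random `Float64`s, which deflate compression can barely shrink, so nearly constant file sizes across compression levels would be expected even if the `compress` argument is honored. The sketch below assumes NCDatasets.jl (mentioned above as an alternative) and its `defVar` keyword `deflatelevel`, and writes both random data and a highly compressible integer-valued field so the two effects can be told apart. File and variable names are illustrative, and `N` is reduced from the original `256^3` to keep the run short. It does not by itself show whether NetCDF.jl's `compress` works; it only separates that question from how compressible the data is.

```julia
using NCDatasets

N = 64^3                             # smaller than the original 256^3 to keep the run short
noise = rand(N)                      # uniform random Float64s: close to incompressible
ramp = Float64.(mod.(1:N, 1000))     # integer-valued floats: many repeated zero bytes

for (label, data) in (("noise", noise), ("ramp", ramp)), level in (0, 1, 5, 9)
    filename = "deflate_$(label)_$(level).nc"
    tic = time_ns()
    NCDataset(filename, "c") do ds   # "c" creates (or overwrites) the file
        defDim(ds, "x1", N)
        v = defVar(ds, "u", Float64, ("x1",); deflatelevel=level, attrib=["units" => "m/s"])
        v[:] = data
    end
    elapsed = (time_ns() - tic) / 1e9
    println("$label, deflatelevel $level: $(round(elapsed, digits=3)) s, $(filesize(filename)) bytes")
end
```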
Tests are passing, so I will merge unless there are any objections. Code coverage decreased, but I'm not sure how to test this, as it would require spinning up another worker on CI (not available to us since we're on the free-tier CI plans).
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #137      +/-   ##
==========================================
- Coverage   56.78%   56.03%   -0.76%
==========================================
  Files          19       19
  Lines         597      605       +8
==========================================
  Hits          339      339
- Misses        258      266       +8
```

Continue to review full report at Codecov.
This PR adds an option `async::Bool` to the `NetCDFOutputWriter`. By default `async=false`. When `async=true`, the data to be written is copied and then written to disk asynchronously by a second worker/process. This is especially useful when running models on the GPU, as writing and compressing NetCDF output is time-consuming and the GPU just sits idle while this happens.

In particular, NetCDF compression seems to be very inefficient: it takes far too long. I'm considering setting `compress=0` or trying out NCDatasets.jl. We could also just write output to JLD and then convert all the JLD files into one big NetCDF file after the model is done time stepping.

See the free convection example for this feature in action.
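The copy-then-hand-off pattern described above can be sketched with Julia's `Distributed` standard library. This is a minimal illustration, not the PR's implementation: `write_snapshot` and `write_async` are hypothetical names, and raw `Float64` bytes written with `write` stand in for the (slow) NetCDF write. The two points it shows are that the field is copied before the model keeps time stepping, and that `remotecall` returns a `Future` immediately instead of waiting for the write to finish.

```julia
using Distributed

nprocs() == 1 && addprocs(1)         # one extra process to act as the writer
const WRITER = last(workers())

# Hypothetical writer, defined on every process so the worker can run it.
@everywhere function write_snapshot(filename::String, data::Vector{Float64})
    open(filename, "w") do io
        write(io, data)              # raw Float64 bytes stand in for a NetCDF write
    end
    return filename
end

# Copy the field, then hand it to the writer without waiting for the write to finish.
function write_async(filename, field)
    snapshot = copy(field)           # for a GPU array this would be a device-to-host copy
    return remotecall(write_snapshot, WRITER, filename, snapshot)
end

field = rand(1_000)
pending = Future[]
for n in 1:3
    field .+= 1                      # stands in for a model time step
    push!(pending, write_async(joinpath(tempdir(), "snapshot_$n.bin"), field))
end
files = fetch.(pending)              # block until every write is done (rethrows worker errors)
```

Copying first keeps each snapshot decoupled from later updates to `field`, whatever mechanism is used for the hand-off. Fetching the futures at the end (or before the next write) is what surfaces any error raised on the writer process.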
I tried to keep all the parallel computing stuff under the hood, but unfortunately I could not get this feature to work without manually calling `addprocs()` and `@everywhere using Oceananigans` in the example, so for now it requires some work on the user's part.

cc: @SandreOuza
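One way the `addprocs()` / `@everywhere using ...` setup might eventually move under the hood is a helper that library code calls on the user's behalf. The sketch below assumes `Distributed.remotecall_eval`, the function that `@everywhere` expands to; `ensure_writer_worker` is a hypothetical name, and the stdlib module `Dates` stands in for Oceananigans so the sketch runs on its own.

```julia
using Distributed

# Hypothetical helper: make sure a writer process exists and has `pkg` loaded,
# so a user script would not need to call addprocs() or @everywhere itself.
function ensure_writer_worker(pkg::Symbol)
    nprocs() == 1 && addprocs(1)
    w = last(workers())
    # Builds the expression `using <pkg>` and evaluates it in Main on worker `w`.
    Distributed.remotecall_eval(Main, [w], Expr(:using, Expr(:., pkg)))
    return w
end

w = ensure_writer_worker(:Dates)     # Dates stands in for Oceananigans here
```

For a real package, the worker may also need the same project environment as the main process, which can be passed through the `exeflags` keyword of `addprocs` (for example `exeflags="--project"`).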
Resolves #124.