Document chunking in nccreate #87
> Compression of NetCDF files is only enabled if chunking is also enabled.
Thanks for the super quick reply @jarlela! Ah, I did not know about chunking (I don't think it is in the documentation). I tried a typical 4 MiB chunk size, but that crashed. I thought maybe it was because I'm saving a very long vector, so I'm now trying to save a 3D array instead. I magically found that it doesn't crash with:

```julia
using NetCDF

N = 256
A = rand(N, N, N)
for cl in 0:9
    tic = time_ns()
    filename = "compress" * string(cl) * ".nc"
    varname = "rands"
    attribs = Dict("units" => "m/s")
    nccreate(filename, varname,
             "x1", collect(1:N), Dict("units"=>"m"),
             "x2", collect(1:N), Dict("units"=>"m"),
             "x3", collect(1:N), Dict("units"=>"m"),
             atts=attribs, chunksize=(1,16,256), compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)
    toc = time_ns()
    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
```
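The snippet relies on `prettytime` and `datasize` helpers that are not defined in the thread and are not part of NetCDF.jl. Hypothetical minimal stand-ins, so the benchmark loop runs as written, might look like this (the names and signatures are assumptions matching how they are called above):

```julia
using Printf

# Hypothetical stand-in for `prettytime`: format a duration given in
# nanoseconds as seconds with three decimal digits.
prettytime(ns) = @sprintf("%.3f s", ns / 1e9)

# Hypothetical stand-in for `datasize`: human-readable file size, binary
# (1024-based) when style=:bin, decimal (1000-based) otherwise.
function datasize(bytes; style=:bin, format="%.3f")
    base  = style == :bin ? 1024.0 : 1000.0
    units = style == :bin ? ("B", "KiB", "MiB", "GiB", "TiB") :
                            ("B", "kB", "MB", "GB", "TB")
    x, i = float(bytes), 1
    while x >= base && i < length(units)
        x /= base
        i += 1
    end
    string(Printf.format(Printf.Format(format), x), " ", units[i])
end
```

For example, `datasize(2048)` gives `"2.000 KiB"` and `prettytime(1.5e9)` gives `"1.500 s"`.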
Thanks for reporting, could you try branch #88? On a side note: when running your example, you should provide the chunksize in Julia-ordered dimensions, so you probably wanted the order reversed.
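For context on "Julia-ordered dimensions": Julia arrays are column-major, so the first index varies fastest in memory. A quick pure-Julia check (no NetCDF required):

```julia
# A small 3D array backed by contiguous memory.
A = reshape(collect(1:24), 4, 3, 2)

# Column-major: walking memory order (vec) first traverses the 1st dimension,
# so the first four elements in memory are exactly the first column.
@assert vec(A)[1:4] == A[:, 1, 1]

# The memory stride grows with dimension index: 1, then 4, then 4*3 elements.
@assert strides(A) == (1, 4, 12)
```

This is presumably why the suggestion is to put the largest chunk extent on the first axis: a chunk like `(256,16,1)` then covers runs that are contiguous in memory, unlike `(1,16,256)`.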
Hey @meggart, thanks for looking into this. I tried #88 and it's working as expected now!

```julia
using NetCDF

N = 256
A = Float64.(rand(1:10, N, N, N))
for cl in 0:9
    tic = time_ns()
    filename = "compress" * string(cl) * ".nc"
    varname = "rands"
    attribs = Dict("units" => "m/s")
    nccreate(filename, varname,
             "x1", collect(1:N), Dict("units"=>"m"),
             "x2", collect(1:N), Dict("units"=>"m"),
             "x3", collect(1:N), Dict("units"=>"m"),
             atts=attribs, chunksize=(256,16,1), compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)
    toc = time_ns()
    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
```
+1 for the suggestion above to document chunking. It's still not mentioned anywhere in docs/ nor in the docstrings (e.g. the `nccreate` docstring).
Also, @meggart, can you please elaborate on what you meant by that? What aspect of performance is improved if the chunk is bigger in the first axis: compression ratio, read time, something else? Thanks!
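Not an authoritative answer, but the access-pattern side of the question can be sketched with plain arithmetic, using the array and chunk shapes from the examples above (actual timings also depend on the compressor and on HDF5's chunk cache):

```julia
# Number of chunks touched when reading a full first-axis column A[:, j, k]
# of a (256, 256, 256) array, for a given chunk layout.
chunks_per_column(chunk, dims=(256, 256, 256)) = cld(dims[1], chunk[1])

# chunksize=(256,16,1): the whole column lives inside a single chunk.
@assert chunks_per_column((256, 16, 1)) == 1

# chunksize=(1,16,256): the same column is spread over 256 chunks, each of
# which must be located (and decompressed) separately.
@assert chunks_per_column((1, 16, 256)) == 256

# Either way, one chunk holds 256*16*1 Float64s = 32 KiB.
@assert prod((256, 16, 1)) * 8 == 32768
```

So for reads along the first (fastest-varying) axis, a chunk that is large in that axis touches far fewer chunks for the same amount of data.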
Ok, I have re-opened and changed the title of the issue.
Please let me know if I'm doing this wrong, but I was trying to find a nice balance between compression time and file size by benchmarking compression levels 0-9. Instead, I find that the compression level has no effect.