
add multitasking to DataLoader #13

Closed

CarloLucibello opened this issue Dec 31, 2021 · 15 comments · Fixed by #22 or #82

Comments

@CarloLucibello
Member

CarloLucibello commented Dec 31, 2021

Port the DataLoader from Flux and extend it with the multitasking features of
https://github.com/lorenzoh/DataLoaders.jl

@darsnack
Member

darsnack commented Jan 1, 2022

One thing I'd like to do here is separate the view from the loader. So instead of eachbatchparallel(data), we have loadparallel(eachbatch(data)) where loadparallel accepts any iterator.
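The proposed split could be sketched roughly as follows. This is a hypothetical illustration of the idea, not an existing API: loadparallel wraps any iterator (such as the one eachbatch would return) and prefetches items on a background task through a Channel.

```julia
# Hypothetical sketch: `loadparallel` accepts any iterator and prefetches
# its items on a spawned task, decoupling loading from the batch view.
function loadparallel(iter; buffersize = Threads.nthreads())
    Channel(buffersize; spawn = true) do ch
        for x in iter
            put!(ch, x)
        end
    end
end

# usage sketch, with `eachbatch` assumed to return a plain batch iterator:
# for batch in loadparallel(eachbatch(data))
#     train_step!(model, batch)
# end
```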

@lorenzoh
Contributor

lorenzoh commented Jan 11, 2022

The way DataLoaders.jl does this: it has batchviewcollated, which is just a view of collated batches (also supporting getobs!), and then eachobsparallel([;buffered]), which takes any data container.

Nothing wrong with Flux's DataLoader, but I think DataLoaders.DataLoader may make more sense to include since it already integrates very well with the data container interface. Is there something specific we would need from Flux's implementation?
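For reference, a usage sketch of the two DataLoaders.jl pieces described above, written from memory of the package's docs (check them for the exact signatures; the 128×10_000 array is just an illustrative container with the observation dimension last):

```julia
using DataLoaders

data = rand(Float32, 128, 10_000)   # container with 10_000 observations

# a lazy view of collated batches, which also supports `getobs!`
batches = DataLoaders.batchviewcollated(data, 16)

# parallel iteration over any data container, optionally buffered
for batch in eachobsparallel(batches)
    # each `batch` is a collated 128×16 array
end
```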

@lorenzoh
Contributor

Let me know if you need help on how to move forward with this.

@darsnack
Member

What I wanted to do was have a single BufferedGetObs that supports multiple slots for parallelism and uses n = 1 for the single-threaded case, instead of having duplicate types for the parallel and single-threaded variants.
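A minimal sketch of that idea, assuming the getobs/getobs! container interface discussed in this thread. The type and function names here are illustrative only:

```julia
# Hypothetical `BufferedGetObs`: one type with `n` reusable buffer slots,
# where `n = 1` degenerates to the single-threaded case.
struct BufferedGetObs{D,B}
    data::D
    buffers::Vector{B}   # n reusable slots, one per in-flight observation
end

function BufferedGetObs(data, n::Int)
    proto = getobs(data, 1)   # prototype observation used to size buffers
    BufferedGetObs(data, [deepcopy(proto) for _ in 1:n])
end

# Slot `slot` is reused in place: with n slots, n observations can be
# loaded concurrently without allocating new buffers.
load!(b::BufferedGetObs, slot::Int, idx) = getobs!(b.buffers[slot], b.data, idx)
```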

@ToucheSir
Contributor

Courtesy of @samuela on Discourse: https://ffcv.io/ is a data-loading library that appears to have insane performance. There might be some ideas in there worth emulating here.

@darsnack
Member

I can't figure out from their website what they are actually doing. Any idea what the secret sauce is? I saw Metatheory.jl mentioned on Discourse, but I don't see anything related to it on the website.

@samuela

samuela commented Jan 19, 2022

Yeah, I'm not sure exactly what FFCV's secrets are yet... I guess we'll have to wait for the paper. From what I can piece together, it sounds like they're using numba to JIT-compile data augmentations and data loading to make things faster. Additionally, there appears to be some async element to it all, which makes sense considering hitting disk, etc. Not sure exactly what's going on, but it seems like the sort of thing Julia should excel at.

@darsnack
Member

Yeah, we'll have to benchmark ourselves, but it sounds like the kind of thing that should happen "for free" with Julia.

@CarloLucibello changed the title from "add DataLoader" to "add multitasking to DataLoader" Jan 30, 2022
@rejuvyesh

rejuvyesh commented Feb 4, 2022

I can't figure out from their website what they are actually doing. Any idea what's the secret sauce?

As @samuela and @darsnack mention, some of the data transformation functions they optimize with numba will likely happen for free with Julia, or at most require some @tturbo magic from LoopVectorization.jl.

What will require a little more work to replicate is the data caching mechanism. For small datasets (or when the user has enough RAM) they preload the entire dataset into memory.
For larger datasets they use QUASI_RANDOM shuffling to minimize reads on the underlying storage.
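The quasi-random idea can be sketched as shuffling contiguous chunks of indices (and shuffling within each chunk) so that disk reads stay mostly sequential. This is an illustrative reconstruction of the scheme, not FFCV's actual implementation:

```julia
using Random

# Shuffle chunk order and order within chunks, keeping reads
# mostly sequential on the underlying storage.
function quasirandom_order(n, chunksize)
    chunks = [collect(lo:min(lo + chunksize - 1, n)) for lo in 1:chunksize:n]
    shuffle!(chunks)            # randomize the order of the chunks
    foreach(shuffle!, chunks)   # randomize indices within each chunk
    reduce(vcat, chunks)        # a permutation of 1:n
end
```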

More details seem to be in https://docs.ffcv.io/parameter_tuning.html.

Maybe we just need to use MemPool.jl with more specializations than the Array(::String) one it has right now.

@ToucheSir
Contributor

A big but under-discussed part of the performance gains comes from the dataset generation step. AFAICT the .beton file format is a packed binary format that is conducive to quick reads and scans. They also have functionality to selectively store a subset of images in compressed (JPEG) format and the rest as raw data. All of this would probably be even easier to accomplish in Julia (ideally with Arrow.jl to stay close to standards), but to my knowledge it has yet to be done.
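A rough sketch of that direction with Arrow.jl, using its Arrow.write / Arrow.Table entry points. The column names and the toy byte payloads are purely illustrative:

```julia
using Arrow

# Pack once, offline: raw bytes per image plus labels, as one Arrow table.
table = (image = [rand(UInt8, 100) for _ in 1:1_000],
         label = rand(1:10, 1_000))
Arrow.write("dataset.arrow", table)

# Load for training: the file is memory-mapped, so sequential scans
# avoid per-record file opens.
tbl = Arrow.Table("dataset.arrow")
first(tbl.label)
```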

@lorenzoh
Contributor

lorenzoh commented Feb 5, 2022

As @ToucheSir says, preprocessing the dataset for faster online loading can make a big difference; see for example this FastAI.jl tutorial. I also have a prototype package that automates many of the steps of saving and loading data containers and lets you dispatch on the kind of data using Block types. For example, Image{2}s are saved as individual .jpg files, while Keypoints are stored and loaded as a contiguous array using Arrow.jl. This also composes nicely when you have a dataset with different kinds of information (e.g., inputs, targets, and more) and want to save each in the most efficient format while reusing as much functionality as possible. Some design ideas could be taken from the repo: https://github.com/lorenzoh/DataBlockStorage.jl.

Next to serializing data in the most efficient (to load) format, there are also other avenues for speedups. I'm planning to do some further testing and benchmarking on these. Focusing on image pipelines here since these require a lot of optimization, but safe to say the package infrastructure will help for other domains.

One is in-place loading through getobs! support, which DataLoaders.jl's BatchViewCollated provides. Given that getobs! can be implemented without allocations, this allows zero-allocation data loading, reducing memory pressure as well as the frequency of GC pauses. DataAugmentation.jl's image (and segmentation mask) transforms don't just compose, they also allow in-place application, so this composes nicely with getobs!.
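As an illustration, here is what a getobs! implementation might look like for a simple image-folder container. The ImageFolder type and the loadimage/loadimage! helpers are hypothetical names, stand-ins for whatever decoder the container uses:

```julia
# Hypothetical container: a list of image file paths.
struct ImageFolder
    paths::Vector{String}
end

Base.length(d::ImageFolder) = length(d.paths)

# Allocating variant: decode into a fresh array every call.
getobs(d::ImageFolder, i) = loadimage(d.paths[i])          # hypothetical decoder

# In-place variant: decode straight into a preallocated `buffer`,
# so repeated calls allocate nothing.
function getobs!(buffer, d::ImageFolder, i)
    loadimage!(buffer, d.paths[i])                          # hypothetical in-place decoder
    return buffer
end
```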

Another one is JpegTurbo.jl for faster image loading.

As I said, I plan to do some more benchmarks and compare with Python-based projects, but I think we're in a good spot for great performance and good interfaces to extend this to other domains as well.

@lorenzoh
Contributor

lorenzoh commented Feb 7, 2022

JpegTurbo.jl is now released and I've gone ahead and done some benchmarking. The performance is looking good. Great work, @johnnychen94!

JuliaIO/JpegTurbo.jl#15 (comment)

@johnnychen94
Member

johnnychen94 commented Feb 7, 2022

For example, Image{2}s are saved as individual .jpg files

I'm wondering if the new lossless QOI format gives a better result here? It's faster at decoding, and since it's lossless you don't introduce JPEG artifacts.

https://github.com/KristofferC/QOI.jl is included in ImageIO already. It could be made even faster with AVX enabled.

using ImageIO, TestImages, FileIO
using BenchmarkTools

img = testimage("lighthouse");
@btime save("tmp.qoi", $img); # 7.511 ms (60 allocations: 3.35 MiB)
@btime load("tmp.qoi"); # 3.082 ms (57 allocations: 2.88 MiB)

@btime save("tmp.jpg", $img); # 2.422 ms (45 allocations: 1.25 MiB)
@btime load("tmp.jpg"); # 6.424 ms (66 allocations: 2.38 MiB)

@lorenzoh
Contributor

lorenzoh commented Feb 7, 2022

I'm wondering if the new lossless QOI format gives a better result here?

Didn't know there was a Julia implementation already. Will update here once I have tried. Do you know what the compression ratio is like compared to JPEGs?

@johnnychen94
Member

Do you know what the compression ratio is like compared to JPEGs?

It's becoming off-topic now... In general, QOI doesn't achieve great compression ratios, but it's very fast at encoding and decoding. Thus QOI is an ideal format for applications that aren't constrained by network bandwidth (unlike, say, web applications) but require high decoding throughput (e.g., games).

You can find some reference results at https://github.com/KristofferC/QOI.jl#benchmarks and JuliaIO/JpegTurbo.jl#15 (comment). For instance, the "coffee" image compresses to 78.10 KB with JpegTurbo.jl versus 493.3 KB with QOI.jl.
