
add multitasking to DataLoader #13

Closed

CarloLucibello opened this issue Dec 31, 2021 · 15 comments · Fixed by #22 or #82

Comments

@CarloLucibello
Member

CarloLucibello commented Dec 31, 2021

Port the DataLoader from Flux and extend it with the multitasking features of
https://github.com/lorenzoh/DataLoaders.jl

@darsnack
Member

darsnack commented Jan 1, 2022

One thing I'd like to do here is separate the view from the loader. So instead of eachbatchparallel(data), we have loadparallel(eachbatch(data)) where loadparallel accepts any iterator.
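The proposed split could be sketched roughly as follows. This is a hypothetical illustration of the idea, not an existing API: loadparallel wraps any iterator (such as the one eachbatch would return) and prefetches items on a background task through a Channel.

```julia
# Hypothetical sketch: `loadparallel` accepts any iterator and prefetches
# its items on a spawned task, decoupling loading from the batch view.
function loadparallel(iter; buffersize = Threads.nthreads())
    Channel(buffersize; spawn = true) do ch
        for x in iter
            put!(ch, x)
        end
    end
end

# usage sketch, with `eachbatch` assumed to return a plain batch iterator:
# for batch in loadparallel(eachbatch(data))
#     train_step!(model, batch)
# end
```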

@lorenzoh
Contributor

lorenzoh commented Jan 11, 2022

The way DataLoaders.jl does this: it has batchviewcollated, which is just a view of collated batches (also supporting getobs!), and then eachobsparallel([;buffered]), which takes any data container.

Nothing wrong with Flux's DataLoader, but I think DataLoaders.DataLoader may make more sense to include since it already integrates very well with the data container interface. Is there something specific we would need from Flux's implementation?
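For reference, a usage sketch of the two DataLoaders.jl pieces described above, written from memory of the package's docs (check them for the exact signatures; the 128×10_000 array is just an illustrative container with the observation dimension last):

```julia
using DataLoaders

data = rand(Float32, 128, 10_000)   # container with 10_000 observations

# a lazy view of collated batches, which also supports `getobs!`
batches = DataLoaders.batchviewcollated(data, 16)

# parallel iteration over any data container, optionally buffered
for batch in eachobsparallel(batches)
    # each `batch` is a collated 128×16 array
end
```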

@lorenzoh
Contributor

Let me know if you need help on how to move forward with this.

@darsnack
Member

What I wanted to do was have a single BufferedGetObs that supports multiple slots for parallelism and uses n = 1 for the single-threaded case, instead of having duplicate types for the parallel and single-threaded variants.
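A minimal sketch of that idea, assuming the getobs/getobs! container interface discussed in this thread. The type and function names here are illustrative only:

```julia
# Hypothetical `BufferedGetObs`: one type with `n` reusable buffer slots,
# where `n = 1` degenerates to the single-threaded case.
struct BufferedGetObs{D,B}
    data::D
    buffers::Vector{B}   # n reusable slots, one per in-flight observation
end

function BufferedGetObs(data, n::Int)
    proto = getobs(data, 1)   # prototype observation used to size buffers
    BufferedGetObs(data, [deepcopy(proto) for _ in 1:n])
end

# Slot `slot` is reused in place: with n slots, n observations can be
# loaded concurrently without allocating new buffers.
load!(b::BufferedGetObs, slot::Int, idx) = getobs!(b.buffers[slot], b.data, idx)
```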

@ToucheSir
Contributor

Courtesy of @samuela on Discourse: https://ffcv.io/ is a data-loading library that appears to have insane performance. There might be some ideas in there worth emulating here.

@darsnack
Member

I can't figure out from their website what they are actually doing. Any idea what the secret sauce is? I saw Metatheory.jl mentioned on Discourse, but I don't see anything related to it on the website.

@samuela

samuela commented Jan 19, 2022

Yeah, I'm not sure exactly what FFCV's secrets are yet... I guess we'll have to wait for the paper. From what I can piece together, it sounds like they're using numba to JIT-compile data augmentations and data loading to make things faster. Additionally, there appears to be some async element to it all, which makes sense considering hitting disk, etc. Not sure exactly what's going on, but it seems like the sort of thing Julia should excel at.

@darsnack
Member

Yeah, we'll have to benchmark ourselves, but it sounds like the kind of thing that should happen "for free" with Julia.

@CarloLucibello changed the title from "add DataLoader" to "add multitasking to DataLoader" Jan 30, 2022
@rejuvyesh

rejuvyesh commented Feb 4, 2022

I can't figure out from their website what they are actually doing. Any idea what's the secret sauce?

As @samuela and @darsnack mention, some of the data transformation functions they optimize with numba will likely happen for free with Julia, or at most require some @tturbo magic from LoopVectorization.jl.

What will require a little more work to replicate is the data caching mechanism. For small datasets (or when the user has enough RAM) they preload the entire dataset into memory.
For larger datasets they use QUASI_RANDOM shuffling to minimize reads on the underlying storage.
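The quasi-random idea can be sketched as shuffling contiguous chunks of indices (and shuffling within each chunk) so that disk reads stay mostly sequential. This is an illustrative reconstruction of the scheme, not FFCV's actual implementation:

```julia
using Random

# Shuffle chunk order and order within chunks, keeping reads
# mostly sequential on the underlying storage.
function quasirandom_order(n, chunksize)
    chunks = [collect(lo:min(lo + chunksize - 1, n)) for lo in 1:chunksize:n]
    shuffle!(chunks)            # randomize the order of the chunks
    foreach(shuffle!, chunks)   # randomize indices within each chunk
    reduce(vcat, chunks)        # a permutation of 1:n
end
```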

More details seem to be in https://docs.ffcv.io/parameter_tuning.html.

Maybe we just need to use MemPool.jl with more specializations than the Array(::String) one it has right now.

@ToucheSir
Contributor

A big but under-discussed part of the performance gains comes from the dataset generation step. AFAICT the .beton file format is a packed binary format that is conducive to quick reads and scans. They also have functionality to selectively store a subset of images in compressed (JPEG) format and the rest as raw data. All of this would probably be even easier to accomplish in Julia (ideally with Arrow.jl to stay close to standards), but to my knowledge it has yet to be done.
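A rough sketch of that direction with Arrow.jl, using its Arrow.write / Arrow.Table entry points. The column names and the toy byte payloads are purely illustrative:

```julia
using Arrow

# Pack once, offline: raw bytes per image plus labels, as one Arrow table.
table = (image = [rand(UInt8, 100) for _ in 1:1_000],
         label = rand(1:10, 1_000))
Arrow.write("dataset.arrow", table)

# Load for training: the file is memory-mapped, so sequential scans
# avoid per-record file opens.
tbl = Arrow.Table("dataset.arrow")
first(tbl.label)
```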

@lorenzoh
Contributor

lorenzoh commented Feb 5, 2022

As @ToucheSir says, preprocessing the dataset for faster online loading can make a big difference; see for example this FastAI.jl tutorial. I also have a prototype package that automates many of the steps of saving and loading data containers and lets you dispatch on the kind of data using Block types. For example, Image{2}s are saved as individual .jpg files, while Keypoints are stored and loaded as a contiguous array using Arrow.jl. This also composes nicely when you have a dataset with different kinds of information (e.g., inputs, targets, and more) and want to save each in the most efficient format while reusing as much functionality as possible. Some design ideas could be taken from the repo: https://github.com/lorenzoh/DataBlockStorage.jl.

Next to serializing data in the most efficient (to load) format, there are also other avenues for speedups. I'm planning to do some further testing and benchmarking on these. Focusing on image pipelines here since these require a lot of optimization, but safe to say the package infrastructure will help for other domains.

One is in-place loading through getobs! support, which DataLoaders.jl's BatchViewCollated provides. Given that getobs! can be implemented without allocations, this allows zero-allocation data loading, reducing memory pressure as well as the frequency of GC pauses. DataAugmentation.jl's image (and segmentation mask) transforms don't just compose, they also allow in-place application, so this composes nicely with getobs!.
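As an illustration, here is what a getobs! implementation might look like for a simple image-folder container. The ImageFolder type and the loadimage/loadimage! helpers are hypothetical names, stand-ins for whatever decoder the container uses:

```julia
# Hypothetical container: a list of image file paths.
struct ImageFolder
    paths::Vector{String}
end

Base.length(d::ImageFolder) = length(d.paths)

# Allocating variant: decode into a fresh array every call.
getobs(d::ImageFolder, i) = loadimage(d.paths[i])          # hypothetical decoder

# In-place variant: decode straight into a preallocated `buffer`,
# so repeated calls allocate nothing.
function getobs!(buffer, d::ImageFolder, i)
    loadimage!(buffer, d.paths[i])                          # hypothetical in-place decoder
    return buffer
end
```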

Another one is JpegTurbo.jl for faster image loading.

As I said, I plan to do some more benchmarks and compare with Python-based projects, but I think we're in a good spot for great performance and good interfaces to extend this to other domains as well.

@lorenzoh
Contributor

lorenzoh commented Feb 7, 2022

JpegTurbo.jl is now released and I've gone ahead and done some benchmarking. The performance is looking good. Great work, @johnnychen94!

JuliaIO/JpegTurbo.jl#15 (comment)

@johnnychen94
Member

johnnychen94 commented Feb 7, 2022

For example, Image{2}s are saved as individual .jpg files

I'm wondering if the new lossless QOI format gives a better result here? It's faster at decoding, and since it's lossless you don't introduce JPEG artifacts.

https://github.com/KristofferC/QOI.jl is included in ImageIO already. It could be made even faster with AVX enabled.

using ImageIO, TestImages, FileIO
using BenchmarkTools

img = testimage("lighthouse");
@btime save("tmp.qoi", $img); # 7.511 ms (60 allocations: 3.35 MiB)
@btime load("tmp.qoi"); # 3.082 ms (57 allocations: 2.88 MiB)

@btime save("tmp.jpg", $img); # 2.422 ms (45 allocations: 1.25 MiB)
@btime load("tmp.jpg"); # 6.424 ms (66 allocations: 2.38 MiB)

@lorenzoh
Contributor

lorenzoh commented Feb 7, 2022

I'm wondering if the new lossless QOI format gives a better result here?

Didn't know there was a Julia implementation already. Will update here once I have tried. Do you know what the compression ratio is like compared to JPEGs?

@johnnychen94
Member

Do you know what the compression ratio is like compared to JPEGs?

It's becoming off-topic now... In general, QOI doesn't achieve great compression ratios, but it's very fast at encoding and decoding. Thus QOI is an ideal format for applications that aren't constrained by network bandwidth (unlike, say, web applications) but require high decoding throughput (e.g., games).

You can find some reference results at https://github.com/KristofferC/QOI.jl#benchmarks and JuliaIO/JpegTurbo.jl#15 (comment). For instance, the "coffee" image compresses to 78.10 KB with JpegTurbo.jl versus 493.3 KB with QOI.jl.
