add multitasking to DataLoader #13
One thing I'd like to do here is separate the view from the loader. So instead of
So the way DataLoaders.jl does this, it has

Nothing wrong with Flux's
Let me know if you need help on how to move forward with this.
What I wanted to do was have a single BufferedGetObs that supports multiple slots for parallelism and uses n = 1 for the single-threaded case, instead of having duplicate types for the parallel and single-threaded cases.
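The single-type idea could be sketched roughly like this, assuming a `Channel`-based buffer where the capacity `n` controls how far loading runs ahead (the name `buffered_getobs` and its signature are illustrative, not the actual `BufferedGetObs` design):

```julia
# Hypothetical sketch: one buffered iterator whose buffer size `n` controls
# parallel prefetching; n = 1 recovers the single-threaded behaviour.
function buffered_getobs(getobs, indices; n = 1)
    # A Channel of capacity n lets the producer task prefetch up to n
    # observations before the consumer takes them.
    return Channel(n; spawn = true) do ch
        for i in indices
            put!(ch, getobs(i))
        end
    end
end

loader = buffered_getobs(i -> 2i, 1:4; n = 2)
collect(loader)  # [2, 4, 6, 8]
```

Because there is a single producer task, observation order is preserved regardless of `n`; only the amount of lookahead changes.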
Courtesy of @samuela on Discourse, https://ffcv.io/ is a data loading library which appears to have insane performance. There might be some ideas in there worth emulating here.

I can't figure out from their website what they are actually doing. Any idea what's the secret sauce? I saw MetaTheory.jl mentioned on Discourse, but I don't see anything related to it on the website.

Yeah, I'm not sure exactly what FFCV's secrets are yet... I guess we'll have to wait on the paper. From what I can piece together, it sounds like they're using numba to JIT compile data augmentations and data loading to make things faster. Additionally, there appears to be some async element to it all, which makes sense considering hitting disk, etc. Not sure exactly what's going on, but it seems like the sort of thing Julia should excel at.

Yeah, we'll have to benchmark ourselves, but it sounds like the kind of thing that should happen "for free" with Julia.
As @samuela and @darsnack mention, some of their data transformation functions that they optimize with

What will require a little more work to replicate is likely the data

More details seem to be in https://docs.ffcv.io/parameter_tuning.html. Maybe we just need to use MemPool.jl with more specializations than just
A big but under-discussed part of the performance gains comes from the dataset generation step. AFAICT the
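One way the dataset-generation idea could look in Julia, as a toy sketch: resize every observation to a fixed shape offline, write the raw values into one flat binary file, and memory-map it for near-zero-cost online loading. The names `write_dataset`/`load_dataset` are made up for illustration, not from any of the packages discussed here.

```julia
using Mmap

# Preprocess once: dump same-size Float32 observations into one flat file.
function write_dataset(path, images)
    open(path, "w") do io
        for img in images
            write(io, img)
        end
    end
end

# Load fast: memory-map the file; pages are read lazily on first access.
function load_dataset(path, obs_size, n)
    io = open(path)
    return Mmap.mmap(io, Array{Float32,3}, (obs_size..., n))
end

imgs = [rand(Float32, 4, 4) for _ in 1:3]
write_dataset("toy.bin", imgs)
ds = load_dataset("toy.bin", (4, 4), 3)
ds[:, :, 2] == imgs[2]  # true: observation 2 round-trips exactly
```

The upfront cost is paid once at generation time; afterwards each epoch only touches the mapped pages it needs.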
As @ToucheSir says, preprocessing the dataset for faster online loading can make a big difference; see for example this FastAI.jl tutorial. I also have a prototype package that automates many steps of saving and loading data containers and lets you dispatch based on the kind of data using

Next to serializing data in the most efficient (to load) format, there are also other avenues for speedups. I'm planning to do some further testing and benchmarking on these. I'm focusing on image pipelines here since these require a lot of optimization, but it's safe to say the package infrastructure will help for other domains.

One is in-place loading through

Another one is JpegTurbo.jl for faster image loading.

As I said, I plan to do some more benchmarks and compare with Python-based projects, but I think we're in a good spot for great performance and good interfaces to extend this to other domains as well.
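The in-place loading mentioned above could be sketched with the `getobs!(buffer, data, idx)` convention from the LearnBase/MLDataPattern ecosystem: the caller allocates one buffer and every load overwrites it. `RandomImages` is a made-up container purely for illustration.

```julia
using Random: rand!

# Made-up data container: `n` observations, each a `sz`-shaped Float32 image.
struct RandomImages
    n::Int
    sz::Tuple{Int,Int}
end

Base.length(d::RandomImages) = d.n

# Allocating path: a fresh array for every observation.
getobs(d::RandomImages, i::Integer) = rand(Float32, d.sz...)

# Non-allocating path: overwrite the caller-owned buffer and return it.
function getobs!(buffer::Matrix{Float32}, d::RandomImages, i::Integer)
    rand!(buffer)
    return buffer
end

data = RandomImages(8, (16, 16))
buf = Matrix{Float32}(undef, 16, 16)
for i in 1:length(data)
    img = getobs!(buf, data, i)  # same array reused; no per-step allocation
end
```

In a hot training loop this turns per-observation allocation and GC pressure into a single upfront allocation.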
JpegTurbo.jl is now released and I've gone ahead and done some benchmarking. The performance is looking good, great work @johnnychen94!
I'm wondering if the new lossless QOI format gives a better result here? It's faster at decoding, and since it's a lossless format it doesn't introduce JPEG artifacts. https://github.com/KristofferC/QOI.jl is included in ImageIO already. It could be made even faster with AVX enabled.

```julia
using ImageIO, TestImages, FileIO
using BenchmarkTools

img = testimage("lighthouse");

@btime save("tmp.qoi", $img); # 7.511 ms (60 allocations: 3.35 MiB)
@btime load("tmp.qoi");       # 3.082 ms (57 allocations: 2.88 MiB)
@btime save("tmp.jpg", $img); # 2.422 ms (45 allocations: 1.25 MiB)
@btime load("tmp.jpg");       # 6.424 ms (66 allocations: 2.38 MiB)
```
Didn't know there was a Julia implementation already. Will update here once I have tried. Do you know what the compression ratio is like compared to JPEGs?
It's becoming off-topic now... In general, QOI doesn't achieve a great compression ratio, but it encodes and decodes very quickly. That makes QOI a good fit for applications where file size and network bandwidth matter less than decoding throughput (e.g., games), and a poor fit for bandwidth-sensitive ones (e.g., web applications). You can find some reference results at https://github.com/KristofferC/QOI.jl#benchmarks and JuliaIO/JpegTurbo.jl#15 (comment). For instance, the "coffee" image is compressed to 78.10 KB by JpegTurbo.jl, but to 493.3 KB by QOI.jl.
Port the DataLoader from Flux and extend it with the multitasking features of
https://github.com/lorenzoh/DataLoaders.jl
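A minimal sketch of the multitasking side of this, assuming a worker-pool design in the spirit of DataLoaders.jl (the name `eachobs_parallel` and its structure are illustrative, not the actual API): several tasks pull indices from a shared channel and push loaded observations to an output channel, so loading overlaps with training. Note that results may arrive out of order.

```julia
# Illustrative worker-pool loader: nworkers tasks drain `inch` concurrently
# and push results to `out`; `out` is closed once all workers finish.
function eachobs_parallel(getobs, indices; nworkers = Threads.nthreads())
    inch = Channel{Int}(length(indices))
    foreach(i -> put!(inch, i), indices)
    close(inch)  # workers stop once the index channel is drained

    out = Channel(nworkers)
    workers = map(1:nworkers) do _
        Threads.@spawn for i in inch  # each take! from inch is atomic
            put!(out, getobs(i))
        end
    end
    Threads.@spawn begin
        foreach(wait, workers)
        close(out)
    end
    return out
end

results = sort(collect(eachobs_parallel(i -> i^2, 1:8)))
# results == [1, 4, 9, 16, 25, 36, 49, 64], though arrival order may vary
```

Order-preserving variants (as DataLoaders.jl offers) would additionally tag each result with its index and reorder on the consumer side.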