-
Hi! Thanks for the feedback! I've converted this into a discussion, as there's no real issue with the project itself. I'll see if I can upload a tarball with the test data somewhere and will keep you posted.
-
That being said, I think I've used
-
FYI, I just checked with the default compression: it's indeed pretty much exactly two orders of magnitude faster (in terms of user time) than zpaq.
-
zpaq (and its forks :) has the ability to make deduplicated, scheduled (aka crontabbed) backups of very big files (a virtual machine's vmdk), of very big but mostly static folders (aka a fileserver), of a huge e-mail store, of MySQL dump backups, and so on. The typical use is not so much "launch once on a folder, and that's it", but "launch a thousand times, on the same folder, at different moments" (see the sketch below).

Some zpaq forks can also perform parallel, multithreaded hashed file checks at real-world speeds of roughly 1-2GB/s on non-spinning drives (SSDs, and much better on NVMe).

If I understand right, dwarfs (very good software indeed!) cannot update an image, therefore it is (almost) useless for daily, or every-60-minutes, backup purposes. I would really appreciate some analysis and comparisons on your test-set archive.
-
In fact the key is here: with a rolling SHA-1 (you can change the default "chunk" size: the smaller, the better the deduplication, but the more time is needed), the first stage alone shrinks the 51GB down to 8.9GB.

Short version: zpaq is NOT something like 7z or RAR (for example), it's something... fundamentally different (of course you can use it like 7z: it runs just about as fast and compresses just about as well as 7z).
-
Cannot compare on bare metal (dwarfs does not run on BSD or Windows, so an Ubuntu virtual machine is used: 16 CPUs, 16GB RAM, ~2GB/s storage). However... the testbed: 3 small virtual machines.

dwarfs with default options: 795 s. Now zpaqfranz with default options.

The archived files are just about as big, but zpaqfranz is (in this case) more than 200% faster.

Mount the image and get xxh3 for every file (impressive speed!). Get xxh3 from the "real" files (a speed check against the "fake" dwarfs mount). OK, now compare everything (check the global SHA256)... against the "real" filesystem.

The "dwarfs" image is equal to the source.

Short version:
-
This is just me jumping in with unrelated stuff, but I've got my own example of very redundant test data: every single version of Minecraft put through dwarfs. I've done some further compression of the "uncompressed" dwarfs file of that (the file I've shared is zstd compressed; that other version goes to about 11GiB) using
-
I noticed that, during the comparison with zpaq, the "placebo" compression mode (-m5) was used while, in reality, the default one (-m1) is almost always used.

Could you please share the file you used for testing, so that some analysis can be done?

Thanks