Regarding long-term archival of DwarFS images #23
-
This is really interesting, thanks for the links! Just a few quick replies:
I've moved to frozen (thrift) precisely because the previous metadata format was almost impossible to change in a backwards-compatible way. It's a double-edged sword, though, because of the extra dependency. It definitely made the code much simpler, certainly less buggy, way easier to debug, and so much easier to extend. If fbthrift dies one day, well, I'd probably rip out the frozen stuff. (Actually, the metadata schema also uses thrift.) Frozen itself is actually a simple format, although looking at it with e.g. a hex editor is pointless due to the bit packing, and recovery by manually "fixing bits" is probably close to impossible. But under the hood, it's really just bit offsets, bit widths and lengths.
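For a rough mental model of what "bit offsets, bit widths and lengths" means in practice, here's a tiny sketch of reading values out of a bit-packed array. This is a generic illustration only, not the actual Frozen layout or API:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bit-packed storage (NOT the actual Frozen layout): values are
 * stored back-to-back at a fixed bit width, so the i-th value starts at bit
 * offset `base + i * width`. */
static uint64_t read_bits(const uint8_t *buf, uint64_t bit_off, unsigned width)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < width; ++i) {
        uint64_t bit = bit_off + i;
        uint64_t byte = bit >> 3;
        unsigned shift = (unsigned)(bit & 7);   /* LSB-first within each byte */
        value |= (uint64_t)((buf[byte] >> shift) & 1u) << i;
    }
    return value;
}

/* Read element `index` from a packed array starting at `base` bits. */
static uint64_t packed_get(const uint8_t *buf, uint64_t base,
                           unsigned width, size_t index)
{
    return read_bits(buf, base + (uint64_t)index * width, width);
}
```

The point is simply that nothing is byte-aligned, which is exactly why poking at the metadata with a hex editor isn't useful.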
Not yet, though it's something I've thought about at least for the metadata block and the section headers. Without the metadata block, there's not much you can extract anymore. If the metadata is intact, you can lose individual data blocks and still recover (at least in theory) data that's stored in the blocks that are still valid. Unless a file is larger than a single block, it'll never be spread out across more than 3 blocks.
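To make the recovery argument concrete, here's a simplified model, assuming (roughly as described in the metadata docs) that a file's contents are a list of chunks of the form (block, offset, size). With intact metadata, a file is recoverable exactly when every block its chunks reference is still valid. The types and field names below are illustrative, not the real data structures:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Simplified model: a regular file is a list of chunks, each referencing a
 * byte range inside one (uncompressed) data block. */
typedef struct {
    uint32_t block;   /* index of the data block */
    uint32_t offset;  /* byte offset inside the uncompressed block */
    uint32_t size;    /* number of bytes */
} chunk_t;

/* A file is recoverable iff every block referenced by its chunks is intact. */
static bool file_recoverable(const chunk_t *chunks, size_t nchunks,
                             const bool *block_ok)
{
    for (size_t i = 0; i < nchunks; ++i) {
        if (!block_ok[chunks[i].block]) {
            return false;  /* at least one referenced block is corrupted */
        }
    }
    return true;
}
```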
None of these at the moment, unless the underlying compression algorithm actually detects an error.
-
More thoughts:
-
In general I agree. However, the documentation should perhaps state a few options. (For example, at some point in time there was
For long-term archival I would personally go with something more "classical", like perhaps
This is a very good idea. My assumption is that for each "file" / "folder" you'll export a JSON object holding perhaps the offsets inside the original image file where the data actually is found (obviously taking into account the compression). Perhaps those JSON objects could also contain the

On this topic, given that archived images can contain many (millions of) entries, I would suggest that instead of outputting one large JSON file, the tool output in JSON-stream format, i.e. one JSON object per entry, each such object written on a separate line. This way one could easily use

All that remains afterwards is a small sample C code that, given as arguments some fields from that JSON and an image path, is able to stream to
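For illustration, here is a minimal sketch of such a helper, assuming the JSON fields have already been resolved to a plain byte offset and length inside the image file. Decompression is deliberately left out, and the command-line interface is entirely hypothetical:

```c
/* Hypothetical helper: stream `length` bytes starting at byte `offset` of an
 * image file to stdout. In a real tool the offset/length would come from the
 * exported JSON metadata, and the extracted bytes would still need to be
 * decompressed afterwards. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>  /* off_t */

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <image> <offset> <length>\n", argv[0]);
        return 1;
    }

    FILE *img = fopen(argv[1], "rb");
    if (!img) { perror("fopen"); return 1; }

    long long offset = atoll(argv[2]);
    long long length = atoll(argv[3]);

    if (fseeko(img, (off_t)offset, SEEK_SET) != 0) {  /* fseeko: POSIX */
        perror("fseeko");
        return 1;
    }

    char buf[64 * 1024];
    while (length > 0) {
        size_t want = length < (long long)sizeof buf ? (size_t)length : sizeof buf;
        size_t got = fread(buf, 1, want, img);
        if (got == 0) break;  /* EOF or read error */
        fwrite(buf, 1, got, stdout);
        length -= (long long)got;
    }

    fclose(img);
    return 0;
}
```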
-
While some of this is tempting, it can easily defeat the purpose of "compressing" the data by totally blowing up the JSON metadata. For example, the metadata for the

Another advantage of actually leaving the metadata in the exact same space-efficient structure that is being used internally (just a different representation) is that it makes it trivial to recover a corrupted metadata block.
-
Given that from the documentation I understand that the image is composed of three sections (schema, metadata and actual data), might I suggest adding one strong hash (say SHA2-256) for each of these sections, thus the

(I've suggested something similar in your initial comment, however it didn't give details.)
-
Block/schema/metadata are the section types. There are multiple block sections, but only one each of schema & metadata. The schema isn't needed for the exported JSON metadata; it's only necessary to interpret the bit-packing in the frozen metadata. The plan is indeed for each section to have its own checksum, and quite likely that's going to be SHA256.
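As a rough sketch of the per-section checksum idea, assuming each section header were to store a SHA-256 digest of the section payload (the actual header layout isn't specified here), verification could look like this using OpenSSL's SHA256():

```c
/* Sketch only: verify one section's payload against a stored SHA-256 digest.
 * The notion of a "stored digest" per section is the assumption here; the
 * real DwarFS section header layout is not modeled. */
#include <openssl/sha.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool section_checksum_ok(const unsigned char *section, size_t len,
                                const unsigned char stored[SHA256_DIGEST_LENGTH])
{
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(section, len, digest);
    return memcmp(digest, stored, SHA256_DIGEST_LENGTH) == 0;
}
```

The nice property is that each section can be checked independently, so a corrupted block can be pinpointed without trusting anything outside that section's header.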
-
cde36cf introduces the new section header, which adds a couple of things:
You can easily convert an old image to the new format in a few seconds using:
-
FWIW, here is a brief description of the general filesystem format and in particular the metadata format.
-
Because DwarFS has excellent compression and allows one to easily offload large datasets (especially text files) while still being able to access them easily, it looks like a perfect candidate for long-term archival.
So the question is: are DwarFS images suitable for such a use case?
For example, here are a few items that I believe are important:
Personally, I would say that it's important to at least have a simple format, and to include strong hashes to detect corruption.
For example, the author of Lzip has two nice articles on the topic of long-term archival of compressed bundles: