Skip to content

Latest commit

 

History

History
104 lines (89 loc) · 4.48 KB

worklist.org

File metadata and controls

104 lines (89 loc) · 4.48 KB

Usage (so far)

find . -type f -print0 | xargs -0 sha256sum | tee ~/recursive-sha256sum.txt
find . -type f -print0 | xargs -0 sha256sum | tee ~/4Tera-recursive-sha256sum.txt
find . -type f -print0 | xargs -0 sha256sum | tee ~/8Tera-recursive-sha256sum.txt

Then run `ban` on the generated files:

python3 ban.py ~/4Tera-recursive-sha256sum.txt ~/8Tera-recursive-sha256sum.txt

Usecase: creating a backup of a folder

I have a folder A and I just copied it to a backup location B. The two folders should be absolutely the same, right? How this tool should check this:

  • are all files that are present in A also present in B?
  • are all files that are present in B also present in A?
  • are all the contents of all the files the same?

Usecase: comparing two snapshots

I have a source folder. At some point in the past I created a backup A of the source folder and placed it on an external drive. As time passed, I added new files to my source folder and modified some of the existing files. Then I make another backup B of the source folder and place it onto another external drive.

Now I wish to see what changed in between these backups. Comparing the two backups helps to detect corruption on either of the external backup drives or on the source drive. For example, after the backup A was done, cosmic rays corrupted a random file on the source drive. Comparing the backup A and backup B shows a difference in that file. Once I see that the file changed, I can inspect and hopefully detect the corruption.

31 Jan 2024

I just plugged in the primary Seagate drive into Windows and Windows seems to have repaired it! WTF man. This is the drive that triggered all that panicky backups a month ago.

27 Dec 2024

It looks like I’ve used `sha256sum` always in text mode. But the internet suggests that digital preservations endeavours should always use the binary reading mode. Sources:

But, the Stack Overflow suggests that on GNU systems there is absolutely no difference whether a file was read in a text or binary mode. Same for Windows. I could not find any info on this for MacOS though.

25 Dec 2024

Background story on my backup drives

A couple of days ago my Seagate drive with the primary copy of all files suffered issues. Some folders could not be opened, while others completely disappeared. All this happened under Windows.

I have received a new 8 TB Seagate drive and now I am doing a complete copy of the old Expansion 8TB to this new Seagate. Old drive is exFAT. I’ve read on the internet that exFat is extremely prone to file loss when unplugged unexpectedly. Thus he new drive was formatted as APFS with Encryption. I will format the exFAT drive to APFS too, probably.

What do I expect from this tool

Bckup Analyzer (BAN)

Current status (probably somewhere in August 2024

After running “find + sha256sum” on both old and new snapshots, I got two files. The first file contains hashes and paths of the old snapshot, the second snapshot contains hashes and paths from the older snapshot.

Now it is time to analyze the snapshots. First thing to do was to check if there are hashes present in the old snapshot that are not there in the new shapshot. That would mean that some files were lost between the two snapshot operations of the same folder.

And yes, there was diff!

Most of the hashes belonged to Capture One sidecar files. I did some edits on my photos, between the first and the second snapshots, and that changed hashes of files.

Okay, well, I am not sure if there is any value in those old versions, but I will save them anyway.

Tasks

Implement proper command-line arguments

Copy the changed files to a new location

All paths contains a new line character \n

The input files are read line-by-line. For some reason, python’s readline() function returns a string with \n in it. Fix this behavior and also add a test for it.

Should the spaces in the file paths be escaped?

Example: 01234 /path/to/file\ with\ spaces

Rules

Hash, two spaces, file path

Lines coming from the input files must start with a hash, then have a single space, and the rest is considered a path. 01234 /path/to/file The lines are not allowed to start with a space: \ 01234 /path/to/file Or to have multiple spaces between the hash and the path: 01234 /path/to/file It is allowed, though, to have paths with spaces: 01234 /path/to/file with spaces