find . -type f -print0 | xargs -0 sha256sum | tee ~/recursive-sha256sum.txt
find . -type f -print0 | xargs -0 sha256sum | tee ~/4Tera-recursive-sha256sum.txt
find . -type f -print0 | xargs -0 sha256sum | tee ~/8Tera-recursive-sha256sum.txt
Then run `ban` on the generated files:
python3 ban.py ~/4Tera-recursive-sha256sum.txt ~/8Tera-recursive-sha256sum.txt
I have a folder A and I have just copied it to a backup location B. The two folders should be absolutely the same, right? This is what the tool should check (see the sketch after this list):
- are all files that are present in A also present in B?
- are all files that are present in B also present in A?
- are all the contents of all the files the same?
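A minimal sketch of such a comparison, assuming both snapshot files contain `<hash> <path>` lines as produced by the commands above; the function names and command-line handling here are illustrative, not the actual `ban.py` interface:

```python
import sys

def load_snapshot(snapshot_file):
    """Read a '<hash> <path>' snapshot file into a dict mapping path -> hash."""
    entries = {}
    with open(snapshot_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, _, path = line.partition(" ")
            entries[path] = digest
    return entries

def compare_snapshots(a_file, b_file):
    a = load_snapshot(a_file)
    b = load_snapshot(b_file)
    only_in_a = sorted(set(a) - set(b))   # present in A, missing in B
    only_in_b = sorted(set(b) - set(a))   # present in B, missing in A
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])  # same path, different content
    return only_in_a, only_in_b, changed

if __name__ == "__main__":
    only_a, only_b, changed = compare_snapshots(sys.argv[1], sys.argv[2])
    for p in only_a:
        print("only in A:", p)
    for p in only_b:
        print("only in B:", p)
    for p in changed:
        print("content differs:", p)
```

Comparing by path answers the first two questions; comparing the hashes of paths present in both snapshots answers the third.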
I have a source folder. At some point in the past I created a backup A of the source folder and placed it on an external drive. As time passed, I added new files to my source folder and modified some of the existing files. Then I made another backup B of the source folder and placed it onto another external drive.
Now I wish to see what changed between these backups. Comparing the two backups helps to detect corruption on either of the external backup drives or on the source drive. For example, after backup A was done, cosmic rays corrupted a random file on the source drive. Comparing backup A and backup B shows a difference in that file. Once I see that the file changed, I can inspect it and hopefully detect the corruption.
I just plugged the primary Seagate drive into Windows and Windows seems to have repaired it! WTF man. This is the drive that triggered all those panicky backups a month ago.
It looks like I’ve always used `sha256sum` in text mode. But the internet suggests that digital preservation endeavours should always use the binary reading mode. Sources:
But Stack Overflow suggests that on GNU systems there is absolutely no difference whether a file is read in text or binary mode. Same for Windows. I could not find any info on this for macOS, though.
A couple of days ago my Seagate drive with the primary copy of all files suffered issues. Some folders could not be opened, while others completely disappeared. All this happened under Windows.
I have received a new 8 TB Seagate drive and now I am doing a complete copy of the old Expansion 8TB to this new Seagate. The old drive is exFAT. I’ve read on the internet that exFAT is extremely prone to file loss when unplugged unexpectedly. Thus the new drive was formatted as APFS with Encryption. I will probably format the exFAT drive to APFS too.
After running “find + sha256sum” on both the old and the new drive, I got two files. The first file contains the hashes and paths from the old snapshot, and the second file contains the hashes and paths from the new snapshot.
Now it is time to analyze the snapshots. The first thing to do was to check whether there are hashes present in the old snapshot that are missing from the new snapshot. That would mean that some files were lost between the two snapshot operations on the same folder.
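A rough sketch of that check, comparing only the sets of hashes; the file names below are placeholders for the two snapshot files, not the real paths:

```python
def hashes_of(snapshot_file):
    """Collect the set of hashes from a '<hash> <path>' snapshot file."""
    with open(snapshot_file, encoding="utf-8") as f:
        return {line.split(" ", 1)[0] for line in f if line.strip()}

old_hashes = hashes_of("old-snapshot.txt")   # placeholder name
new_hashes = hashes_of("new-snapshot.txt")   # placeholder name

# Hashes present in the old snapshot but absent from the new one:
# the corresponding files were deleted, modified, or lost.
missing = old_hashes - new_hashes
print(len(missing), "hashes from the old snapshot are not in the new one")
```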
And yes, there was a diff!
Most of the hashes belonged to Capture One sidecar files. I did some edits on my photos between the first and the second snapshots, and that changed the hashes of those files.
Okay, well, I am not sure if there is any value in those old versions, but I will save them anyway.
The input files are read line by line. For some reason, Python’s `readline()` function returns a string with `\n` in it. Fix this behavior and also add a test for it.
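(`readline()` keeps the trailing newline by design, so the fix is simply to strip it when parsing.) A minimal sketch of the strip plus a test for it; the function name is illustrative, not necessarily what `ban.py` uses:

```python
def strip_newline(line: str) -> str:
    """Drop the trailing newline that readline() keeps on each line."""
    return line.rstrip("\n")

def test_strip_newline():
    assert strip_newline("01234 /path/to/file\n") == "01234 /path/to/file"
    assert strip_newline("01234 /path/to/file") == "01234 /path/to/file"  # already clean
    assert strip_newline("01234 /path with spaces\n") == "01234 /path with spaces"
```

Using `rstrip("\n")` rather than a bare `rstrip()` avoids trimming trailing spaces that might be part of a path.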
Example: 01234 /path/to/file with spaces
Lines coming from the input files must start with a hash, then have a single space, and the rest is considered the path (a parser sketch follows the examples below).
01234 /path/to/file
The lines are not allowed to start with a space:
\ 01234 /path/to/file
Or to have multiple spaces between the hash and the path:
01234  /path/to/file
It is allowed, though, to have paths with spaces:
01234 /path/to/file with spaces
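A sketch of a parser that enforces these rules; this is illustrative and not necessarily how `ban.py` implements it:

```python
def parse_line(line: str) -> tuple[str, str]:
    """Split '<hash> <path>' into (hash, path), rejecting malformed lines."""
    if line.startswith(" "):
        raise ValueError(f"line starts with a space: {line!r}")
    digest, sep, path = line.partition(" ")
    if not sep or not digest or not path:
        raise ValueError(f"expected '<hash> <path>': {line!r}")
    if path.startswith(" "):
        raise ValueError(f"multiple spaces between hash and path: {line!r}")
    return digest, path
```

For example, `parse_line("01234 /path/to/file with spaces")` returns `("01234", "/path/to/file with spaces")`, while a line with a leading space or a double space after the hash raises an error.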