Support alternate data streams on NTFS & ReFS #2633

Open
raphidae opened this issue Jan 30, 2025 · 0 comments

Dear developers,

First: thanks for a great tool!

As it happens, I use alternate data streams (ADS) on NTFS and ReFS to store file checksums (using RapidCRC) to guard against bitrot.

When investigating a pretty large discrepancy in free disk space between physically and logically identical disks, where WinMerge reported that the contents were definitely fully identical, I found that WinMerge currently appears to ignore any ADS of files, regardless of which type of comparison is used.

When searching for this, I found a possibly relevant changelog entry: "WinMerge 2.16.29 - 2023-03-21", which refers to the fixing of a bug when an ADS is passed on the CLI. So, the concept is clearly not unknown to the developers :)

Is there an option I've missed that turns on comparison of the entire file, including any ADS?

If not, can support for this be implemented? ADS are pretty well documented, and it seems to me this would mostly require a change in how files to be compared are enumerated, not in the code that does the actual comparison.

It's been decades since I've written any code, but I imagine it entails:

  1. checking whether a file resides on a file system that supports alternate data streams (e.g. NTFS/ReFS), something that probably only has to be done once when enumerating the drives, and then cached with the rest of the drive information, and,

  2. when enumerating files, adding a check for alternate data streams, enumerating those (or perhaps just always enumerating ADS, depending on how expensive this is resource-wise), and then enqueueing the alternate streams as separate files internally.
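
The two steps above could look roughly like this. It is only a minimal Python sketch via ctypes, not WinMerge code. It assumes the Win32 calls `GetVolumeInformationW` (checking the `FILE_NAMED_STREAMS` flag) and `FindFirstStreamW`/`FindNextStreamW`; the helper names `volume_supports_streams`, `list_streams`, `stream_display_name` and `expand_entries` are made up for illustration, and the two Win32 wrappers only work on Windows.

```python
import ctypes

FILE_NAMED_STREAMS = 0x00040000   # volume flag: file system supports named streams
FIND_STREAM_INFO_STANDARD = 0     # STREAM_INFO_LEVELS value FindStreamInfoStandard
ERROR_HANDLE_EOF = 38             # reported when there are no (more) streams
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


class Win32FindStreamData(ctypes.Structure):
    # Mirrors WIN32_FIND_STREAM_DATA:
    #   LARGE_INTEGER StreamSize; WCHAR cStreamName[MAX_PATH + 36];
    _fields_ = [("StreamSize", ctypes.c_int64),
                ("cStreamName", ctypes.c_wchar * 296)]


def volume_supports_streams(root):
    """Step 1 (Windows only): does the volume, e.g. 'C:\\', support named streams?"""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    flags = ctypes.c_uint32(0)
    ok = kernel32.GetVolumeInformationW(ctypes.c_wchar_p(root), None, 0, None,
                                        None, ctypes.byref(flags), None, 0)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return bool(flags.value & FILE_NAMED_STREAMS)


def list_streams(path):
    """Step 2 (Windows only): return [(raw_name, size)], e.g. ('::$DATA', 120)."""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.FindFirstStreamW.restype = ctypes.c_void_p
    kernel32.FindFirstStreamW.argtypes = [ctypes.c_wchar_p, ctypes.c_int,
                                          ctypes.c_void_p, ctypes.c_uint32]
    kernel32.FindNextStreamW.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
    kernel32.FindClose.argtypes = [ctypes.c_void_p]

    data = Win32FindStreamData()
    handle = kernel32.FindFirstStreamW(path, FIND_STREAM_INFO_STANDARD,
                                       ctypes.byref(data), 0)
    if handle == INVALID_HANDLE_VALUE:
        err = ctypes.get_last_error()
        if err == ERROR_HANDLE_EOF:
            return []  # no streams at all
        raise ctypes.WinError(err)
    streams = []
    try:
        while True:
            streams.append((data.cStreamName, data.StreamSize))
            if not kernel32.FindNextStreamW(handle, ctypes.byref(data)):
                break
    finally:
        kernel32.FindClose(handle)
    return streams


def stream_display_name(raw):
    """':Zone.Identifier:$DATA' -> 'Zone.Identifier'; main stream '::$DATA' -> ''."""
    return raw.split(":")[1]


def expand_entries(path, raw_streams):
    """Enqueue the file itself plus one 'path:stream' entry per named stream."""
    entries = [path]  # the unnamed main stream is the file itself
    for raw, _size in raw_streams:
        name = stream_display_name(raw)
        if name:
            entries.append(f"{path}:{name}")
    return entries
```

On Windows, `expand_entries(p, list_streams(p))` would yield the extra `path:stream` entries, which can be opened like ordinary files on a volume where `volume_supports_streams` returns true.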

Given the changelog entry, it seems the comparison code is already perfectly able to handle them right now.

Currently, WinMerge will definitively state files are identical, even if one of them could have gigabytes stored in an alternate stream that the other "identical" file doesn't.
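
To illustrate what such a check could report, here is a small sketch that compares the stream listings of two files by name and size only, as a cheap first pass before any content comparison. The function `diff_streams` and the `{stream_name: size}` input format are assumptions for illustration, not anything WinMerge provides.

```python
def diff_streams(left, right):
    """Compare two {stream_name: size_in_bytes} maps for a pair of files.

    The empty name "" stands for the unnamed main data stream.
    Returns (only_left, only_right, size_differs), each a sorted list of names.
    """
    left_names, right_names = set(left), set(right)
    only_left = sorted(left_names - right_names)
    only_right = sorted(right_names - left_names)
    size_differs = sorted(name for name in left_names & right_names
                          if left[name] != right[name])
    return only_left, only_right, size_differs


# Same main stream on both sides, but the source carries an extra stream.
source = {"": 1024, "Zone.Identifier": 26}
backup = {"": 1024}
print(diff_streams(source, backup))
```

A pair of files would only count as identical if all three lists are empty and the contents of each matching stream compare equal.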

Additionally, it will also treat files tagged as "from the Internet" and files without that tag as identical. This is because Windows/Explorer stores this information in the "Zone.Identifier" ADS (it also stores some thumbnail information in an ADS).

Microsoft Word incidentally also uses the "Zone.Identifier" ADS to make security context judgements about a file being opened, so the difference isn't trivial.
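
For reference, the "Zone.Identifier" stream is a small INI-style text with a `[ZoneTransfer]` section and a `ZoneId` value (3 means the Internet zone). A rough sketch of reading it, assuming the stream is opened with the `path:Zone.Identifier` syntax; the helper names `parse_zone_identifier` and `read_zone_id` are made up for illustration:

```python
import configparser


def parse_zone_identifier(text):
    """Return the ZoneId from Zone.Identifier text, or None if it is absent."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
        if parser.has_option("ZoneTransfer", "ZoneId"):
            return parser.getint("ZoneTransfer", "ZoneId")
    except (configparser.Error, ValueError):
        pass  # malformed content: treat as "no usable tag"
    return None


def read_zone_id(path):
    """Read the ZoneId of a file; None if the stream does not exist.

    Assumption: the file lives on a volume that supports named streams,
    so 'path:Zone.Identifier' can be opened like a normal file on Windows.
    """
    try:
        with open(path + ":Zone.Identifier", encoding="utf-8", errors="replace") as f:
            return parse_zone_identifier(f.read())
    except OSError:
        return None


sample = "[ZoneTransfer]\nZoneId=3\n"
print(parse_zone_identifier(sample))
```

Two files whose main contents match but whose `read_zone_id` results differ are exactly the case described above.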

Just in general, when WinMerge is used to make sure a backup is identical to the source (as in my use case), and ADS are used in any way, WinMerge currently gives a false sense that everything is indeed fully identical, when in fact it isn't.

Thank you for your time. I'm willing to alpha/beta test if that helps.
