How to detect bad block cloning? #15554

Closed
mabod opened this issue Nov 22, 2023 · 24 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@mabod

mabod commented Nov 22, 2023

zfs 2.2.0 has a nasty block cloning bug which corrupts data: #15526
This has been fixed with zfs 2.2.1.

How can I detect if I have been hit by the block cloning bug? Can I somehow check the filesystem to find corrupted data that needs to be restored?

@mabod mabod added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 22, 2023
@rincebrain
Contributor

Not as well as you might hope.

(All that follows is to the best of my current understanding; since it's not fixed in master atm, some of this may be incorrect, though I would be surprised.)

The problem there is similar to #11900, where the actual write wasn't the problem; it was the data read at that moment and the application acting on it.

Specifically, there is a window where something being cloned wasn't properly marked as dirty right after cloning, so trying to use SEEK_HOLE/SEEK_DATA or equivalent on the thing that had just been cloned could incorrectly decide that swathes of the data were sparse, and write the destination accordingly.
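To make that mechanism concrete, here is a minimal Python sketch of a hole-aware copy, roughly what a tool does when it relies on SEEK_HOLE/SEEK_DATA. It is not cp's actual code, and the names `data_extents` and `sparse_copy` are hypothetical. Anything between the reported data extents is never read and ends up as \0 in the destination, so if the filesystem wrongly reports a region as a hole, the copy silently gets zeros there. It assumes a platform where `os.SEEK_DATA` and `os.SEEK_HOLE` exist (e.g. Linux).

```python
import errno
import os


def data_extents(fd: int) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges that the filesystem reports as data."""
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data after offset: the rest is a hole
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents


def sparse_copy(src: str, dst: str, chunk: int = 1 << 20) -> None:
    """Copy src to dst, reading only the extents reported as data.

    Regions reported as holes are skipped; truncate() extends dst so they
    read back as zeros. If the hole report is wrong, real data is lost here
    even though every write issued to dst succeeds.
    """
    fd = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(dst, "wb") as fout:
            for start, end in data_extents(fd):
                pos = start
                while pos < end:
                    buf = os.pread(fd, min(chunk, end - pos), pos)
                    if not buf:
                        break
                    fout.seek(pos)
                    fout.write(buf)
                    pos += len(buf)
            fout.truncate(size)
    finally:
        os.close(fd)
```

The copy is only as correct as the hole report: from the destination filesystem's point of view, every write above was valid.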

So, for example, if you ran cp --reflink=always file1.o file2.o; cp --reflink=never file2.o file3.o;, the second cp, if it used SEEK_HOLE/SEEK_DATA to find the data, might incorrectly conclude in that moment that swathes of file2.o were \0. file3.o would then have large swathes of \0 that, from ZFS's perspective, were all correctly written, since that's what the application asked it to write.
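If you still have a known-good source, one check is to compare the suspect copy (file3.o in the example) against it block by block and flag blocks that are all zero in the copy but contain data in the source. This is a hypothetical Python sketch; `compare_blocks` and the 4 KiB block size are assumptions, not anything from OpenZFS, and it only helps when the source itself is intact.

```python
BLOCK = 4096  # assumed comparison granularity


def compare_blocks(source: str, copy: str, block: int = BLOCK) -> tuple[list[int], list[int]]:
    """Return (zeroed, differing) block offsets.

    zeroed:    blocks that are all-zero in the copy but contain data in the
               source, i.e. the pattern described above.
    differing: any other mismatch, including a length difference.
    """
    zeroed, differing = [], []
    with open(source, "rb") as fa, open(copy, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break
            if a != b:
                if b and len(a) == len(b) and not any(b) and any(a):
                    zeroed.append(offset)
                else:
                    differing.append(offset)
            offset += block
    return zeroed, differing
```

A non-empty `zeroed` list is a strong hint of this kind of damage; `differing` entries point to something else (or to a source that changed after the copy).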

So you could basically go looking for unexpected regions of \0, but that obviously might have a nontrivial false positive rate.
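A crude version of that search, as a hypothetical Python sketch: report runs of block-aligned, all-zero 4 KiB blocks in a file (`zero_regions` and the block size are assumptions). As noted, expect false positives, since sparse files, VM images, databases and many binary formats legitimately contain long runs of zeros; treat hits as candidates to check against a backup or snapshot, not as proof of damage.

```python
BLOCK = 4096  # assumed granularity: zero runs are reported in 4 KiB units


def zero_regions(path: str, block: int = BLOCK, min_blocks: int = 1) -> list[tuple[int, int]]:
    """Return (offset, length) for each run of block-aligned all-zero blocks."""
    zero = bytes(block)
    regions = []
    run_start = None
    offset = 0

    def close_run(end: int) -> None:
        # Record the current run if it is long enough.
        if run_start is not None and (end - run_start) // block >= min_blocks:
            regions.append((run_start, end - run_start))

    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk == zero:  # only a full-size block can equal `zero`
                if run_start is None:
                    run_start = offset
            else:
                close_run(offset)
                run_start = None
            offset += len(chunk)
    close_run(offset)
    return regions
```

Raising `min_blocks` trades recall for fewer false positives; the right threshold depends entirely on what kind of files you are scanning.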

@mabod
Author

mabod commented Nov 22, 2023

Does block cloning only happen when I explicitly ask for it with something like "cp --reflink"? Or can it also happen with a regular cp src dest or rsync -a ... command during normal operations?

@rincebrain
Contributor

rsync won't ever do it afaik

cp as of GNU coreutils 9 defaults to --reflink=auto, which uses copy_file_range, which will try block cloning if it is enabled, yes.
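For reference, this is roughly what "uses copy_file_range" means, as a minimal Python sketch (Linux, Python 3.8+). `copy_with_cfr` is a hypothetical name and this is not coreutils' actual logic. The kernel call lets the filesystem decide how to perform the copy, which is where OpenZFS can clone blocks when block cloning is enabled; the fallback is an ordinary read+write copy. The set of errnos that trigger the fallback is a judgment call.

```python
import errno
import os
import shutil

# errnos after which a plain userspace copy is a reasonable fallback (assumption)
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EOPNOTSUPP, errno.EINVAL, errno.EPERM}


def copy_with_cfr(src: str, dst: str) -> None:
    """Copy src to dst via copy_file_range(2), falling back to read+write."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fd_in, fd_out = fin.fileno(), fout.fileno()
        remaining = os.fstat(fd_in).st_size
        if hasattr(os, "copy_file_range"):
            try:
                while remaining > 0:
                    # No offsets given: the kernel uses and advances both file offsets.
                    n = os.copy_file_range(fd_in, fd_out, remaining)
                    if n == 0:  # source hit EOF early
                        break
                    remaining -= n
                return
            except OSError as e:
                if e.errno not in _FALLBACK_ERRNOS:
                    raise
        # Userspace copy: rewind both files and copy through Python buffers.
        fin.seek(0)
        fout.seek(0)
        fout.truncate()
        shutil.copyfileobj(fin, fout)
```

Whether a clone actually happens is decided inside the filesystem; the caller only sees that the bytes were copied.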

@mabod
Author

mabod commented Nov 22, 2023

Isn't there a way to find all cloned files/blocks on a filesystem? I assume that must be possible, but I can't seem to find anything with Google.

@rincebrain
Contributor

Again, that won't help you here.

The problem isn't the cloned blocks. The problem is that if you tried to do a normal read+write of them too soon after they were written, the result won't be cloned, but it will be mangled.

@aNDy-Squirrel

Can we say that the pool is safe if the block cloning feature has not been enabled after updating to zfs 2.2.0? That is: no 'zpool upgrade' has been done, and 'zpool get all' gives:
...
feature@block_cloning disabled local
...
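Checking this across several pools can be scripted. A minimal Python sketch, assuming you feed it the text printed by something like `zpool get feature@block_cloning` (columns NAME PROPERTY VALUE SOURCE; `-H` output works too); both helper names are hypothetical. Feature flags report `disabled`, `enabled` or `active`, and `disabled` means block cloning itself was never used on that pool. Note that the thread later found the underlying bug predates block cloning, so this is not a complete all-clear.

```python
def block_cloning_states(zpool_get_output: str) -> dict[str, str]:
    """Map pool name -> value of feature@block_cloning from `zpool get` output."""
    states = {}
    for line in zpool_get_output.splitlines():
        fields = line.split()  # NAME PROPERTY VALUE SOURCE; also fine for -H (tabs)
        if len(fields) >= 3 and fields[1] == "feature@block_cloning":
            states[fields[0]] = fields[2]
    return states


def pools_with_cloning_enabled(states: dict[str, str]) -> list[str]:
    """Pools where the feature is not disabled (i.e. enabled or active)."""
    return sorted(pool for pool, value in states.items() if value != "disabled")
```

The header line is skipped naturally because its PROPERTY column is literally `PROPERTY`.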

@rincebrain
Contributor

That particular bug cannot arise if block_cloning is disabled, correct.

@classabbyamp

Isn't there a way to find all cloned files/blocks on a filesystem?

building zdb with the patch in this PR will allow you to use the linked script to list files that are bcloned, but it won't say which ones are corrupted: #15541 (comment)

@IvanVolosyuk
Contributor

Isn't there a way to find all cloned files/blocks on a filesystem?

building zdb with the patch in this PR will allow you to use the linked script to list files that are bcloned, but it won't say which ones are corrupted: #15541 (comment)

And corrupted files may not be cloned at all as @rincebrain pointed out.

@broizter

It won't tell you which files are corrupted, but someone said zpool get all tank | grep bclone can show you how much data is corrupted.

@mabod
Author

mabod commented Nov 22, 2023

Isn't there a way to find all cloned files/blocks on a filesystem?

building zdb with the patch in this PR will allow you to use the linked script to list files that are bcloned, but it won't say which ones are corrupted: #15541 (comment)

Thank you very much. That helped me a lot. At least I now know that all files which have ever been cloned are just source code files which I downloaded from somewhere into my src folder. Nothing important! I have not noticed any corruption so far, and now I know that no important data got cloned anyway.

@rincebrain
Contributor

rincebrain commented Nov 22, 2023

It won't tell you which files are corrupted, but someone said zpool get all tank | grep bclone can show you how much data is corrupted.

That's just not true, as I explained earlier in the thread.

Isn't there a way to find all cloned files/blocks on a filesystem?

building zdb with the patch in this PR will allow you to use the linked script to list files that are bcloned, but it won't say which ones are corrupted: #15541 (comment)

Thank you very much. That helped me a lot. At least I now know that all files which have ever been cloned are just source code files which I downloaded from somewhere into my src folder. Nothing important! I have not noticed any corruption so far, and now I know that no important data got cloned anyway.

No, that just tells you the only ones with clones now are ...

@broizter

It won't tell you which files are corrupted, but someone said zpool get all tank | grep bclone can show you how much data is corrupted.

That's just not true, as I explained earlier in the thread.

You're totally right, brain fart on my part there. But it should tell you how much data is AT RISK of having been corrupted by this particular bug, right?

@FL140

FL140 commented Nov 22, 2023

@rincebrain First, thanks for trying to fix this issue as well as possible, from what I saw in all the threads about it. I am trying to understand the exact impact of the bug, as I have been hit by it badly. Can you please confirm or clarify the following questions, which came up after reading through the threads related to this bug?

  1. If I understand correctly, only data that has been copied in the first place can be affected, correct?
  2. Only the copy (b) of the original file (a), and copies of (b), can be affected, never (a), correct?
  3. With the tool provided here I can see all files that CURRENTLY have cloned blocks (all files where neither (a) nor (b) got deleted), both source (a) and destination (b), so if I compare file (a) with file (b) I can verify whether (b) is OK or corrupted, correct?
  4. So if I know that no relevant files were deleted (e.g. relevant data in a user's home directory, checked by comparing deleted files against a snapshot from before the pool feature was enabled), I can identify the relevant corrupted files that way?
  5. Is there any other way that data existing before the feature was enabled on the pool could have been corrupted by this bug?
  6. So are all my snapshots from before the feature was enabled on the pool safe and definitely not corrupted?
  7. When a file that was referenced in the cloned block information got deleted in the current file system (only one copy left), but is still referenced in a snapshot of the pool, it should still show up with the tool (because the cloned block is still there in the snapshot), correct?

Sorry if this should be obvious, but I am trying to keep any further impact on the data I am responsible for as low as possible, so it's better to ask twice than to corrupt more data. (BTW, this is the worst data corruption I have run into in 20 years of IT. Silent data corruption (e.g. bad RAM where ECC isn't possible) is really bad, and avoiding it is exactly the main reason I chose ZFS in the first place.)

@FL140

FL140 commented Nov 22, 2023

It won't tell you which files are corrupted, but someone said zpool get all tank | grep bclone can show you how much data is corrupted.

That's just not true, as I explained earlier in the thread.

But it should tell you how much data is AT RISK of having been corrupted by this particular bug, right?

If I understand it correctly, it gives you information about the potentially corrupted files that are CURRENTLY detectable. But if, e.g., the source of the copy was deleted in the past, there is no longer a cloned block entry, so it would not show up. (I hope this is correct, as I am trying to understand the impact myself.)
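For completeness, reading those numbers programmatically is straightforward. A hypothetical Python sketch that parses the output of something like `zpool get -Hp bcloneused,bclonesaved,bcloneratio tank` (the `bclone*` pool properties exist in OpenZFS 2.2; the helper name is an assumption). As discussed above, these values describe data currently shared through the BRT, not how much data was corrupted.

```python
def bclone_stats(zpool_get_output: str) -> dict[str, dict[str, object]]:
    """Collect bclone* properties per pool from `zpool get` output.

    Purely numeric values (as printed with -p) become ints; anything else,
    such as a ratio, is kept as the original string.
    """
    stats: dict[str, dict[str, object]] = {}
    for line in zpool_get_output.splitlines():
        fields = line.split()  # NAME PROPERTY VALUE SOURCE
        if len(fields) >= 3 and fields[1].startswith("bclone"):
            value = fields[2]
            parsed = int(value) if value.isdigit() else value
            stats.setdefault(fields[0], {})[fields[1]] = parsed
    return stats
```

Keeping non-numeric values as strings avoids guessing at the exact formatting `zpool get` uses for ratios.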

@rincebrain
Contributor

  1. Keeping in mind that GNU mv will do the usual cp-then-rm dance if you move across datasets, so you are doing a copy: yes.
  2. It would only be copies of (b), not the original (b), affected.
  3. I assume you mean the patch to zdb that shows the BRT entries. BRT entries would exist for things where (b) still exists, I believe, not for things where there's one copy left, like if you did cp --reflink=always a b;rm b;, since there's no point in keeping it there for copies=1 any more.
  4. I don't think that example is sufficient to prove no deletions happened - e.g. if you did cp --reflink=always a b;cp --reflink=never b c;rm -f b; where there's no snapshot between a being created and rm -f b, you don't know that b ever existed from just two snapshots on either side.
  5. Nothing that existed prior to the feature being enabled can have been mangled, this can only apply to things modified/created afterward.
  6. Should be, snapshots are immutable.
  7. (I believe) If there is only one copy of the data left referenced anywhere, then I do not believe it would be in the BRT any longer, because it's basically just a refcount of additional copies before being able to delete things, and things at copies=1 don't need to be in there, you can just delete them. If a snapshot is still referencing other copies of it, it would be in there, since it hasn't been "freed" yet, so it wouldn't have decremented.
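The refcounting behaviour described in 3 and 7 can be illustrated with a toy model. This is a hypothetical Python sketch, not OpenZFS's actual BRT code: it records only references beyond the first, so an entry disappears as soon as a block is back to a single owner, which is why a clone whose other copies were deleted no longer shows up in a BRT listing.

```python
class ToyBRT:
    """Hypothetical block reference table: counts only *extra* references."""

    def __init__(self) -> None:
        self.extra_refs: dict[str, int] = {}  # block id -> references beyond the first

    def clone(self, block: str) -> None:
        """Another file (or a snapshot of one) now points at an existing block."""
        self.extra_refs[block] = self.extra_refs.get(block, 0) + 1

    def free(self, block: str) -> bool:
        """Drop one reference; return True if the block itself can be freed."""
        extra = self.extra_refs.get(block, 0)
        if extra == 0:
            return True  # last reference: no entry, really free the block
        if extra == 1:
            del self.extra_refs[block]  # back to a single owner: entry vanishes
        else:
            self.extra_refs[block] = extra - 1
        return False

    def listed(self) -> list[str]:
        """What a BRT listing would show: only blocks that are still shared."""
        return sorted(self.extra_refs)
```

In this model, `cp --reflink=always a b; rm b` is one `clone()` followed by one `free()`: afterwards `listed()` is empty again, even if a plain copy made from b in between is the mangled file.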

@tonyhutter
Contributor

Just wanted to cross post that I have a reproducer script here: #15526 (comment)

@FL140

FL140 commented Nov 22, 2023

2. It would only be copies of (b), not the original (b), affected.

@rincebrain thanks for the comprehensive clarification!

2: So that means a and b are always correct and only copies of b are potentially corrupted? Or is this a typo and "original (a)" was meant?

4: I understand that restriction, but with at least one snapshot available for each day since the feature was activated, this is an acceptable corner case (I hope), at least for user data. In the end this is damage control; we have to live with it anyway, since a complete rollback on the affected systems here is not acceptable either.

@rincebrain
Contributor

Oh, no, everything is awful forever, it reproduces on 2.1.x too, which means it's probably not a block cloning bug, and is just another case of the same kind of issue as #11900 elsewhere. Great.

@RichardBelzer

You can use the script posted here to help determine whether you have files that are corrupted as a result of this bug.

@rincebrain
Contributor

That script reports any files with regions of \0s that are multiples of 4k. Not all files with strings of \0s are mangled.

@mike-zueff

How to detect bad block cloning?

https://github.com/0x5c/zfs-bclonecheck

@0x5c

0x5c commented Nov 26, 2023

How to detect bad block cloning?

https://github.com/0x5c/zfs-bclonecheck

As much as I wanted that script to be comprehensive, it was written before a couple of things were clarified or discovered:

  1. While this script will successfully list all instances of files in the BRT (which backs the block cloning feature), bcloned files with only one instance (or none) left are removed from the BRT.
  2. It was later discovered that the bug is preexisting and was only made easier to hit by block cloning.

It's also worth noting that bcloned files only became corrupted in some specific circumstances, so even on a system that has hit the bug through block cloning, not all bcloned files are necessarily corrupted.

@mabod
Author

mabod commented Oct 15, 2024

It has all been said in this thread.

@mabod mabod closed this as completed Oct 15, 2024