Corruption when nearly filling the pool (>75%) #11

Open

carlesmateo opened this issue Jun 20, 2018 · 4 comments

carlesmateo commented Jun 20, 2018

System information

Type: DRAID (PR 7078 commit from May 2nd 2018, and previously Jan 19th)
Distribution Name: RHEL
Distribution Version: 7.4 (I compiled ZFS myself)
Linux Kernel: default 3.10.0-693.17.1.el7.x86_64 #1 SMP Sun Jan 14 10:36:03 EST 2018 x86_64 GNU/Linux
Architecture: amd64
ZFS Version: DRAID (META is broken; this is PR 7078, May 2nd 2018)
SPL Version: 0.7.9

Describe the problem you're observing

When the dataset is nearly full and you run a scrub, corruption is detected.
It happens on physical servers and in VMs (VirtualBox).
It happens with and without ZIL.
It happens with DRAID1, 2 and 3 (parity 1, 2 or 3).
It seems to be related to, or to affect, the metadata.

Describe how to reproduce the problem

The procedure is simple, but the exact point at which you hit corruption depends on the geometry: drive size or partition size, size of the zvols, and so on.
I reach the same point by filling the dataset locally, or by creating one or more zvols and then (a sketch of the zvol path follows this list):

  1. formatting them with ext4, mounting locally, and filling them locally
  2. sharing via iSCSI, formatting remotely, mounting from the iSCSI initiator, and filling through the network
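For reference, here is a minimal sketch of the first (local) path, assuming a pool named test-pool already exists; the zvol name test-pool/v90, its size and the mount point /mnt/zd0 are illustrative placeholders, not values from the report:

# Create a zvol at roughly 90% of the available capacity (90G is illustrative)
# (add -o volblocksize=256K to match the reporter's iSCSI setting if desired)
zfs create -V 90G test-pool/v90

# Option 1 above: format with ext4, mount locally, then fill it
mkfs.ext4 /dev/zvol/test-pool/v90
mkdir -p /mnt/zd0
mount /dev/zvol/test-pool/v90 /mnt/zd0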

Typical example:

  1. Create a pool with any DRAID configuration
    For example:
    I verified it fails with 2x(8+3)+1, 2x(8+2)+1 and 2x(8+1)+1.
    As I was using 4TB drives, in order to fill them quickly enough I created 10GB partitions and built the pool from those partitions. I found the same problems with 100GB partitions and with full disks.

  2. Create a zvol of 90% of the available data capacity of the pool
    If the available space is 100GB, create a zvol of 90GB.
    You can also create several zvols, for example 45%, 30% and 20%, but the easiest way is to create a single zvol at 90%.
    Between 80% and 96% of total use you will hit corruption, but it varies each time depending on the size of the volumes, etc.

  3. Fill the volume(s)
    After each fill, run a scrub (see the loop sketch after the notes below).
    Fill 10%, run scrub: OK
    Fill 20%, run scrub: OK
    ...
    You'll see that no errors are reported until you use more than 75% of the data space.
    75% has been safe in all the tests so far.
    But near 80%, corruption can appear at any moment. Sometimes you don't hit it until 92%, other times at 85%, etc.

Please note that no rebuild had been done at this point, no drive had been OFFLINED/ONLINED, and all the drives are healthy.
It also happens on VMs. I always check that the drives are big enough on the VMs, so I use virtual drives of at least 8GB.

Please note that you don't need to use a zvol or fill through iSCSI. You can fill the pool much faster by mounting the dataset locally and filling it directly. I tried both. When using iSCSI I always use volblocksize=256KB.
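For convenience, here is a minimal sketch of the incremental fill-and-scrub loop from step 3, assuming the dRAID pool (here called pool1) has already been created with the vdev layout expected by the PR 7078 build and that the dataset, or the ext4 filesystem on the zvol, is mounted at /mnt/zd0; the round size and file count are illustrative, not values from the report:

#!/usr/bin/env bash
# Write files in rounds of ~10 GiB, scrubbing and checking the pool after each round.
TARGET_DIR="/mnt/zd0"
FILES_PER_ROUND=10
ROUNDS=10
FILE_INDEX=1

for ((ROUND=1; ROUND<=ROUNDS; ROUND++)); do
    for ((I=1; I<=FILES_PER_ROUND; I++)); do
        # Stop once the space is exhausted
        dd bs=1M count=1024 if=/dev/zero of="${TARGET_DIR}/file${FILE_INDEX}" || break 2
        FILE_INDEX=$((FILE_INDEX + 1))
    done

    # Scrub and wait for it to finish before reading the error counters
    zpool scrub pool1
    while zpool status pool1 | grep -q "scrub in progress"; do
        sleep 10
    done

    echo "=== After round ${ROUND} (~$((ROUND * FILES_PER_ROUND)) GiB written) ==="
    zpool status -v pool1
done

The loop waits for each scrub to complete so that the error counters shown by zpool status -v reflect a finished pass rather than one still in progress.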

I fill the zvols or datasets with commands like these:

dd bs=1M count=1024 if=/dev/zero of=/test-pool/file1
dd bs=1M count=1024 if=/dev/zero of=/test-pool/file2
dd bs=1M count=1024 if=/dev/zero of=/test-pool/file3
... (and so on)

Later I did more complex tests where I compute the SHA-1 of each file after writing it, and the hash is still valid even after I run a scrub and it reports corruption. (That, and the output of zpool status -v, makes me think it is not the data but the metadata that gets corrupted; see the verification sketch after the script below.)

#!/usr/bin/env bash
FILES_PATH="/mnt/zd0/"
FILES_INITIAL=1
# Fill up to the file number. 1024 files at 1GB = 1TB
FILES_NUMBER=1024
FILE_PREFIX="file"
DATE_NOW=`date '+%Y-%m-%d_%H-%M-%S'`

# Do not modify the order of the params; a number is appended to the output file name after $FILE_PREFIX
COMMAND_WRITE="dd bs=1M count=1024 if=/dev/zero of=${FILES_PATH}${FILE_PREFIX}"

SHA1SUM_OLD=""
SHA1_HASH=""

for ((COUNT=$FILES_INITIAL; COUNT<=$FILES_NUMBER; COUNT++)); do
    eval "${COMMAND_WRITE}${COUNT}"
    echo "Calculating HASH SHA-1"
    SHA1_HASH=`sha1sum ${FILES_PATH}${FILE_PREFIX}${COUNT} | awk '{ print $1; }'`
    echo ${SHA1_HASH}
    # Rename the file to include the HASH so later we can validate that we get the same
    mv ${FILES_PATH}${FILE_PREFIX}${COUNT} ${FILES_PATH}${FILE_PREFIX}${COUNT}-SHA1-${SHA1_HASH}
    if [ -z "$SHA1SUM_OLD" ]; then
        SHA1SUM_OLD=${SHA1_HASH}
    fi
    # We compare the HASH to the previous value, as the dd command is the same and must always generate the same HASH
    if [ "$SHA1_HASH" != "$SHA1SUM_OLD" ]; then
        echo "Attention! HASH does not match. Probable Data Corruption"
        exit 1
    fi
done
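
As a follow-up (not part of the original script), here is a sketch that re-checks the renamed files after a scrub reports errors, comparing each file's recomputed SHA-1 against the hash embedded in its name by the script above; if every file still matches, the user data is intact and the corruption reported by the scrub is confined to metadata:

#!/usr/bin/env bash
FILES_PATH="/mnt/zd0/"

for FILE in "${FILES_PATH}"file*-SHA1-*; do
    [ -e "${FILE}" ] || continue
    # Hash embedded in the filename by the fill script
    EXPECTED="${FILE##*-SHA1-}"
    ACTUAL=`sha1sum "${FILE}" | awk '{ print $1; }'`
    if [ "${ACTUAL}" != "${EXPECTED}" ]; then
        echo "MISMATCH: ${FILE} (expected ${EXPECTED}, got ${ACTUAL})"
    else
        echo "OK: ${FILE}"
    fi
done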

Capture of the error with zpool status -v on the May 2nd build.
Please note that a DRAID rebuild was launched automatically when the errors were detected (I believe by ZED).
zpool status -v

  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 6.24G in 0 days 00:03:25 with 33 errors on Wed Jun 20 11:14:59 2018
config:

    NAME                                         STATE     READ WRITE CKSUM
    pool1                                        DEGRADED     0     0    51
      draid1-0                                   DEGRADED     0     0   102
        ata-VBOX_HARDDISK_VB07113f71-dc523698    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB256c4414-28ff1e22    DEGRADED     0     0    16  too many errors
        ata-VBOX_HARDDISK_VB2f5ba582-0c7d474a    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB37cb1bc9-82a98a51    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB4a2107a4-3884cf4b    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB4ced98dc-1d153093    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB4e553f74-564ca3af    DEGRADED     0     0    20  too many errors
        ata-VBOX_HARDDISK_VB96502d1a-83758fc9    DEGRADED     0     0    16  too many errors
        spare-8                                  DEGRADED     0     0     6
          ata-VBOX_HARDDISK_VB9b83d4d6-515d88e1  DEGRADED     0     0    20  too many errors
          %draid1-0-s0                           ONLINE       0     0     6
        ata-VBOX_HARDDISK_VBa3c751b7-2618e79a    ONLINE       0     0     0
    spares
      %draid1-0-s0                               INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x21>
        <metadata>:<0x35>
        <metadata>:<0x40>
        <metadata>:<0x4d>
        <metadata>:<0x4e>
        pool1:<0x0>
        pool1:<0x1>
        pool1:<0x23>
        pool1:<0x24>
        pool1/v55g:<0x1>

Here there is no corruption, but an unrecoverable error.

zpool status -v
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: rebuilt 6.13G in 0 days 00:04:03 with 0 errors on Wed Jun 20 10:46:54 2018
config:

    NAME                                         STATE     READ WRITE CKSUM
    pool1                                        DEGRADED     0     0    18
      draid1-0                                   DEGRADED     0     0    36
        ata-VBOX_HARDDISK_VB07113f71-dc523698    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB256c4414-28ff1e22    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB2f5ba582-0c7d474a    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB37cb1bc9-82a98a51    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB4a2107a4-3884cf4b    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB4ced98dc-1d153093    ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB4e553f74-564ca3af    DEGRADED     0     0    14  too many errors
        ata-VBOX_HARDDISK_VB96502d1a-83758fc9    ONLINE       0     0     0
        spare-8                                  DEGRADED     0     0     0
          ata-VBOX_HARDDISK_VB9b83d4d6-515d88e1  DEGRADED     0     0    14  too many errors
          %draid1-0-s0                           ONLINE       0     0     0
        ata-VBOX_HARDDISK_VBa3c751b7-2618e79a    ONLINE       0     0     0
    spares
      %draid1-0-s0                               INUSE     currently in use

Capture of the error with zpool status -v on the Jan 19th build.

  pool: test-pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:01:29 with 63 errors on Wed Jun 20 06:01:21 2018
config:

    NAME                              STATE     READ WRITE CKSUM
    test-pool                         ONLINE       0     0    67
      draid1-0                        ONLINE       0     0   142
        wwn-0x5000c500580a6223-part1  ONLINE       0     0     0
        wwn-0x5000c500580a71b3-part1  ONLINE       0     0     0
        wwn-0x5000c500580a752f-part1  ONLINE       0     0     0
        wwn-0x5000c500580a7543-part1  ONLINE       0     0    36
        wwn-0x5000c500580a766f-part1  ONLINE       0     0     0
        wwn-0x5000c500580ac15f-part1  ONLINE       0     0     0
        wwn-0x5000c500580ac7fb-part1  ONLINE       0     0     0
        wwn-0x5000c500580ac893-part1  ONLINE       0     0     0
        wwn-0x5000c500580c161b-part1  ONLINE       0     0     0
        wwn-0x5000c500580c934f-part1  ONLINE       0     0     0
        wwn-0x5000c500580d316f-part1  ONLINE       0     0     0
        wwn-0x5000c500580e166f-part1  ONLINE       0     0     0
        wwn-0x5000c500580ff12b-part1  ONLINE       0     0     0
        wwn-0x5000c50058108c03-part1  ONLINE       0     0    72
        wwn-0x5000c5005810962f-part1  ONLINE       0     0    36
        wwn-0x5000c50058109eeb-part1  ONLINE       0     0     0
        wwn-0x5000c5005810e847-part1  ONLINE       0     0     0
        wwn-0x5000c5005810e97f-part1  ONLINE       0     0     0
        wwn-0x5000c5005810ebdf-part1  ONLINE       0     0     0
    spares
      $draid1-0-s0                    AVAIL

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x100>
        <metadata>:<0x33>
        <metadata>:<0x34>
        <metadata>:<0x35>
        <metadata>:<0x3f>
        <metadata>:<0x80>
        <metadata>:<0x95>
        test-pool:<0x0>
        test-pool:<0x1>
        test-pool:<0x23>
        test-pool:<0x24>
        <0x183>:<0x1>

Include any warning/errors/backtraces from the system logs

zfs get all output for the parity-1 pool:
NAME       PROPERTY              VALUE                  SOURCE
test-pool  type                  filesystem             -
test-pool  creation              Wed Jun 20  5:34 2018  -
test-pool  used                  138G                   -
test-pool  available             16.8G                  -
test-pool  referenced            64.9K                  -
test-pool  compressratio         1.00x                  -
test-pool  mounted               yes                    -
test-pool  quota                 none                   default
test-pool  reservation           none                   default
test-pool  recordsize            128K                   default
test-pool  mountpoint            /test-pool             default
test-pool  sharenfs              off                    default
test-pool  checksum              on                     default
test-pool  compression           off                    default
test-pool  atime                 on                     default
test-pool  devices               on                     default
test-pool  exec                  on                     default
test-pool  setuid                on                     default
test-pool  readonly              off                    default
test-pool  zoned                 off                    default
test-pool  snapdir               hidden                 default
test-pool  aclinherit            restricted             default
test-pool  createtxg             1                      -
test-pool  canmount              on                     default
test-pool  xattr                 on                     default
test-pool  copies                1                      default
test-pool  vscan                 off                    default
test-pool  nbmand                off                    default
test-pool  sharesmb              off                    default
test-pool  refquota              none                   default
test-pool  refreservation        none                   default
test-pool  guid                  14986709983637829338   -
test-pool  primarycache          all                    default
test-pool  secondarycache        all                    default
test-pool  usedbysnapshots       0B                     -
test-pool  usedbydataset         64.9K                  -
test-pool  usedbychildren        138G                   -
test-pool  usedbyrefreservation  0B                     -
test-pool  logbias               latency                default
test-pool  dedup                 off                    default
test-pool  mlslabel              none                   default
test-pool  sync                  standard               default
test-pool  dnodesize             legacy                 default
test-pool  refcompressratio      1.00x                  -
test-pool  written               64.9K                  -
test-pool  logicalused           137G                   -
test-pool  logicalreferenced     12K                    -
test-pool  volmode               default                default
test-pool  filesystem_limit      none                   default
test-pool  snapshot_limit        none                   default
test-pool  filesystem_count      none                   default
test-pool  snapshot_count        none                   default
test-pool  snapdev               hidden                 default
test-pool  acltype               off                    default
test-pool  context               none                   default
test-pool  fscontext             none                   default
test-pool  defcontext            none                   default
test-pool  rootcontext           none                   default
test-pool  relatime              off                    default
test-pool  redundant_metadata    all                    default
test-pool  overlay               off                    default
test-pool  encryption            off                    default
test-pool  keylocation           none                   default
test-pool  keyformat             none                   default
test-pool  pbkdf2iters           0                      default
@pierreyves-lebrun

Did anyone manage to investigate or replicate that problem?

@richardelling

Root cause is known and the current draid_rebase branch is passing tests.

@richardelling

For more info, the branch I referred to is:
https://github.com/don-brady/zfs/tree/draid_rebase

This branch has some other issues being worked on now.

@pierreyves-lebrun

I see, thanks for the heads-up!
