
draidcfg -p 2 -d 3 -s 1 -n 6 locks on small disks (less than 1200mb) #7

Open
gmelikov opened this issue Aug 8, 2017 · 1 comment

gmelikov commented Aug 8, 2017

System information

Type                  Version/Name
Distribution Name     CentOS
Distribution Version  7
Linux Kernel          3.10.0-514.26.1.el7.x86_64
Architecture          x64
ZFS Version           6e39308
SPL Version           0.7.0-rc4_5_g7a35f2b

Describe the problem you're observing

All ZFS-related commands freeze (and the system cannot be rebooted with the reboot command) after creating a six-disk draid2 pool on disks smaller than ~1200 MB.

On 1300 MB disks it works as expected.

Describe how to reproduce the problem

rm -rf /var/tmp/test
mkdir /var/tmp/test
cd /var/tmp/test
for fn in {1..6}; do dd if=/dev/zero of=/var/tmp/test/$fn.img bs=1M count=1200; done
modprobe zfs
draidcfg -p 2 -d 3 -s 1 -n 6 /var/tmp/test/draid.nvl
zpool create -f testpool draid2 cfg=/var/tmp/test/draid.nvl /var/tmp/test/{1..6}.img
#command freezes, see logs below

Include any warning/errors/backtraces from the system logs

[  200.799473] spl: loading out-of-tree module taints kernel.
[  200.799564] spl: module verification failed: signature and/or required key missing - tainting kernel
[  200.802464] SPL: Loaded module v0.7.0-rc3_8_g481762f (DEBUG mode)
[  200.803106] znvpair: module license 'CDDL' taints kernel.
[  200.803108] Disabling lock debugging due to kernel taint
[  202.396449] ZFS: Loaded module v0.7.0-rc3, ZFS pool version 5000, ZFS filesystem version 5
[  203.562942]  sdb: sdb1 sdb9
[  203.565623]  sdb: sdb1 sdb9
[  203.567864]  sdb: sdb1 sdb9
[  203.762573]  sdc: sdc1 sdc9
[  203.770463]  sdc: sdc1 sdc9
[  203.772247]  sdc: sdc1 sdc9
[  203.975979]  sdd: sdd1 sdd9
[  203.983113]  sdd: sdd1 sdd9
[  203.985945]  sdd: sdd1 sdd9
[  204.195823]  sde: sde1 sde9
[  204.203310]  sde: sde1 sde9
[  204.207913]  sde: sde1 sde9
[  204.399645]  sdf: sdf1 sdf9
[  204.408676]  sdf: sdf1 sdf9
[  204.571065]  sdg: sdg1 sdg9
[  204.574878]  sdg: sdg1 sdg9
[  204.579216]  sdg: sdg1 sdg9
[  204.803557] SPL: using hostid 0x00000000
[  204.858356] WARNING: Pool 'mgsmdt-001' has encountered an uncorrectable I/O failure and has been suspended.

[  360.437443] INFO: task zpool:3293 blocked for more than 120 seconds.
[  360.437485] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.437510] zpool           D ffff880076f00a58     0  3293   3250 0x00000080
[  360.437514]  ffff8800780c7c78 0000000000000086 ffff880078053ec0 ffff8800780c7fd8
[  360.437516]  ffff8800780c7fd8 ffff8800780c7fd8 ffff880078053ec0 ffff880076f00ae0
[  360.437518]  ffff880076f00a30 ffff880076f00ae8 0000000000000000 ffff880076f00a58
[  360.437520] Call Trace:
[  360.437533]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[  360.437545]  [<ffffffffa065160d>] cv_wait_common+0x14d/0x2a0 [spl]
[  360.437550]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[  360.437554]  [<ffffffffa0651775>] __cv_wait+0x15/0x20 [spl]
[  360.437594]  [<ffffffffa07a3f7f>] txg_wait_synced+0xdf/0x120 [zfs]
[  360.437617]  [<ffffffffa079706d>] spa_create+0x6ad/0x9e0 [zfs]
[  360.437641]  [<ffffffffa07d9442>] zfs_ioc_pool_create+0x152/0x270 [zfs]
[  360.437644]  [<ffffffff811dd875>] ? __kmalloc+0x55/0x240
[  360.437668]  [<ffffffffa07d6ac6>] zfsdev_ioctl+0x506/0x550 [zfs]
[  360.437671]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[  360.437674]  [<ffffffff81692ce1>] ? __do_page_fault+0x171/0x450
[  360.437675]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[  360.437677]  [<ffffffff81697809>] system_call_fastpath+0x16/0x1b
[  360.437685] INFO: task txg_sync:3484 blocked for more than 120 seconds.
[  360.437707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.437731] txg_sync        D ffff880076b983a8     0  3484      2 0x00000080
[  360.437733]  ffff88007736bae0 0000000000000046 ffff880076b60000 ffff88007736bfd8
[  360.437735]  ffff88007736bfd8 ffff88007736bfd8 ffff880076b60000 ffff88007aa56c40
[  360.437736]  0000000000000000 7fffffffffffffff 0000000000000001 ffff880076b983a8
[  360.437737] Call Trace:
[  360.437740]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[  360.437742]  [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
[  360.437766]  [<ffffffffa07f626f>] ? zio_taskq_dispatch+0x8f/0xa0 [zfs]
[  360.437789]  [<ffffffffa07f62b2>] ? zio_issue_async+0x12/0x20 [zfs]
[  360.437812]  [<ffffffffa07fa97c>] ? zio_nowait+0xbc/0x150 [zfs]
[  360.437814]  [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130
[  360.437816]  [<ffffffff810b1816>] ? prepare_to_wait_exclusive+0x56/0x90
[  360.437818]  [<ffffffff8168be38>] io_schedule+0x18/0x20
[  360.437822]  [<ffffffffa06515ab>] cv_wait_common+0xeb/0x2a0 [spl]
[  360.437824]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[  360.437828]  [<ffffffffa06517b8>] __cv_wait_io+0x18/0x20 [spl]
[  360.437851]  [<ffffffffa07fa29b>] zio_wait+0xfb/0x190 [zfs]
[  360.437871]  [<ffffffffa07757ef>] dsl_pool_sync+0xbf/0x440 [zfs]
[  360.437893]  [<ffffffffa0791af7>] spa_sync+0x417/0xda0 [zfs]
[  360.437896]  [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
[  360.437933]  [<ffffffffa07a4ea8>] txg_sync_thread+0x2d8/0x490 [zfs]
[  360.438037]  [<ffffffffa07a4bd0>] ? txg_init+0x280/0x280 [zfs]
[  360.438042]  [<ffffffffa064a73a>] thread_generic_wrapper+0x7a/0xc0 [spl]
[  360.438045]  [<ffffffffa064a6c0>] ? __thread_exit+0x20/0x20 [spl]
[  360.438048]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[  360.438050]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[  360.438051]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[  360.438053]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

stracedraid6.log.zip

thegreatgazoo (Owner) commented

This is likely an issue we already fixed in the internal dev repo. The current GitHub version hard-codes every 1 out of 20 metaslabs to be a mirrored metaslab (for small blocks). When the drive is too small there are fewer than 20 metaslabs, which can deadlock pool creation.

With openzfs#5182 managing the mirrored metaslabs, it's more dynamic and smarter as well.
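To illustrate the failure mode described above, here is a minimal sketch of the 1-in-20 arithmetic. The ratio is taken from the comment; the metaslab counts and the `mirrored_metaslabs` helper are hypothetical, purely for illustration, and do not reflect ZFS's actual internal sizing:

```shell
#!/bin/sh
# Illustrative only: models the hard-coded "1 out of 20 metaslabs is
# mirrored" rule from the comment above. Metaslab counts are made up.
mirrored_metaslabs() {
    # every 20th metaslab is reserved for small (mirrored) blocks
    echo $(( $1 / 20 ))
}

for total in 40 20 19 12; do
    m=$(mirrored_metaslabs "$total")
    if [ "$m" -eq 0 ]; then
        echo "$total metaslabs -> 0 mirrored: small-block allocation can hang"
    else
        echo "$total metaslabs -> $m mirrored"
    fi
done
```

With integer division, any vdev that ends up with fewer than 20 metaslabs gets zero mirrored metaslabs, so small-block writes issued during pool creation have nowhere to land, matching the hang observed with 1200 MB files but not 1300 MB ones.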
