[Bug 808132] New: multipathd does not add active/good/ok paths back to dm-multipath devices during failback
https://bugzilla.novell.com/show_bug.cgi?id=808132
https://bugzilla.novell.com/show_bug.cgi?id=808132#c0

           Summary: multipathd does not add active/good/ok paths back to
                    dm-multipath devices during failback
    Classification: openSUSE
           Product: openSUSE 11.4
           Version: Factory
          Platform: Other
        OS/Version: SLES 11
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: michaelc@cs.wisc.edu
         QAContact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created an attachment (id=528752)
 --> (http://bugzilla.novell.com/attachment.cgi?id=528752)
Full /var/log/messages and multipath -ll output.

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130220 Firefox/17.0

With SLES 11 SP2 and multipath-tools-0.4.9-0.68.1, we hit a bug where we take
some target ports down, and that part goes fine. When we bring the ports back
up, the paths are a little unstable at first and may return errors, but
eventually everything recovers. However, multipathd gets stuck in a state
where it no longer appears to send path-testing IO. At that point, READ/WRITE
block/fs IO to the dm devices hangs, and SG IO to the dm devices hangs as well.

Here is multipath -ll for one of the dm devices:

mpathc (23761343133383032) dm-2 FUSIONIO,ION LUN
size=56G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=80 status=enabled
| |- 3:0:1:2 sdl  8:176  failed ready running
| `- 3:0:0:2 sdd  8:48   failed ready running
`-+- policy='queue-length 0' prio=50 status=enabled
  |- 4:0:1:2 sdab 65:176 failed ready running
  `- 4:0:0:2 sdt  65:48  failed ready running

sdl and sdd are in the ALUA unavailable path state, but sdab and sdt are fine
to use. If I send IO to them manually, it works:

ionr1c126:~ # sg_dd if=/dev/sdab of=/dev/null blk_sgio=1 bs=4096 count=1
1+0 records in
1+0 records out
ionr1c126:~ # sg_dd if=/dev/sdt of=/dev/null blk_sgio=1 bs=4096 count=1
1+0 records in
1+0 records out
ionr1c126:~ # sg_turs /dev/sdab
ionr1c126:~ # sg_turs /dev/sdt

multipath.conf:

devices {
        device {
                vendor                  "FUSIONIO"
                features                "3 queue_if_no_path pg_init_retries 50"
                hardware_handler        "1 alua"
                path_grouping_policy    group_by_prio
                path_selector           "queue-length 0"
                failback                immediate
                path_checker            tur
                prio                    alua
                fast_io_fail_tmo        15
                dev_loss_tmo            60
                # we are using the version of multipathd that overrides this
                # and sets it to the max when using queue_if_no_path
        }
}

We have a bunch of stuck blkids:

26933 ?        00:00:00 blkid
26990 ?        00:00:00 blkid
27006 ?        00:00:00 blkid
27015 ?        00:00:00 blkid
27016 ?        00:00:00 blkid
27047 ?        00:00:13 blkid
27414 ?        00:00:13 blkid
27431 ?        00:00:13 blkid
27452 ?        00:00:13 blkid
27454 ?        00:00:13 blkid
27463 ?        00:00:13 blkid
27590 ?        00:00:13 blkid
27598 ?        00:00:13 blkid
27619 ?        00:00:13 blkid
27620 ?        00:00:13 blkid
27627 ?        00:00:13 blkid
27722 ?        00:00:13 blkid
27736 ?        00:00:13 blkid
27740 ?        00:00:13 blkid

Also, I am not sure whether it is related, but we also see a lot of udev errors
around this time:

Mar  4 12:26:10 ionr1c126 udevd[438]: worker [27641] unexpectedly returned with status 0x0100
Mar  4 12:26:10 ionr1c126 udevd[438]: worker [27641] failed while handling '/devices/virtual/block/dm-2'

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
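
One way to triage this further would be to compare multipathd's own checker
state with what the kernel table shows, and to try forcing the healthy paths
back in by hand. The commands below are a hedged sketch using the standard
multipath-tools 0.4.9 interactive CLI; the path names are the ones from the
multipath -ll output above, and none of this was run in the original report:

    multipathd -k"show paths"            # per-path checker state as multipathd sees it
    multipathd -k"show maps status"      # map-level status, including queueing state
    multipathd -k"reinstate path sdab"   # manually reinstate a path that answers TUR
    multipathd -k"reinstate path sdt"
    multipathd -k"reconfigure"           # heavier hammer: re-read config, rebuild maps

If "show paths" still reports sdab/sdt as down while sg_turs succeeds against
them, that would point at the checker loop being stuck rather than at the
paths themselves.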
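
Since the hardware handler is "1 alua", the ALUA target port group states can
also be cross-checked directly against the target with sg3_utils, the same
toolset used for the sg_dd/sg_turs tests above. This is only a suggested
cross-check, not output from the original log:

    sg_rtpg -vv /dev/sdab    # report target port groups through a path multipath marks failed
    sg_rtpg -vv /dev/sdl     # compare against a path that really is ALUA unavailable

If sdab/sdt report an active/optimized or active/non-optimized state here while
multipath still shows them failed, the problem is on the multipathd/kernel side
rather than on the target.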
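
For the stuck blkid processes, their kernel stacks would show whether they are
blocked on the queueing dm device or somewhere else. A minimal sketch, assuming
/proc/<pid>/stack is available and sysrq is enabled; the PID is just one taken
from the list above:

    cat /proc/26933/stack            # kernel stack of one hung blkid
    echo w > /proc/sysrq-trigger     # dump all blocked (D-state) tasks to the kernel log
    dmesg | tail -n 200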
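
If the hung IO needs to be released while debugging, queueing can be switched
off per map at runtime. This is the standard dm-multipath map message, not
something tried in the report, and it fails the queued IO back to the callers
instead of completing it:

    dmsetup message mpathc 0 fail_if_no_path     # stop queueing; backed-up IO gets errors
    dmsetup message mpathc 0 queue_if_no_path    # restore queueing afterwards if desired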