[Bug 805732] New: task blocked for more than 480 seconds
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c0

Summary: task blocked for more than 480 seconds
Classification: openSUSE
Product: openSUSE 12.3
Version: RC 1
Platform: Other
OS/Version: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: aj@suse.com
QAContact: qa-bugs@suse.de
CC: rhafer@suse.com
Found By: Product Management
Blocker: ---

Building an image via osc, osc suddenly hangs. ps shows:

29840 pts/19 Ss 0:00 \_ /bin/bash
29950 pts/19 S+ 0:29 | \_ /usr/bin/python /usr/bin/osc build images x86_64
30212 pts/19 S+ 0:00 | \_ sudo /usr/bin/build --root=/abuild/osc/buildroot_images-x86_64 --
30213 pts/19 S+ 0:00 | \_ /bin/bash /usr/bin/build --root=/abuild/osc/buildroot_images-
30230 pts/19 S+ 0:00 | \_ build logging -e open(F,">>",$ARGV[0])||die("$ARGV[0]: $!
10639 pts/19 S+ 0:00 | \_ su -c cd /usr/src/packages/SOURCES && kiwi --create /usr/
10640 pts/19 S+ 0:00 | \_ -bash -c cd /usr/src/packages/SOURCES && kiwi --creat
10663 pts/19 S+ 0:00 | \_ /usr/bin/perl /usr/sbin/kiwi --create /usr/src/pa
11072 pts/19 S+ 0:00 | \_ /bin/sh /usr/src/packages/KIWI-oem/boot-VMX.M
11073 pts/19 S+ 0:01 | \_ /usr/bin/zypper --non-interactive --no-gp
11870 pts/19 S+ 0:00 | \_ rpm --root /usr/src/packages/KIWI-oem
11871 pts/19 S+ 0:00 | \_ /bin/sh /var/tmp/rpm-tmp.lfc5dC 1
11902 pts/19 D+ 0:00 | \_ /bin/sh /var/tmp/rpm-tmp.lfc5

and dmesg reports:

[326894.045840] INFO: task sh:11902 blocked for more than 480 seconds.
[326894.045844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[326894.045846] sh D ffff88014dcb32c0 0 11902 11871 0x00000000
[326894.045851] ffff88003580dc78 0000000000000086 ffff880002844680 ffff88003580dfd8
[326894.045855] ffff88003580dfd8 ffff88003580dfd8 ffff88014783c4c0 ffff880002844680
[326894.045859] 0000000000000246 ffff880146858800 0000000000000000 0000000000000001
[326894.045862] Call Trace:
[326894.045877] [<ffffffff811715cb>] __sb_start_write+0xcb/0x110
[326894.045886] [<ffffffff8118d24b>] mnt_want_write+0x1b/0x50
[326894.045891] [<ffffffff8117d617>] do_last+0xa47/0xed0
[326894.045897] [<ffffffff8117db63>] path_openat+0xc3/0x4c0
[326894.045902] [<ffffffff8117e804>] do_filp_open+0x44/0xb0
[326894.045906] [<ffffffff8116eb83>] do_sys_open+0xf3/0x1e0
[326894.045911] [<ffffffff8159f22d>] system_call_fastpath+0x1a/0x1f
[326894.045930] [<00007fa0464a1fe0>] 0x7fa0464a1fdf
[327372.659927] INFO: task sh:11902 blocked for more than 480 seconds.
[327372.659931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[327372.659933] sh D ffff88014dcb32c0 0 11902 11871 0x00000000
[327372.659938] ffff88003580dc78 0000000000000086 ffff880002844680 ffff88003580dfd8
[327372.659942] ffff88003580dfd8 ffff88003580dfd8 ffff88014783c4c0 ffff880002844680
[327372.659946] 0000000000000246 ffff880146858800 0000000000000000 0000000000000001
[327372.659949] Call Trace:
[327372.659964] [<ffffffff811715cb>] __sb_start_write+0xcb/0x110
[327372.659972] [<ffffffff8118d24b>] mnt_want_write+0x1b/0x50
[327372.659978] [<ffffffff8117d617>] do_last+0xa47/0xed0
[327372.659984] [<ffffffff8117db63>] path_openat+0xc3/0x4c0
[327372.659988] [<ffffffff8117e804>] do_filp_open+0x44/0xb0
[327372.659993] [<ffffffff8116eb83>] do_sys_open+0xf3/0x1e0
[327372.659998] [<ffffffff8159f22d>] system_call_fastpath+0x1a/0x1f
[327372.660017] [<00007fa0464a1fe0>] 0x7fa0464a1fdf

uname -a reports:

Linux byrd 3.7.9-1-desktop #1 SMP PREEMPT Sun Feb 17 23:09:22 UTC 2013 (ae1c506) x86_64 x86_64 x86_64 GNU/Linux

The last lines from osc are:

[ 378s] Retrieving package sysconfig-0.76.4-1.8.1.x86_64 (173/201), 307.8 KiB (746.2 KiB unpacked)
[ 379s] Installing: sysconfig-0.76.4-1.8.1 [.......done]
[ 379s] Additional rpm output:
[ 379s] warning: /var/cache/kiwi/zypper/packages/usr_src_packages_SOURCES_repos_openSUSE:12.2:Update_standard/x86_64/sysconfig-0.76.4-1.8.1.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
[ 379s] Updating /etc/sysconfig/network/dhcp...
[ 379s] Updating /etc/sysconfig/network/config...
[ 379s] Removing old autogenerated device configuration files:
[ 379s]
[ 379s]
[ 379s] Retrieving package grub-0.97-185.1.2.x86_64 (174/201), 331.0 KiB ( 1.4 MiB unpacked)

And then it hangs...

Steps to reproduce:
1. osc co systemsmanagement:crowbar:2.0 Crowbar_2.0
2. cd systemsmanagement\:crowbar\:2.0/Crowbar_2.0/
3. osc build images x86_64
4. wait ;)

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
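
The hung shell in the report shows up in ps with state D (uninterruptible sleep), and the dmesg text names /proc/sys/kernel/hung_task_timeout_secs as the knob behind the warning. A minimal sketch for spotting such tasks on another machine, assuming a Linux system with the procps version of ps; the function name list_blocked_tasks is made up for this illustration:

```shell
#!/bin/sh
# Sketch only: list tasks currently in uninterruptible sleep (state D).
# Assumes procps ps on Linux; list_blocked_tasks is a hypothetical helper name.
list_blocked_tasks() {
    # Keep the header line plus every row whose STAT column starts with D.
    ps -eo pid,ppid,stat,wchan:32,args | awk 'NR == 1 || $3 ~ /^D/'
}

blocked_report=$(list_blocked_tasks)
printf '%s\n' "$blocked_report"

# Show the hung task timeout if this kernel exposes it.
timeout_file=/proc/sys/kernel/hung_task_timeout_secs
if [ -r "$timeout_file" ]; then
    printf 'hung_task_timeout_secs = %s\n' "$(cat "$timeout_file")"
fi
```

The wchan column gives a hint about where in the kernel each task is waiting; the full call trace still has to come from dmesg as in the report.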
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c1
--- Comment #1 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c2
--- Comment #2 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c3
--- Comment #3 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c4
Adam Spiers
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c5
--- Comment #5 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c6
Jiri Kosina
> My computer is using ext4 as filesystem.

Would it be easily possible for you to test the same workload on a different filesystem to see whether this is ext4-specific or not? Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c7
--- Comment #7 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c8
Ralf Haferkamp
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c9
--- Comment #9 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c10
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c11
--- Comment #11 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c12
Jiri Kosina
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c13
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c14
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c15
--- Comment #15 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c16
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c17
--- Comment #17 from Marcus Schaefer
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c18
--- Comment #18 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c19
--- Comment #19 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c20
--- Comment #20 from Marcus Schaefer
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c21
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c22
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c23
Torsten Duwe
> * Torsten, why is the postinstall needed this way?

Because someone insisted on supporting booting Linux from xfs, which was never designed for that purpose. Grub does not use bmap(), but its own FS code on the raw disk.

* Linux xfs is extremely lazy writing back dirty data
* Linux buffer caches are not coherent (/dev/sda does not include /dev/sda1)

The whole magic was needed to get stage2 onto the disk, so it can get modified in-place. Seems like someone broke the detection logic. This bug needs to be reassigned accordingly, or boot from XFS declared unsupported.
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c24
Jan Kara
> (In reply to comment #21)
> > * Torsten, why is the postinstall needed this way?
> Because someone insisted on supporting booting Linux from xfs, which was never designed for that purpose. Grub does not use bmap(), but its own FS code on the raw disk.

I see, that's why grub cannot work in the osc chroot. BTW, why does osc even pull in grub, Andreas?

> * Linux xfs is extremely lazy writing back dirty data

I think you meant metadata (for data, sync should really be enough). I wonder how come you didn't see similar problems with ext3/4 - likely because they have all metadata exposed in the buffer cache while xfs does not.

> * Linux buffer caches are not coherent (/dev/sda does not include /dev/sda1)

True - but sync(1) plus flushing the caches on the device you want to use (BLKFLSBUF ioctl) solves this problem.

> The whole magic was needed to get stage2 onto the disk, so it can get modified in-place. Seems like someone broke the detection logic. This bug needs to be reassigned accordingly, or boot from XFS declared unsupported.

No, your postinstall logic is not broken. The kernel just got stricter about writing to a frozen filesystem, and in the osc chroot /dev/null is on the filesystem that the grub postinstall freezes, so the shell gets blocked when opening /dev/null.

Anyway, the simplest solution seems to be to change the postinstall to do:

(xfs_freeze -f /; xfs_freeze -f /boot; xfs_freeze -u /; xfs_freeze -u /boot ) > /dev/null 2>&1

That way /dev/null will be opened on the still unfrozen filesystem (subsequent writing to it doesn't block because the null device as such isn't frozen) and things will work (tested that). Torsten, can you change the postinstall like that please? Thanks!
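
To illustrate why the grouped form helps: a redirection attached to a subshell is opened once, before any command inside the group runs. A minimal sketch of that ordering, using a temporary file in place of /dev/null and no xfs_freeze calls at all (the names demo_dir, sink and status are made up for this demonstration):

```shell
#!/bin/sh
# Sketch only: show that "( ... ) > file" opens the file before the group body runs.
# Nothing is frozen here; a temporary file stands in for /dev/null.
demo_dir=$(mktemp -d)
sink="$demo_dir/sink"
status="$demo_dir/status"

(
    # By the time this first command runs, the shell has already opened
    # (and therefore created) "$sink" for the whole group.
    if [ -e "$sink" ]; then
        echo "opened-before-body" > "$status"
    else
        echo "opened-late" > "$status"
    fi
    # Both writes reuse the descriptor opened above; no further open happens.
    echo "first write"
    echo "second write"
) > "$sink" 2>&1

result=$(cat "$status")
lines=$(grep -c '' "$sink")
printf '%s, %s lines written through one open\n' "$result" "$lines"
rm -rf "$demo_dir"
```

With per-command redirections, each command would open its target again, which is the open that blocks once the filesystem holding /dev/null is frozen.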
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c25
--- Comment #25 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c26
Torsten Duwe
(In reply to comment #23)
> (In reply to comment #21)
> > * Linux xfs is extremely lazy writing back dirty data
> I think you meant metadata (for data, sync should be really enough). I wonder how come you didn't see similar problems with ext3/4 - likely because they have all metadata exposed in buffer cache while xfs does not.
> > * Linux buffer caches are not coherent (/dev/sda does not include /dev/sda1)
> True - but sync(1) plus flushing the caches on the device you want to use (BLKFLSBUF ioctl) solves this problem.

No, it didn't. I was advised to use xfs_freeze and that did the job.

> (xfs_freeze -f /; xfs_freeze -f /boot; xfs_freeze -u /; xfs_freeze -u /boot ) > /dev/null 2>&1
> That way /dev/null will be opened on still unfrozen filesystem (subsequent writing to it doesn't block because device null as such isn't frozen) and things will work (tested that). Torsten, can you change postinstall like that please? Thanks!

The whole logic was to be carried out exclusively on xfs, or when the root fs type could not be determined. It looks to me as if someone clueless has inserted the LOADER_TYPE logic in the wrong place. BTW, where are the valid values for YAST_IS_RUNNING documented?
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #21)
> > > * Linux xfs is extremely lazy writing back dirty data
> > I think you meant metadata (for data, sync should be really enough). I wonder how come you didn't see similar problems with ext3/4 - likely because they have all metadata exposed in buffer cache while xfs does not.
> > > * Linux buffer caches are not coherent (/dev/sda does not include /dev/sda1)
> > True - but sync(1) plus flushing the caches on the device you want to use (BLKFLSBUF ioctl) solves this problem.
> No, it didn't. I was recommended to use xfs_freeze and that did the job.

I believe it doesn't make all the metadata properly visible for XFS (since it caches some things in private allocations). But I'm rather sure it does solve
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c27
Jan Kara
> > (xfs_freeze -f /; xfs_freeze -f /boot; xfs_freeze -u /; xfs_freeze -u /boot ) > /dev/null 2>&1
> > That way /dev/null will be opened on still unfrozen filesystem (subsequent writing to it doesn't block because device null as such isn't frozen) and things will work (tested that). Torsten, can you change postinstall like that please? Thanks!
> The whole logic was to be carried out exclusively on xfs, or when the root fs type could not be determined.

Yes, the second option is actually the case for the osc build root.

> It looks to me as if someone clueless has inserted the LOADER_TYPE logic in the wrong place.
> BTW, where are the valid values for YAST_IS_RUNNING documented?

Sorry, but I fail to see how that is relevant. Code in the grub postinstall can open an inode (/dev/null) on the frozen filesystem for writing. That is clearly a bug, which is easy to fix by opening the inode before the filesystem is frozen. Changing those two lines should be a trivial thing to do...
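
"Opening the inode before the filesystem is frozen" can also be done with an explicit file descriptor instead of a grouped redirection. This is not the form proposed in this thread, only a sketch of the same idea under that assumption; the helper name run_quietly is made up, and echo stands in for the xfs_freeze calls so that nothing is frozen:

```shell
#!/bin/sh
# Sketch only: open the sink once, up front, then reuse the descriptor.
exec 3>/dev/null          # /dev/null is opened here, before anything else runs

# run_quietly is a hypothetical helper: it runs a command with stdout and
# stderr pointed at descriptor 3, so no new open of /dev/null is needed.
run_quietly() {
    "$@" >&3 2>&3
}

# In a postinstall, the freeze and unfreeze commands would go here.
run_quietly echo "this output is discarded"
status=$?

exec 3>&-                 # close the descriptor when done
```

The grouped redirection quoted above achieves the same effect in one line, which is probably why it was suggested as the simplest change.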
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c28
Torsten Duwe
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c29
--- Comment #29 from Ralf Haferkamp
> Again: please point me at some documentation about YAST_IS_RUNNING, and I'll give %post another try.

Not exactly documentation :(, but this might help: http://lists.opensuse.org/yast-devel/2013-01/msg00006.html
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c30
--- Comment #30 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c31
Torsten Duwe
> Not exactly documentation :(, but this might help: http://lists.opensuse.org/yast-devel/2013-01/msg00006.html

Very good! That should help. So, "instsys" is indeed the only case that needs extra care.
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c32
Christian Bourque
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c33
--- Comment #33 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c34
--- Comment #34 from Christian Bourque
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c35
--- Comment #35 from Christian Bourque
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c36
Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c37
--- Comment #37 from Christian Bourque
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c38
Adrian Schröter
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c39
--- Comment #39 from Adrian Schröter
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c40
--- Comment #40 from Jan Kara
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c41
--- Comment #41 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c42
--- Comment #42 from Bernhard Wiedemann
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c43
Torsten Duwe
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c45
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=805732
https://bugzilla.novell.com/show_bug.cgi?id=805732#c
Swamp Workflow Management