[opensuse] XFS filesystem total failure on return from hibernation.
Hi,

Since yesterday (after an update), I have experienced the same problem twice, on return from hibernation. XFS fails completely, lots of kernel messages. System has to be restarted, recovery is impossible otherwise.

It seems to affect one partition only (?), the one that has my home, on xfs.

I booted the 13.1 XFCE rescue image and tried to run xfs_repair on that one. It complained that there was a log, and that I should attempt to mount it first. Mount hung. It is unkillable. Reboot of the system hung.

smartctl -a gives no hint of a problem. The hard disk is new (190 hours).

I ran "xfs_repair -L" on that disk, which succeeded with no error report. At the moment I'm running the long SMART test on that disk. It will take 226 minutes.

This is the first report of error in the log:

I will put a more complete version of the log in another post.

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On 2014-03-15 23:43, Carlos E. R. wrote:
This is the first report of error in the log:
Oops. Disappeared? Reposting the log:
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298351] CPU: 0 PID: 28877 Comm: kworker/0:7 Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298353] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298388] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298391] 0000000000000000 ffffffff8159ff82 0000000000007121 ffffffffa0c53996
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298395] ffff880151e21cc0 ffff880234093600 ffff88023016bbe0 0000000000000000
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298398] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298402] Call Trace:
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298415] [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298419] [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298423] [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298428] [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298449] [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298511] [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298571] [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298643] [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298734] [<ffffffffa0c4e633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298786] [<ffffffffa0c441ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298828] [<ffffffffa0c42f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298868] [<ffffffffa0c43a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298909] [<ffffffffa0c43d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298937] [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298942] [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298946] [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298952] [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.5> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298959] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c673d8
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.331745] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.331748] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
On 2014-03-15 at 23:56 +0100, Carlos E. R. wrote:

Thunderbird wrapped the lines awfully, trying alpine instead.

<5.4> 2014-03-15 22:20:32 Telcontar pm-utils - - - Thawing (1)...
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298351] CPU: 0 PID: 28877 Comm: kworker/0:7 Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298353] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298388] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298391] 0000000000000000 ffffffff8159ff82 0000000000007121 ffffffffa0c53996
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298395] ffff880151e21cc0 ffff880234093600 ffff88023016bbe0 0000000000000000
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298398] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298402] Call Trace:
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298415] [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298419] [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298423] [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298428] [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298449] [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298511] [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298571] [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298643] [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298734] [<ffffffffa0c4e633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298786] [<ffffffffa0c441ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298828] [<ffffffffa0c42f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298868] [<ffffffffa0c43a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298909] [<ffffffffa0c43d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298937] [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298942] [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298946] [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298952] [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.5> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298959] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c673d8

--
Cheers
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
On Sun, 16 Mar 2014 00:18:36 +0100 (CET), "Carlos E. R." <carlos.e.r@opensuse.org> wrote:
On 2014-03-15 at 23:56 +0100, Carlos E. R. wrote:
Thunderbird wrapped the lines awfully, trying alpine instead.
<5.4> 2014-03-15 22:20:32 Telcontar pm-utils - - - Thawing (1)...
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
Looks similar to http://oss.sgi.com/archives/xfs/2014-02/msg00674.html (even line number matches). You may want to post to opensuse-kernel and/or xfs development list.
On 2014-03-16 05:14, Andrey Borzenkov wrote:
On Sun, 16 Mar 2014 00:18:36 +0100 (CET), "Carlos E. R." <> wrote:
On 2014-03-15 at 23:56 +0100, Carlos E. R. wrote:
Thunderbird wrapped the lines awfully, trying alpine instead.
<5.4> 2014-03-15 22:20:32 Telcontar pm-utils - - - Thawing (1)...
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
Looks similar to http://oss.sgi.com/archives/xfs/2014-02/msg00674.html (even line number matches). You may want to post to opensuse-kernel and/or xfs development list.
Thanks. No hint of a solution, though.

Unfortunately, at least some kernel devs will say that my system is tainted and discard the problem asap. I'll think about it. I got help from the XFS people once, when I hit a bad bug long ago.

Meanwhile, I did a backup of the partition, and reformatted it completely. As it has happened twice, if there is some filesystem corruption, xfs_repair is unable to clear it completely. Or there is some kernel issue that causes it again, but in that case it would probably happen on any of my xfs filesystems, not always the same one.

I'm now restoring the data to the partition.

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
On Sunday, 2014-03-16 at 08:14 +0400, Andrey Borzenkov wrote:
On Sun, 16 Mar 2014 00:18:36 +0100 (CET), "Carlos E. R." <> wrote:
On 2014-03-15 at 23:56 +0100, Carlos E. R. wrote:
Thunderbird wrapped the lines awfully, trying alpine instead.
<5.4> 2014-03-15 22:20:32 Telcontar pm-utils - - - Thawing (1)...
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
Looks similar to http://oss.sgi.com/archives/xfs/2014-02/msg00674.html (even line number matches). You may want to post to opensuse-kernel and/or xfs development list.
(didn't see a solution there)

It reappeared today, same partition (/home). I had forgotten about this, and I had hoped that the last kernel update might have corrected it. No such luck.

<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
...
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c4c3d8
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)

It only happens after recovery from hibernation.

This time I'm very busy, with no time to investigate (and I got a headache, so I didn't think of obtaining xfs log data or whatever). So I rebooted (with the reset button, as umount hangs), repaired (which does not really correct the problem), did a backup with xfsdump, reformatted the partition (with yast this time), and restored with xfsrestore. System is back alive, till next time.

I'm now downloading a bunch of archived mbox files out of <http://oss.sgi.com/archives/xfs/>, but I do not see a link for subscription, and the posts themselves do not contain any info on that.

How do I subscribe there? :-?

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
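For comparing occurrences like the March and June ones (or against a report on another list, where "even line number matches"), the identifying fields can be pulled out of the log mechanically. The following is only a minimal Python sketch: the regular expressions are modelled on the exact lines quoted in this thread, and the function names `error_signatures` and `shutdown_devices` are made up for illustration. Other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the log lines quoted in this thread (kernel 3.11.10).
# Assumption: the message wording is exactly as quoted above.
INTERNAL_ERROR = re.compile(
    r"XFS: Internal error (?P<check>\S+) at line (?P<line>\d+) "
    r"of file (?P<path>\S+?)\. Caller"
)
FORCE_SHUTDOWN = re.compile(
    r"XFS \((?P<dev>[^)]+)\): xfs_do_force_shutdown\((?P<flags>0x[0-9a-f]+)\)"
)


def error_signatures(log_text):
    """Return (check name, source line, source file name) for each internal error."""
    return [
        (m["check"], int(m["line"]), m["path"].rsplit("/", 1)[-1])
        for m in INTERNAL_ERROR.finditer(log_text)
    ]


def shutdown_devices(log_text):
    """Return the device names for which a forced shutdown was logged."""
    return [m["dev"] for m in FORCE_SHUTDOWN.finditer(log_text)]


# Build path prefix as it appears in the quoted logs.
SRC = "/home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/"

MARCH_LOG = (
    "<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error "
    "XFS_WANT_CORRUPTED_GOTO at line 1602 of file " + SRC + "xfs_alloc.c. "
    "Caller 0xffffffffa0c54fe9\n"
    "<0.5> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298959] XFS (sdd5): "
    "xfs_do_force_shutdown(0x8) called from line 916 of file " + SRC + "xfs_bmap.c. "
    "Return address = 0xffffffffa0c673d8\n"
)

JUNE_LOG = (
    "<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error "
    "XFS_WANT_CORRUPTED_GOTO at line 1602 of file " + SRC + "xfs_alloc.c. "
    "Caller 0xffffffffa0c39fe9\n"
    "<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): "
    "xfs_do_force_shutdown(0x8) called from line 916 of file " + SRC + "xfs_bmap.c. "
    "Return address = 0xffffffffa0c4c3d8\n"
)

if __name__ == "__main__":
    for name, log in (("March", MARCH_LOG), ("June", JUNE_LOG)):
        print(name, error_signatures(log), shutdown_devices(log))
```

Run on the two excerpts, both events yield the same check name, source line and source file, while the device name differs (sdd5 in March, sde5 in June).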
On Sun, 29 Jun 2014 16:36:08 +0200 (CEST), "Carlos E. R." <carlos.e.r@opensuse.org> wrote:
I'm now downloading a bunch of archived mbox files out of <http://oss.sgi.com/archives/xfs/>, but I do not see a link for subscription, and the posts themselves do not contain any info on that.
How do I subscribe there? :-?
http://xfs.org/index.php/XFS_email_list_and_archives
On 2014-06-29 18:21, Andrey Borzenkov wrote:
On Sun, 29 Jun 2014 16:36:08 +0200 (CEST), "Carlos E. R." <> wrote:
I'm now downloading a bunch of archived mbox files out of <http://oss.sgi.com/archives/xfs/>, but I do not see a link for subscription, and the posts themselves do not contain any info on that.
How do I subscribe there? :-?
Thanks! It is a different server than the one I was looking at.

--
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R. wrote:

The last message on this thread on the xfs list was about 5-6 days ago.

Any luck in finding this?

BTW... is your new disk an advanced-format, 512e, or 4096-byte sector size?

I'm trying to think of things that might change the disk format that could trigger the problem...?

I am interested in the outcome of this, BTW...

Thanks.
Linda

p.s. as a workaround -- in your "hibernate script", is it remotely possible to init down to some state to allow unmounting /home before the hibernate? It's a gross workaround, and likely not workable, but thought I'd ask.

Is /home an lvm volume?.. That could be helpful or a cause, but for helpful, creating a snapshot before hibernating might help protect against data loss...
On Friday, 2014-07-11 at 15:31 -0700, Linda Walsh wrote:
Carlos E. R. wrote:
The last message on this thread on the xfs list was about 5-6 days ago.
I just posted another one that was pending; I had forgotten until I saw yours. I sent them the link to the metadata, but I'm afraid it will not help, because it is not "corrupt". We have to wait until the event happens again, and then I take a photo before repairing the filesystem. I can not even allow it to mount, because that replays the log. Unless the corruption is of a kind that simply goes undetected until the filesystem crashes.
Any luck in finding this?
For those here: the XFS people think that the culprit is the kernel, which does not properly restore all the memory structures used by the XFS filesystem. The suggestion is not to hibernate until the kernel people solve hibernation once and for all - or so I understood.
BTW... is your new disk an advanced-format, 512e, or 4096-byte sector size?
Dunno... Wait. This was the original filesystem, as copied:

meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=489366272, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=238948, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

And this is the current one:

meta-data=/dev/sde5              isize=256    agcount=4, agsize=32000000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0
data     =                       bsize=4096   blocks=128000000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=62500, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Something is wrong... the data sizes do not match.

[...]

Oops. The first xfs_info belongs to the wrong filesystem. I'm very confused. See my note on the other list. It is as if, when I run "xfs_info tgtfile", it reports on the device where the file is stored, not on the file itself. Wow, that's it...

Look, with the exact same file, copied to two partitions:

Telcontar:/data/storage_c/tmp_borrar # xfs_info tgtfile
meta-data=/dev/sde18             isize=256    agcount=4, agsize=35770496 blks
         =                       sectsz=512   attr=2, projid32bit=0
         =                       crc=0
data     =                       bsize=4096   blocks=143081984, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=69864, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_c/tmp_borrar #

Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
         =                       sectsz=512   attr=2, projid32bit=0
         =                       crc=0
data     =                       bsize=4096   blocks=489366272, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=238948, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #

False alarm. I need sleep. :-(
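The size mismatch noted above can be checked with simple arithmetic: the data section of an xfs_info report gives the block size (bsize) and the block count (blocks), and their product is the size of the data area in bytes. A minimal Python sketch follows, assuming the report layout shown in this message; the function name `data_size_bytes` is invented for illustration, and only the "data" line of each report is reproduced as sample input.

```python
import re

# Matches the "data" line of an xfs_info report, e.g.
#   data     =                       bsize=4096   blocks=489366272, imaxpct=5
# Assumption: the layout is the one shown in the reports quoted in this thread.
DATA_LINE = re.compile(r"^data\s*=\s*bsize=(\d+)\s+blocks=(\d+)", re.MULTILINE)


def data_size_bytes(xfs_info_text):
    """Return bsize * blocks for the data section of an xfs_info report."""
    match = DATA_LINE.search(xfs_info_text)
    if match is None:
        raise ValueError("no 'data' line with bsize= and blocks= found")
    bsize, blocks = (int(group) for group in match.groups())
    return bsize * blocks


# "data" lines copied from the two reports above.
SDF2_INFO = "data     =                       bsize=4096   blocks=489366272, imaxpct=5\n"
SDE5_INFO = "data     =                       bsize=4096   blocks=128000000, imaxpct=25\n"

if __name__ == "__main__":
    for name, text in (("/dev/sdf2", SDF2_INFO), ("/dev/sde5", SDE5_INFO)):
        size = data_size_bytes(text)
        print(f"{name}: {size} bytes ({size / 2**30:.1f} GiB)")
```

With these inputs the sdf2 report corresponds to 2004444250112 bytes and the sde5 report to 524288000000 bytes, which is why the two could not describe the same filesystem.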
p.s. as a workaround -- in your "hibernate script", is it remotely possible to init down to some state to allow unmounting /home before the hibernate? It's a gross workaround, and likely not workable, but thought I'd ask.
I don't see how... /home can not be unmounted at all while in use. As for syncing to disk, I hope that is already done.
Is /home an lvm volume?..
Nay, it is not.
That could be helpful or a cause, but for helpful, creating a snapshot before hibernating might help protect against data loss...
So far, there has not been any data loss. I can make an xfsdump, which is reasonably fast - for a partition so big.

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R. wrote:
Hi,
Since yesterday (after an update), I have experienced the same problem twice, on return from hibernation. XFS fails completely, lots of kernel messages. System has to be restarted, recovery is impossible otherwise.
I, as well, would suggest asking about the problem on the xfs list.

They aren't as anal about tainted kernels, & Dave Chinner goes out of his way to look at some of these problems (along with other xfs devels, but I think he's the current lead -- not sure/ no quoting me on that!).

But more to the point, he posted some patches for review yesterday morning involving writes at EOF and interlock problems, that might address or overlap your issue...

(People do direct I/O via mapping to the file past the end of file at the same time the file is being extended... things get tricky -- direct I/O doesn't like its view being 'initialized to zero' after EOF -- even though it shouldn't have been able to map past EOF to begin with... hmmm...)

Code looks really hairy... one of the other devs just wrote up an FAQ on XFS's speculative preallocation.... next will be one on its precognitive-disk writing strategy... ;-)
On 2014-03-23 01:27, Linda Walsh wrote:
Carlos E. R. wrote:
I, as well, would suggest asking about the problem on the xfs list.
Yes, I know. I keep forgetting to do it, and when I remember, I'm too busy. Meanwhile, I made a backup of the files in that partition, and reformatted it. The issue has not reappeared, and I keep my fingers crossed.
But more to the point, he posted some patches for review yesterday morning involving writes at EOF and interlock problems, that might address or overlap your issue...
But I can not realistically apply patches myself. I have to wait until the openSUSE kernel people do it, and they will not do so until 13.2...
(People do direct I/O via mapping to the file past the end of file at the same time the file is being extended... things get tricky -- direct I/O doesn't like its view being 'initialized to zero' after EOF -- even though it shouldn't have been able to map past EOF to begin with... hmmm...)
Code looks really hairy... one of the other devs just wrote up an FAQ on XFS's speculative preallocation.... next will be one on its precognitive-disk writing strategy... ;-)
Ugh. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
Linda Walsh wrote:
Carlos E. R. wrote:
Hi,
Since yesterday (after an update), I have experienced the same problem twice, on return from hibernation. XFS fails completely, lots of kernel messages. System has to be restarted, recovery is impossible otherwise.
I, as well, would suggest asking about the problem on the xfs list.
They aren't as anal about tainted kernels, & Dave Chinner goes out of his way to look at some of these problems (along with other xfs devels, but I think he's the current lead -- not sure/ no quoting me on that!).
But more to the point, he posted some patches for review yesterday morning involving writes at EOF and interlock problems, that might address or overlap your issue...
(People do direct I/O via mapping to the file past the end of file at the same time the file is being extended... things get tricky -- direct I/O doesn't like its view being 'initialized to zero' after EOF -- even though it shouldn't have been able to map past EOF to begin with... hmmm...)
Sounds like multiple programs trying to write to the same file at the same time... which OF COURSE produces race conditions. Sounds like the root of such a problem lies in poor I/O management among multiple processes, which should be communicating with each other in some way (semaphores?) to indicate when one of them wants to change the file. I don't know if any amount of filesystem code can prevent problems caused by application programmers who don't even understand that they are creating race conditions in their own file I/O.
Code looks really hairy... one of the other devs just wrote up an FAQ on XFS's speculative preallocation.... next will be one on its precognitive-disk writing strategy... ;-)
On Saturday, 2014-03-15 at 23:43 +0100, Carlos E. R. wrote:
Hi,
Since yesterday (after an update), I have experienced the same problem twice, on return from hibernation. XFS fails completely, lots of kernel messages. System has to be restarted, recovery is impossible otherwise.
It seems to affect one partition only (?), the one that has my home, on xfs.
I reported the issue on the XFS mail list, at the suggestion of Andrey Borzenkov and Linda Walsh:

+++··························
Date: Wed, 2 Jul 2014 11:57:25 +0200 (CEST)
From: Carlos E. R. <carlos.e.r@os>
To: XFS mail list <xfs@oss.sgi.com>
Subject: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
··························++-

Interestingly, it appears to only impact "/home". Possibly there is some application that triggers it with some activity. I have no idea what. Some file in some particular state?

They initially thought that the kernel did not flush all structures to disk prior to hibernation. Don't take my word for it; my memory may confuse the details, and some escape my understanding. Just read the thread.

And finally, there is a patch:

8018ec0 xfs: mark all internal workqueues as freezable

+++··························
Date: Tue, 9 Sep 2014 19:00:21 -0500 (CDT)
From: xfs@oss.sgi.com
To: xfs@oss.sgi.com
Subject: [XFS updates] XFS development tree branch, for-next, updated. xfs-for-linus-3.17-rc3-12-ga4241ae
··························++-

I have now reported it on our Bugzilla, against 13.1:

<https://bugzilla.opensuse.org/show_bug.cgi?id=899785>

Requesting that the patch be backported to all openSUSE releases.

This bug can badly impact 13.2 and the new SLES 12, as by default they use XFS for home.

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
participants (5)
- Andrey Borzenkov
- Carlos E. R.
- Carlos E. R.
- Dirk Gently
- Linda Walsh