http://bugzilla.opensuse.org/show_bug.cgi?id=1008107

Bug ID: 1008107
Summary: Potential XFS Kernel bug - _xfs_buf_find: Block out of range
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.1
Hardware: x86-64
OS: openSUSE 42.1
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: david@the-taylor-family.org
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

I posted this originally on the Leap 42.1 support forum, and it was suggested I post it over here as it may be a kernel bug with respect to XFS.

https://forums.opensuse.org/showthread.php/518978-Leap-42-1-XFS-corrupted-fi...

dcurtisfra posted this in reply to my forum post:

Looking at your report and, the Ubuntu Bug Report, it could be that we have a Leap 42.1 Kernel issue with respect to XFS here. It may be a good idea to raise a Bug Report https://bugzilla.opensuse.org/ containing everything that you've found. The Ubuntu folks are suggesting that, the latest Kernel version may alleviate this issue but, that Kernel is appearing in the openSUSE distribution with Leap 42.2 which, is still in the Release Candidate testing phase.

Here's the Canonical/Ubuntu bug reference:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1576599

Here's the majority of what I had posted in the forum thread referenced above:

I had complaints that the squid proxy was not working for several machines on the network, so I investigated the Leap 42.1 box I have handling proxy services. It would ping, and its ports would respond to Nagios TCP checks (service checks were faulting), but I couldn't log in and none of the services on the box were responsive. I power cycled the machine and it came up at the emergency/maintenance recovery login. Once in maintenance mode I determined that the /var file system was corrupted, and any attempt to mount it, run xfs_repair, etc. hung the system indefinitely, requiring another power cycle to recover.
Booting from the Leap USB stick, I was able to get a little further, but was unsuccessful in getting /var back. I had tried mounting read-only with norecovery, but it still refused. Zeroing the log (and discarding any unreplayed metadata changes) with the -L option to xfs_repair was the only way to get past the problem. (I have backups of the logs from the night before, so I just lost a little syslog data from some other systems; not a big issue here.)

Once I had /var mounted, I was able to look at the messages in the log, which referred to _xfs_buf_find: Block out of range errors. These occurred when logrotate was trying to swap logs around. The system was still writing to my main system messages file (I use syslog-ng rather than systemd journal logging) up until the point I power cycled it, so I guess that file already had sufficient space allocated and did not need to request more. As squid had stopped working, along with logins, etc., I expect the /var file system failure was preventing other logs from being opened and/or written (thus making the box appear locked up).

As best I can tell from the SMART results (I have smartd running) and the lack of any other kernel warnings of disk failures, this does not appear to have been a disk failure.

The Call Trace in the Ubuntu bug listed a grow_inode call, which was not present in the Call Trace from my crash, so while they both list the block out of range fault, they may not be related...

There were a total of six dumps within several seconds, all referencing the same PID (logrotate). Here's the first one. I'll attach an edited copy of the log that contains more details plus reboot information for those who need/want to know. I did try to remove unnecessary noise from the attached log, as well as identifying details that I didn't think needed to be posted publicly.

Oct 29 01:00:05 shadows kernel: XFS (dm-9): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0x1000000
Oct 29 01:00:05 shadows kernel: [665260.471535] XFS (dm-9): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0x1000000
Oct 29 01:00:05 shadows kernel: [665260.471581] ------------[ cut here ]------------
Oct 29 01:00:05 shadows kernel: [665260.471626] WARNING: CPU: 3 PID: 4863 at ../fs/xfs/xfs_buf.c:473 _xfs_buf_find+0x2a1/0x2f0 [xfs]()
Oct 29 01:00:05 shadows kernel: [665260.471627] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop af_packet iscsi_ibft iscsi_boot_sysfs joydev hid_generic usbhid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl snd_hda_intel snd_hda_controller i915 snd_hda_codec snd_hda_core x86_pkg_temp_thermal snd_hwdep intel_powerclamp coretemp video snd_pcm drm_kms_helper iTCO_wdt iTCO_vendor_support snd_timer gpio_ich snd soundcore i2c_i801 drm mei_me kvm mei e1000e lpc_ich ptp mfd_core pps_core serio_raw i2c_algo_bit crct10dif_pclmul ppdev parport_pc tpm_tis tpm 8250_fintek pcspkr processor parport wmi button crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd xfs libcrc32c crc32c_intel sr_mod cdrom ehci_pci ehci_hcd usbcore usb_common dm_mod sg
Oct 29 01:00:05 shadows kernel: [665260.471680] CPU: 3 PID: 4863 Comm: logrotate Not tainted 4.1.31-30-default #1
Oct 29 01:00:05 shadows kernel: [665260.471682] Hardware name: LENOVO 7005AK8/ , BIOS 9HKT46AUS 12/15/2011
Oct 29 01:00:05 shadows kernel: [665260.471684] 0000000000000286 0000000000000000 ffffffff8165ef0d 0000000000000000
Oct 29 01:00:05 shadows kernel: [665260.471687] 0000000000000000 ffffffffa016c24c ffffffff81068961 ffff88042a2b3340
Oct 29 01:00:05 shadows kernel: [665260.471689] 0000000000000008 00000007fffffff8 0000000000000000 0000000000000001
Oct 29 01:00:05 shadows kernel: [665260.471692] Call Trace:
Oct 29 01:00:05 shadows kernel: [665260.471704] [<ffffffff810055cc>] dump_trace+0x8c/0x340
Oct 29 01:00:05 shadows kernel: [665260.471709] [<ffffffff8100597c>] show_stack_log_lvl+0xfc/0x1a0
Oct 29 01:00:05 shadows kernel: [665260.471712] [<ffffffff81006ec1>] show_stack+0x21/0x50
Oct 29 01:00:05 shadows kernel: [665260.471717] [<ffffffff8165ef0d>] dump_stack+0x5d/0x79
Oct 29 01:00:05 shadows kernel: [665260.471722] [<ffffffff81068961>] warn_slowpath_common+0x81/0xb0
Oct 29 01:00:05 shadows kernel: [665260.471747] [<ffffffffa012ab91>] _xfs_buf_find+0x2a1/0x2f0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471773] [<ffffffffa012ac07>] xfs_buf_get_map+0x27/0x2c0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471800] [<ffffffffa0158871>] xfs_trans_get_buf_map+0x131/0x1e0 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471826] [<ffffffffa0102abc>] xfs_btree_get_bufs+0x4c/0x60 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471845] [<ffffffffa00ebaa9>] xfs_alloc_fix_freelist+0x179/0x410 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471863] [<ffffffffa00ec4e8>] xfs_free_extent+0x88/0x110 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471886] [<ffffffffa0126d27>] xfs_bmap_finish+0x137/0x190 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471912] [<ffffffffa013e174>] xfs_itruncate_extents+0x184/0x330 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471935] [<ffffffffa013e3aa>] xfs_inactive_truncate+0x8a/0x110 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471957] [<ffffffffa013f218>] xfs_inactive+0x128/0x150 [xfs]
Oct 29 01:00:05 shadows kernel: [665260.471964] [<ffffffff811f9e00>] evict+0xb0/0x170
Oct 29 01:00:05 shadows kernel: [665260.471968] [<ffffffff811f5b70>] __dentry_kill+0x170/0x1e0
Oct 29 01:00:05 shadows kernel: [665260.471973] [<ffffffff811f5d66>] dput+0x186/0x240
Oct 29 01:00:05 shadows kernel: [665260.471982] [<ffffffff811e0bc0>] __fput+0x150/0x1c0
Oct 29 01:00:05 shadows kernel: [665260.471987] [<ffffffff81085057>] task_work_run+0xa7/0xe0
Oct 29 01:00:05 shadows kernel: [665260.471991] [<ffffffff81002f59>] do_notify_resume+0x69/0x90
Oct 29 01:00:05 shadows kernel: [665260.471997] [<ffffffff816658c1>] int_signal+0x12/0x17
Oct 29 01:00:05 shadows kernel: [665260.472004] [<00007f4b210f52d0>] 0x7f4b210f52d

A few notes about the attached log, to hopefully reduce any confusion.

I had attached a Samsung 850 SSD to the machine to serve as a storage device for recovering data written between the nightly backup and the crash. It is not normally attached (the same goes for the USB stick). The machine was also disconnected from the network, as I had stood up a VM to take over proxy services while diagnosing this box and would otherwise have had duplicate IPs. It also normally has a DVD drive attached, which is not connected at the moment (the Samsung is using its port).

Hardware is a mostly stock Lenovo M91p with an i5 and 16GB RAM. Nothing fancy in it, and it runs a text console.
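
For anyone comparing the block and EOFS values in the warning above, here is a minimal Python sketch of the arithmetic. It is an illustration only, not output from the affected machine. The function name parse_block_out_of_range is hypothetical, and the log does not state the units of either value: the sketch assumes 512-byte units and a 4 KiB file system block size, and both assumptions would need checking against the actual file system.

```python
import re

# Matches the XFS warning text as it appears in the log above.
PATTERN = re.compile(
    r"_xfs_buf_find: Block out of range: "
    r"block (0x[0-9a-fA-F]+), EOFS (0x[0-9a-fA-F]+)"
)

SECTOR_BYTES = 512       # assumption: both values are in 512-byte units
SECTORS_PER_FSBLOCK = 8  # assumption: 4 KiB file system blocks


def parse_block_out_of_range(line):
    """Return (block, eofs) as integers, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = (
    "Oct 29 01:00:05 shadows kernel: [665260.471535] XFS (dm-9): "
    "_xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0x1000000"
)
block, eofs = parse_block_out_of_range(line)

print("block beyond EOFS:", block >= eofs)
print("EOFS in GiB under the 512-byte assumption:", eofs * SECTOR_BYTES / 2**30)
print("block / 8:", hex(block // SECTORS_PER_FSBLOCK))
```

Under those assumptions the requested block lies far past the end of the file system, and dividing it by 8 gives 0xffffffff, an all-ones 32-bit value. That might point to an unset or invalid block number being looked up rather than a random bad address, but this is unconfirmed and should be treated as a hint for whoever reads the trace.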