[opensuse-kernel] Kernel:HEAD btrfs + chromium == hard lockup, crash
Hi:
Since moving to 3.11-rc2, the following happens: accessing a heavy site like YouTube with Chromium 30.x ends with
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: general protection fault: 0000 [#1] PREEMPT SMP
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: Modules linked in: af_packet cachefiles fscache hid_logitech_dj snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp kvm_int
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: CPU: 2 PID: 1644 Comm: SimpleCacheWork Tainted: P O 3.11.0-rc2-1.g00cdcf9-desktop #1
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: Hardware name: DELL Inc. Studio XPS 435T/9000/0X501H, BIOS A16 02/04/2010
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: task: ffff8801bd4b2400 ti: ffff88019ff5a000 task.ti: ffff88019ff5a000
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RIP: 0010:[<ffffffff812d85bd>] [<ffffffff812d85bd>] memcpy+0xd/0x110
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RSP: 0018:ffff88019ff5b9a0 EFLAGS: 00010202
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RAX: ffff880124c74107 RBX: 000000000000065a RCX: 00000000000000cb
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RDX: 0000000000000002 RSI: db73880000000000 RDI: ffff880124c74107
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RBP: ffff8801200ff7b0 R08: 0000000000000761 R09: 0000000000001000
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 6db6db6db6db6db7
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: R13: 0000160000000000 R14: ffff880124c74761 R15: 000000000000065a
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: FS: 00007fd825640700(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: CR2: 00007fd8327d81a3 CR3: 00000001c9b52000 CR4: 00000000000027e0
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: Stack:
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: ffffffffa00cb868 0000000000000000 ffff880219379800 0000000000000000
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: ffff880120022a30 ffff880124c74000 ffff8801200ff6e0 ffff88010e865338
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: ffffffffa00b0aab 0000000000001000 ffff8801318226d0 0000000000000761
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: Call Trace:
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffffa00cb868>] read_extent_buffer+0xc8/0x120 [btrfs]
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffffa00b0aab>] btrfs_get_extent+0x8db/0x980 [btrfs]
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffffa00c8b36>] __extent_read_full_page+0x316/0x7d0 [btrfs]
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffffa00c9f99>] extent_readpages+0x169/0x1c0 [btrfs]
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff81129a43>] __do_page_cache_readahead+0x1b3/0x260
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff81129f17>] ondemand_readahead+0x117/0x250
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff8111fb66>] generic_file_aio_read+0x4a6/0x6f0
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff811827f7>] do_sync_read+0x67/0x90
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff81182d94>] vfs_read+0x94/0x160
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff811839e3>] SyS_pread64+0x63/0xa0
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<ffffffff815b6fff>] tracesys+0xe1/0xe6
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: [<00007fd85666dbd3>] 0x7fd85666dbd2
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: Code: fc ff ff 48 8b 43 58 48 2b 43 50 88 43 4e eb e9 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RIP [<ffffffff812d85bd>] memcpy+0xd/0x110
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: RSP <ffff88019ff5b9a0>
Jul 24 21:56:54 xps9000.cristianrodriguez.net kernel: ---[ end trace a4449cb6cdee0f9a ]---
No further I/O can be done, the machine locks up, and REISUB is the only escape :-|
--
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org
> No further I/O can be done, the machine locks up, and REISUB is the only escape :-|
Disabling Chromium's "Simple Cache" in chrome://flags works around the problem, btw.
On 7/24/13 10:08 PM, Cristian Rodríguez wrote:
> Hi:
> Since moving to 3.11-rc2, the following happens:
This Oops is being tracked here: https://bugzilla.novell.com/show_bug.cgi?id=831374
-Jeff
-- Jeff Mahoney SUSE Labs
On 25/07/13 10:12, Jeff Mahoney wrote:
> On 7/24/13 10:08 PM, Cristian Rodríguez wrote:
>> Hi:
>> Since moving to 3.11-rc2, the following happens:
> This Oops is being tracked here: https://bugzilla.novell.com/show_bug.cgi?id=831374
Thanks ;) I suspect it has to do with concurrency: when you access the YouTube homepage, a lot of small still images compete to be cached on disk, and only a few of them actually render before the system becomes unresponsive.
On 7/25/13 11:51 AM, Cristian Rodríguez wrote:
> On 25/07/13 10:12, Jeff Mahoney wrote:
>> On 7/24/13 10:08 PM, Cristian Rodríguez wrote:
>>> Hi:
>>> Since moving to 3.11-rc2, the following happens:
>> This Oops is being tracked here: https://bugzilla.novell.com/show_bug.cgi?id=831374
> Thanks ;) I suspect it has to do with concurrency: when you access the YouTube homepage, a lot of small still images compete to be cached on disk, and only a few of them actually render before the system becomes unresponsive.
I'm sure it is. I've added a test user on my development node with its home directory on a btrfs file system and can't reproduce this with Chromium yet. I'm reviewing the readahead code, though, since it's different from the regular read path and there might be races interacting with extent_readpages.
-Jeff
--
Jeff Mahoney
SUSE Labs
On 25/07/13 12:27, Jeff Mahoney wrote:
> I'm sure it is. I've added a test user on my development node with its home directory on a btrfs file system and can't reproduce this with Chromium yet.
Did you toggle "Simple Cache" to "enable" in chrome://flags? It is not enabled by default. I can reproduce consistently with that setting and chromium-30.0.1567.0-423.1.x86_64 from the network:chromium / openSUSE_12.3 repository.
On 7/25/13 12:31 PM, Cristian Rodríguez wrote:
> On 25/07/13 12:27, Jeff Mahoney wrote:
>> I'm sure it is. I've added a test user on my development node with its home directory on a btrfs file system and can't reproduce this with Chromium yet.
> Did you toggle "Simple Cache" to "enable" in chrome://flags? It is not enabled by default.
Yeah, I enabled it manually.
> I can reproduce consistently with that setting and chromium-30.0.1567.0-423.1.x86_64 from the network:chromium / openSUSE_12.3 repository.
I grabbed the one from Google, but it should be the same.
-Jeff
--
Jeff Mahoney
SUSE Labs
On 25/07/13 12:27, Jeff Mahoney wrote:
> I'm sure it is. I've added a test user on my development node with its home directory on a btrfs file system and can't reproduce this with Chromium yet. I'm reviewing the readahead code, though, since it's different from the regular read path and there might be races interacting with extent_readpages.
My mount options are "noatime,nobarrier,compress,discard" ... hrmm, now that I think about it, I should not be using "nobarrier" :-P
On 7/25/13 12:36 PM, Cristian Rodríguez wrote:
> On 25/07/13 12:27, Jeff Mahoney wrote:
>> I'm sure it is. I've added a test user on my development node with its home directory on a btrfs file system and can't reproduce this with Chromium yet. I'm reviewing the readahead code, though, since it's different from the regular read path and there might be races interacting with extent_readpages.
> My mount options are "noatime,nobarrier,compress,discard" ... hrmm, now that I think about it, I should not be using "nobarrier" :-P
noatime: effect is in VFS
nobarrier: (this is really OK if you have write cache disabled)
compress: not enabled, I don't trust it yet
discard: not enabled, huge impact on performance during big unlinks
Would you be willing to retest without compress?
-Jeff
--
Jeff Mahoney
SUSE Labs
On 25/07/13 12:51, Jeff Mahoney wrote:
> nobarrier: (this is really OK if you have write cache disabled)
I will keep it only on the laptop, which has a battery and shuts down when the charge level is critical, but not on the desktop, which has been unreliable lately and has no power redundancy.
> Would you be willing to retest without compress?
Sure, going to try again after a reboot. ;)
On 25/07/13 12:51, Jeff Mahoney wrote:
> Would you be willing to retest without compress?
OK, I left the /home mount point with only "noatime", rebooted, enabled Simple Cache again, tried heavy sites (or just YouTube), and the issue *cannot* be reproduced anymore :-S
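The retest above dropped three mount options at once, so by itself it cannot say which one mattered. A minimal sketch of how the difference could be narrowed down, assuming only the option strings quoted in this thread; the helper names `parse_opts` and `one_at_a_time` are made up for illustration:

```python
def parse_opts(opts):
    """Split a comma-separated mount option string into a set."""
    return {o for o in opts.split(",") if o}


def one_at_a_time(baseline, suspects):
    """Yield (dropped option, option string with only that option removed)."""
    order = baseline.split(",")
    for suspect in sorted(suspects):
        yield suspect, ",".join(o for o in order if o != suspect)


CRASHING = "noatime,nobarrier,compress,discard"  # options reported with the crash
RETEST = "noatime"                               # options used for the clean retest

# Options that changed between the crashing setup and the clean retest.
removed = parse_opts(CRASHING) - parse_opts(RETEST)

# One candidate option string per suspect, each dropping a single option.
plans = list(one_at_a_time(CRASHING, removed))

if __name__ == "__main__":
    print("changed between runs:", sorted(removed))
    for dropped, opts in plans:
        print(f"drop {dropped:<9} -> mount with {opts}")
```

Each printed string could be tried as the /home mount options for one reboot. Whether any single option matters here is not established by the thread.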
On 7/25/13 1:06 PM, Cristian Rodríguez wrote:
> On 25/07/13 12:51, Jeff Mahoney wrote:
>> Would you be willing to retest without compress?
> OK, I left the /home mount point with only "noatime", rebooted, enabled Simple Cache again, tried heavy sites (or just YouTube), and the issue *cannot* be reproduced anymore :-S
Ha. Ok. What does the flags line of your /proc/cpuinfo say? The Oops is in memcpy, which is /super/ optimized based on processor capabilities.
-Jeff
--
Jeff Mahoney
SUSE Labs
On 25/07/13 13:32, Jeff Mahoney wrote:
> What does the flags line of your /proc/cpuinfo say?
> The Oops is in memcpy, which is /super/ optimized based on processor capabilities.
> -Jeff
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
On 7/25/13 1:40 PM, Cristian Rodríguez wrote:
> On 25/07/13 13:32, Jeff Mahoney wrote:
>> What does the flags line of your /proc/cpuinfo say?
>> The Oops is in memcpy, which is /super/ optimized based on processor capabilities.
>> -Jeff
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
Ok. You have the rep_good bit but not the erms bit, so that means the memcpy implementation is memcpy_c. It looks like the source buffer is invalid.
-Jeff
--
Jeff Mahoney
SUSE Labs
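Both deductions in this reply can be checked against the posted data. The following is a rough sketch, not kernel code: `pick_memcpy_variant`, `is_canonical` and `page_address` are hypothetical helpers; the variant name `memcpy_c_e`, the constants `VMEMMAP_START` and `PAGE_OFFSET`, and a 56-byte `struct page` are assumptions about x86-64 kernels of that era; the register values are copied from the oops at the top of the thread.

```python
MASK64 = (1 << 64) - 1

# Flags line posted in this thread.
CPU_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm "
    "constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc "
    "aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm "
    "sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid"
)

# Register values from the oops.
RSI = 0xDB73880000000000  # source pointer
RDI = 0xFFFF880124C74107  # destination pointer
RCX = 0xCB                # 8-byte words still to copy
RDX = 0x2                 # trailing bytes after the word copy
RBX = 0x65A               # length, as it appears in a callee-saved register
R12 = 0x6DB6DB6DB6DB6DB7  # multiplicative inverse of 7 modulo 2**64

# Assumed memory-layout constants (not stated anywhere in the thread).
VMEMMAP_START = 0xFFFFEA0000000000
PAGE_OFFSET = 0xFFFF880000000000


def pick_memcpy_variant(flags):
    """Selection as described in the reply: erms first, then rep_good."""
    have = set(flags.split())
    if "erms" in have:
        return "memcpy_c_e"  # assumed name of the 'rep movsb' variant
    if "rep_good" in have:
        return "memcpy_c"    # 'rep movsq' plus byte tail, named in the thread
    return "memcpy"          # generic unrolled copy


def is_canonical(addr, va_bits=48):
    """x86-64: bits 63 down to va_bits-1 must be all zeros or all ones."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def page_address(page_ptr):
    """Model of struct page -> kernel address, assuming 56-byte struct page."""
    index = ((((page_ptr - VMEMMAP_START) & MASK64) >> 3) * R12) & MASK64
    return ((index << 12) + PAGE_OFFSET) & MASK64


if __name__ == "__main__":
    print("memcpy variant:", pick_memcpy_variant(CPU_FLAGS))
    print("source canonical:", is_canonical(RSI))
    print("destination canonical:", is_canonical(RDI))
    print("bytes left equals length:", RCX * 8 + RDX == RBX)
    print("page_address(NULL) == RSI:", page_address(0) == RSI)
```

Under those assumptions, RSI is a non-canonical address (which is what raises a general protection fault rather than a page fault), the copy had not moved a single word yet, and the bad source pointer is exactly what the address arithmetic yields for a NULL page pointer. That would be consistent with an invalid source buffer, but it does not say why the page was missing.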
On 7/25/13 2:03 PM, Jeff Mahoney wrote:
> On 7/25/13 1:40 PM, Cristian Rodríguez wrote:
>> On 25/07/13 13:32, Jeff Mahoney wrote:
>>> What does the flags line of your /proc/cpuinfo say?
>>> The Oops is in memcpy, which is /super/ optimized based on processor capabilities.
>>> -Jeff
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
> Ok. You have the rep_good bit but not the erms bit, so that means the memcpy implementation is memcpy_c.
> It looks like the source buffer is invalid.
> -Jeff
Ok, I've added a BUG_ON to detect whether we're overrunning the index in extent_buffer_page(). Can you test with a kernel with this commit?
commit 15eacb944c080a4757ad6634cc1363bd3705cff4
Author: Jeff Mahoney <jeffm@suse.com>
Date: Thu Jul 25 16:55:37 2013 -0400
    btrfs: check index in extent_buffer_page.
-Jeff
--
Jeff Mahoney
SUSE Labs
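The patch itself is not quoted in the thread, so what follows is only a Python model of the idea: walk a read across page-sized chunks and refuse to touch a page index beyond the end of the buffer. `read_chunks`, the 4096-byte page size, and the starting offset 0xef9 are assumptions; the lengths 0x761 and 0x65a are the R08 and RBX values from the oops, and reading them as total and remaining length is a guess.

```python
PAGE_SIZE = 4096  # assumed page size


def read_chunks(num_pages, start_offset, length):
    """Return (page index, in-page offset, byte count) for each chunk of a read.

    Raises IndexError at the point where a check on the page index,
    like the BUG_ON described above, would fire.
    """
    chunks = []
    index = start_offset // PAGE_SIZE
    offset = start_offset % PAGE_SIZE
    while length > 0:
        if index >= num_pages:
            raise IndexError(
                f"page index {index} out of range for {num_pages} page(s)"
            )
        cur = min(length, PAGE_SIZE - offset)
        chunks.append((index, offset, cur))
        length -= cur
        offset = 0  # every chunk after the first starts at a page boundary
        index += 1
    return chunks


if __name__ == "__main__":
    # With enough pages the read simply spans two of them.
    print(read_chunks(4, 0xEF9, 0x761))
    # With a single page the second chunk would overrun the page array.
    try:
        read_chunks(1, 0xEF9, 0x761)
    except IndexError as err:
        print("caught:", err)
```

In this model a 0x761-byte read starting 0x107 bytes before a page boundary leaves 0x65a bytes for the next page, which matches the remaining count in the register dump. Without the index check, the second chunk would be read from whatever happens to sit past the last valid page pointer.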
On 25/07/13 16:59, Jeff Mahoney wrote:
> Can you test with a kernel with this commit?
> commit 15eacb944c080a4757ad6634cc1363bd3705cff4 Author: Jeff Mahoney <jeffm@suse.com> Date: Thu Jul 25 16:55:37 2013 -0400
> btrfs: check index in extent_buffer_page.
> -Jeff
To which tree was it committed? (It is not in the openSUSE kernel-source tree, at least.)
On 7/25/13 9:26 PM, Cristian Rodríguez wrote:
> On 25/07/13 16:59, Jeff Mahoney wrote:
>> Can you test with a kernel with this commit?
>> commit 15eacb944c080a4757ad6634cc1363bd3705cff4 Author: Jeff Mahoney <jeffm@suse.com> Date: Thu Jul 25 16:55:37 2013 -0400
>> btrfs: check index in extent_buffer_page.
>> -Jeff
> To which tree was it committed? (It is not in the openSUSE kernel-source tree, at least.)
Kernel:HEAD
-Jeff
--
Jeff Mahoney
SUSE Labs
On 26/07/13 08:26, Jeff Mahoney wrote:
> Kernel:HEAD
OK, I installed the updated version that contains your patch, restored the old mount options that supposedly were related to this crash, and tried to reproduce again. No luck: the system keeps working normally, no BUG message, nothing at all... hrmmm
On 07/26/2013 02:40 PM, Cristian Rodríguez wrote:
> On 26/07/13 08:26, Jeff Mahoney wrote:
Thanks for your time and attention, Jeff, but I think I just figured out why this crash mysteriously goes away: GCC ICEs on the machine, it reboots randomly, or the USB controller sometimes goes apeshit. There is at least one broken memory chip in my desktop. :-|
participants (2)
- Cristian Rodríguez
- Jeff Mahoney