https://bugzilla.novell.com/show_bug.cgi?id=697699
https://bugzilla.novell.com/show_bug.cgi?id=697699#c2
--- Comment #2 from Petr Tesařík 2011-06-02 17:21:39 UTC ---
I've just experienced another such corruption. The corrupted file was
/usr/lib/libplc4.so, and drop_caches didn't help, because the corrupted page
was still mapped into the address space of several processes (NetworkManager,
openvpn, pidgin and ekiga).
I ran debugfs on /dev/sda6 to find the on-disk location of /usr/lib/libplc4.so:
azariah:~ # debugfs /dev/sda6
debugfs 1.41.14 (22-Dec-2010)
debugfs: stat /usr/lib/libnspr4.so
Inode: 1452959 Type: regular Mode: 0755 Flags: 0x0
Generation: 2481375563 Version: 0x00000000
User: 0 Group: 0 Size: 17992
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 40
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4dc0f174 -- Wed May 4 08:25:56 2011
atime: 0x4de73193 -- Thu Jun 2 08:45:39 2011
mtime: 0x4d642eb8 -- Tue Feb 22 22:46:32 2011
Size of extra inode fields: 4
BLOCKS:
(0-4):5814738-5814742
TOTAL: 5
The /dev/sda6 filesystem has 4K blocks, so the first block translates to
5814738*4096, or 0x58b9d2000. I ran hed on /dev/sda6 and saw:
58b9d2000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 :ELF:::.........
58b9d2010 03 00 03 00 01 00 00 00 90 10 00 00 34 00 00 00 :.:.:...::..4...
58b9d2020 e8 41 00 00 00 00 00 00 34 00 20 00 07 00 28 00 :A......4. .:.(.
58b9d2030 1c 00 1b 00 01 00 00 00 00 00 00 00 00 00 00 00 :.:.:...........
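The offset arithmetic can be double-checked with a short Python sketch. The function name is hypothetical; the block number and the 4K block size are the values from the debugfs output and the text above.

```python
def block_to_byte_offset(block, block_size=4096):
    """Return the byte offset of a filesystem block within its device."""
    return block * block_size

# First block of the file, as listed under BLOCKS by debugfs.
offset = block_to_byte_offset(5814738)
print(hex(offset))  # 0x58b9d2000
```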
Then I looked up where the file was mapped into the address space of process
3569, using /proc/3569/smaps:
b5e47000-b5e4b000 r-xp 00000000 08:06 1452959 /usr/lib/libplc4.so
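For reference, the fields of that mapping line can be pulled apart as sketched below. This is only an illustration: parse_mapping is a hypothetical helper, and it assumes the usual "start-end perms offset dev inode path" layout of a mapping header line in /proc/PID/smaps.

```python
def parse_mapping(line):
    """Split one mapping header line into its fields (assumed layout:
    start-end perms offset dev inode path)."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "file_offset": int(fields[2], 16),
        "device": fields[3],
        "inode": int(fields[4]),
        "path": fields[5].strip() if len(fields) > 5 else "",
    }

m = parse_mapping(
    "b5e47000-b5e4b000 r-xp 00000000 08:06 1452959 /usr/lib/libplc4.so"
)
# The mapping starts at file offset 0, so 0xb5e47000 is the first page of
# the file, and the inode equals the one reported by debugfs.
print(hex(m["start"]), m["file_offset"], m["inode"])
```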
Using gdb /proc/3569/exe 3569, I could see that the in-memory copy is corrupted:
(gdb) x/32x 0xb5e47000
0xb5e47000: 0x00000000 0x00000000 0x00000000 0x00000000
0xb5e47010: 0x00000000 0x00000000 0x00000000 0x00000000
0xb5e47020: 0x000041e8 0x00000000 0x00200034 0x00280007
..
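gdb prints 32-bit words here, while hed prints single bytes, so the two dumps are easier to compare after converting the words back to bytes. A minimal sketch, assuming a little-endian machine (the ELF header above is a 32-bit little-endian one); words_to_bytes is a hypothetical helper name:

```python
import struct

def words_to_bytes(words):
    """Pack 32-bit words into the little-endian byte sequence they were read from."""
    return struct.pack("<%dI" % len(words), *words)

# Third row of the gdb output, at 0xb5e47020.
row = words_to_bytes([0x000041e8, 0x00000000, 0x00200034, 0x00280007])
print(" ".join("%02x" % b for b in row))
# e8 41 00 00 00 00 00 00 34 00 20 00 07 00 28 00
```

This is the same byte sequence as the on-disk row at 58b9d2020, so of the rows shown only the first two (32 bytes) differ from the disk contents.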
Next, I ran crash to find out where the page was located in physical memory:
crash> vtop 0xb5e47000
VIRTUAL PHYSICAL
b5e47000 7ab3a000
PAGE DIRECTORY: f2aac000
PGD: f2aacb5c => 7d1c4067
PMD: f2aacb5c => 7d1c4067
PTE: 7d1c491c => 7ab3a005
PAGE: 7ab3a000
PTE PHYSICAL FLAGS
7ab3a005 7ab3a000 (PRESENT|USER)
VMA START END FLAGS FILE
f2b388b4 b5e47000 b5e4b000 8000075 /usr/lib/libplc4.so
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
f6900740 7ab3a000 f4229ee8 0 4 8004002c
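The PTE value reported by crash can be decoded by hand as well. The sketch below assumes the classic 32-bit non-PAE x86 page table entry layout with 4K pages (frame address in the upper 20 bits, bit 0 = present, bit 1 = writable, bit 2 = user); decode_pte is a hypothetical helper and covers only these three flag bits.

```python
PAGE_MASK = 0xFFFFF000  # upper 20 bits: physical page frame address
FLAGS = {0x001: "PRESENT", 0x002: "RW", 0x004: "USER"}

def decode_pte(pte):
    """Return (physical page address, list of flag names set in the PTE)."""
    phys = pte & PAGE_MASK
    flags = [name for bit, name in sorted(FLAGS.items()) if pte & bit]
    return phys, flags

phys, flags = decode_pte(0x7ab3a005)
print(hex(phys), "|".join(flags))  # 0x7ab3a000 PRESENT|USER
```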
I then inspected that physical address (using /dev/crash, which is similar
to /dev/mem but can map high memory), and indeed, the page is corrupted:
007ab3a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...............
007ab3a010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...............
007ab3a020 e8 41 00 00 00 00 00 00 34 00 20 00 07 00 28 00 :A......4. .:.(.
007ab3a030 1c 00 1b 00 01 00 00 00 00 00 00 00 00 00 00 00 :.:.:..........
..
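To make the extent of the damage explicit, the 64 bytes shown in the on-disk dump and in the physical memory dump can be compared byte by byte. A sketch with the bytes copied from the two dumps; it says nothing about the rest of the page, which is not shown here, and differing_offsets is a hypothetical helper.

```python
# First 64 bytes as read from /dev/sda6 at offset 0x58b9d2000.
disk = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "03000300010000009010000034000000"
    "e8410000000000003400200007002800"
    "1c001b00010000000000000000000000"
)
# First 64 bytes as read from physical address 0x7ab3a000.
memory = bytes.fromhex(
    "00000000000000000000000000000000"
    "00000000000000000000000000000000"
    "e8410000000000003400200007002800"
    "1c001b00010000000000000000000000"
)

def differing_offsets(a, b):
    """Return the offsets at which the two byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

diff = differing_offsets(disk, memory)
print([hex(i) for i in diff])
```

All differing offsets fall below 0x20: within the bytes shown, the first 32 bytes of the in-memory page are zero, and everything from 0x20 on matches the disk.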
Last, I killed all processes using /usr/lib/libplc4.so, dropped the caches, and
got the correct file contents with hed /usr/lib/libplc4.so:
0000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 :ELF:::.........
0010 03 00 03 00 01 00 00 00 90 10 00 00 34 00 00 00 :.:.:...::..4...
0020 e8 41 00 00 00 00 00 00 34 00 20 00 07 00 28 00 :A......4. .:.(.
0030 1c 00 1b 00 01 00 00 00 00 00 00 00 00 00 00 00 :.:.:...........
..