On Sunday 06 December 2009 18:24:13 and regarding:
On Sunday, 2009-12-06 at 17:37 -0600, David C. Rankin wrote:
There is a problem with 11.2 x86_64 that causes disk corruption on
USB and Remote filesystems when you connect to them.
What type of filesystem, and what type of remote connection?
ext2
Dec 3 18:05:42 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 3 18:05:42 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 3 18:05:42 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 3 18:05:42 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
"Media error" sounds to be a hardware error.
Yep, that's what I thought, but it isn't. Like I said, I do NOT understand it, but from the language/encoding weirdness I saw (and still see on my 11.2 x86_64 laptop) I would be willing to bet that there is a problem with the remote execution of filesystem commands that somehow garbles the remote filesystem. I have spent a lot of time on this and I have done a great deal of comparison between connecting remotely with XP, 10.3, 11.0, Arch Linux, and the 11.2 laptop and the only box that causes corruption over usb or remote ssh/smb/fish/sftp is the 11.2 laptop. (I'm using the same laptop right now (I have 3 hard drives for it) running Arch Linux and it is working perfectly). I also have 10.3 on the 3rd drive and both Arch Linux, 10.3 and 11.0 (on the drive before 11.2) work perfectly with usb and remotely with 0 zero errors or corruption. Connection from this laptop with 11.2 is the only instance where I have consistent corruption issues.
Also, (I don't see how this could be related, but you never know), 11.2 x86_64 has terrible video driver issues on this box. Wild guess, but maybe there is a hardware error occurring related to a xorg/mesa or drm call that is affecting the remote communication??? (like I said very wild guess). However, after the 11.2 drm update, X is dead on 11.2. The only way I can get x to run is to explicitly use the radeon "Device" in xorg.conf and then the video will make you sea-sick. Enlightenment E16 will not run (you can start it, but all decorations are gone and backgrounds are blank) Compiz whitescreens (compositing failure) and desktop effects can NOT be enabled in kde4. Strangely compiz works fine in gnome... (I don't think it is related, but you guys that know more about video interrupt interaction with the other hardware busses will have to ponder the thought)
Dec 3 18:06:30 nirvana kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
And that is indeed a hardware error, a really bad one. End of life for
hard disk, typically.
Again, that is what I thought, but it was just corruption caused by the remote connection. Here are the logs since the last repair and 300G of file transfers:
Here is the sequence and history of the errors beginning with the connection from my 11.2 x86_64 box. The full logs from __ are here:
The first error occurs withinn 20 minutes of my first connection from my 11.2 laptop and su to root. I had installed on 11/17 at approximately 00:00 UTC (18:00 on the 16th localtime) The server is nirvana, my laptop is alchemy connecting at 192.168.6.102. I have marked the specific events with '---' at the start:
---
Nov 17 03:08:41 nirvana sshd[22185]: Accepted publickey for david from 192.168.6.102 port 59448 ssh2
Nov 17 03:08:46 nirvana su: (to root) david on /dev/pts/3
Nov 17 03:10:01 nirvana /usr/sbin/cron[23394]: (david) CMD (/home/david/linux/scripts/Learn_as_spam_cron)
Nov 17 03:12:01 nirvana /usr/sbin/cron[23414]: (drr) CMD (/usr/local/bin/Learn_as_spam_cron)
Nov 17 03:14:01 nirvana /usr/sbin/cron[23430]: (deborah) CMD (/usr/local/bin/Learn_as_spam_cron)
Nov 17 03:16:01 nirvana /usr/sbin/cron[23489]: (zachry) CMD (/usr/local/bin/Learn_as_spam_cron)
Nov 17 03:18:01 nirvana /usr/sbin/cron[24289]: (sydney) CMD (/usr/local/bin/Learn_as_spam_cron)
Nov 17 03:18:29 nirvana dhcpd: Wrote 0 deleted host decls to leases file.
Nov 17 03:18:29 nirvana dhcpd: Wrote 0 new dynamic host decls to leases file.
Nov 17 03:18:29 nirvana dhcpd: Wrote 38 leases to leases file.
Nov 17 03:20:00 nirvana dhcpd: DHCPREQUEST for 192.168.6.120 from 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:20:00 nirvana dhcpd: DHCPACK on 192.168.6.120 to 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:21:26 nirvana dhcpd: DHCPREQUEST for 192.168.6.120 from 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:21:26 nirvana dhcpd: DHCPACK on 192.168.6.120 to 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:24:11 nirvana dhcpd: DHCPREQUEST for 192.168.6.120 from 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:24:11 nirvana dhcpd: DHCPACK on 192.168.6.120 to 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:24:11 nirvana dhcpd: DHCPDISCOVER from 00:25:00:df:fe:2c (iPod-touch-2) via eth0
Nov 17 03:24:12 nirvana dhcpd: DHCPOFFER on 192.168.6.120 to 00:25:00:df:fe:2c (iPod-touch-2) via eth0
---
Nov 17 03:28:04 nirvana shadow[24998]: group already exists - group=ntadmin, by=0
Nov 17 03:28:04 nirvana dbus-daemon: Unable to reload configuration: Element not allowed inside <busconfig> in configuration file
Nov 17 03:28:04 nirvana avahi-daemon[3764]: Disconnected from D-Bus, exiting.
Nov 17 03:28:04 nirvana avahi-daemon[3764]: Got SIGQUIT, quitting.
Nov 17 03:28:04 nirvana powersaved[4644]: ERROR (filter_function:98) DBus daemon disconnected. Trying to reconnect...
Nov 17 03:28:04 nirvana avahi-daemon[3764]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.6.17.
Nov 17 03:28:04 nirvana avahi-dnsconfd[3775]: read(): EOF
Nov 17 03:28:07 nirvana console-kit-daemon[2480]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
Nov 17 03:28:21 nirvana syslog-ng[2458]: last message repeated 4 times
Nov 17 03:28:21 nirvana gconfd (david-1390): GConf server is not in use, shutting down.
Nov 17 03:28:21 nirvana gconfd (david-1390): Error releasing lockfile: Failed to link '/tmp/gconfd-david/lock/1t1258450101ut827002u1000p1390r552977190k2414831832' to '/tmp/gconfd-david/lock/ior': No such file or directory
Nov 17 03:28:21 nirvana gconfd (david-1390): Exiting
Nov 17 03:28:22 nirvana console-kit-daemon[2480]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
Nov 17 03:29:07 nirvana syslog-ng[2458]: last message repeated 15 times
Nov 17 03:29:10 nirvana console-kit-daemon[2480]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
Nov 17 03:30:07 nirvana syslog-ng[2458]: last message repeated 18 times
Nov 17 03:30:07 nirvana console-kit-daemon[2480]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
Nov 17 03:31:07 nirvana syslog-ng[2458]: last message repeated 20 times
Nov 17 03:31:10 nirvana console-kit-daemon[2480]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
<snip>
Basically, it looks like the connection for 11.2 fried console-kit and dbus. I have never seen the error Element before??? Thing rock along with dbus dead until I check the logs and resart console-kit:
Nov 21 21:28:12 nirvana syslog-ng[2458]: last message repeated 19 times
Nov 21 21:28:15 nirvana console-kit-daemon[2480]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
Nov 21 21:28:34 nirvana syslog-ng[2458]: last message repeated 6 times
Nov 21 21:28:34 nirvana sudo: david : TTY=pts/3 ; PWD=/home/david ; USER=root ; COMMAND=/usr/sbin/rcconsolekit restart
Nov 21 21:28:34 nirvana console-kit-daemon[8261]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
Nov 21 21:28:39 nirvana sudo: david : TTY=pts/3 ; PWD=/home/david ; USER=root ; COMMAND=/usr/bin/tail -n250 /var/log/messages
Nov 21 21:32:53 nirvana dhcpd: Wrote 0 deleted host decls to leases file.
Nov 21 21:32:53 nirvana dhcpd: Wrote 0 new dynamic host decls to leases file.
Nov 21 21:32:53 nirvana dhcpd: Wrote 38 leases to leases file.
Nov 21 21:32:53 nirvana dhcpd: DHCPREQUEST for 192.168.6.118 from 00:11:43:22:4f:23 (dcrgx) via eth0
Nov 21 21:32:53 nirvana dhcpd: DHCPACK on 192.168.6.118 to 00:11:43:22:4f:23 (dcrgx) via eth0
<snip>
dbus/console-kit are happy again and things rock along for another 12 days or so until all hell breaks loose with a su: (to nobody) after connection from 11.2 on my i586 box:
Dec 2 04:18:01 nirvana /usr/sbin/cron[30645]: (sydney) CMD (/usr/local/bin/Learn_as_spam_cron)
Dec 2 04:18:16 nirvana sshd[30655]: Accepted publickey for david from 192.168.6.101 port 40183 ssh2
Dec 2 04:20:56 nirvana sshd[30702]: Accepted publickey for david from 192.168.6.101 port 59805 ssh2
Dec 2 04:30:08 nirvana dhcpd: DHCPREQUEST for 192.168.6.101 from 00:11:f5:15:2d:83 via eth0
Dec 2 04:30:08 nirvana dhcpd: DHCPACK on 192.168.6.101 to 00:11:f5:15:2d:83 via eth0
Dec 2 05:00:14 nirvana syslog-ng[2458]: STATS: dropped 0
Dec 2 05:01:16 nirvana su: (to nobody) root on none
Dec 2 05:01:25 nirvana syslog-ng[2458]: last message repeated 3 times
Dec 2 05:02:48 nirvana su: (to nobody) root on none
Dec 2 05:03:26 nirvana syslog-ng[2458]: last message repeated 2 times
Dec 2 05:05:01 nirvana /usr/sbin/cron[20130]: (root) CMD (/root/.local/rcron)
Dec 2 05:05:01 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 2 05:05:01 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 2 05:05:01 nirvana kernel: ata3.00: cmd 25/00:08:93:19:8b/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 2 05:05:01 nirvana kernel: res 51/40:00:99:19:8b/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 2 05:05:01 nirvana kernel: ata3.00: configured for UDMA/133
Dec 2 05:05:01 nirvana kernel: ata3: EH complete
Dec 2 05:05:04 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 2 05:05:04 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 2 05:05:04 nirvana kernel: ata3.00: cmd 25/00:08:93:19:8b/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 2 05:05:04 nirvana kernel: res 51/40:00:99:19:8b/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 2 05:05:05 nirvana kernel: ata3.00: configured for UDMA/133
Dec 2 05:05:05 nirvana kernel: ata3: EH complete
Dec 2 05:05:08 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 2 05:05:08 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 2 05:05:08 nirvana kernel: ata3.00: cmd 25/00:08:93:19:8b/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 2 05:05:08 nirvana kernel: res 51/40:00:99:19:8b/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 2 05:05:20 nirvana kernel: ata3.00: configured for UDMA/133
Dec 2 05:05:20 nirvana kernel: ata3: EH complete
Dec 2 05:05:20 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 2 05:05:20 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 2 05:05:20 nirvana kernel: ata3.00: cmd 25/00:08:93:19:8b/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 2 05:05:20 nirvana kernel: res 51/40:00:99:19:8b/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 2 05:05:20 nirvana kernel: ata3.00: configured for UDMA/133
Dec 2 05:05:20 nirvana kernel: ata3: EH complete
Dec 2 05:05:20 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 2 05:05:20 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 2 05:05:20 nirvana kernel: ata3.00: cmd 25/00:08:93:19:8b/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 2 05:05:20 nirvana kernel: res 51/40:00:99:19:8b/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 2 05:05:20 nirvana kernel: ata3.00: configured for UDMA/133
Dec 2 05:05:20 nirvana kernel: ata3: EH complete
Dec 2 05:05:20 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 2 05:05:20 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 2 05:05:20 nirvana kernel: ata3.00: cmd 25/00:08:93:19:8b/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 2 05:05:20 nirvana kernel: res 51/40:00:99:19:8b/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 2 05:05:20 nirvana kernel: ata3.00: configured for UDMA/133
Dec 2 05:05:20 nirvana kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 2 05:05:20 nirvana kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Dec 2 05:05:20 nirvana kernel: Descriptor sense data with sense descriptors (in hex):
Dec 2 05:05:20 nirvana kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 2 05:05:20 nirvana kernel: 34 8b 19 99
Dec 2 05:05:20 nirvana kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 2 05:05:20 nirvana kernel: end_request: I/O error, dev sda, sector 881531289
Dec 2 05:05:20 nirvana kernel: ata3: EH complete
Dec 2 05:05:20 nirvana kernel: EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=50578131, block=101154840
Dec 2 05:05:20 nirvana kernel: sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
This look like an sda hardware error, but it isn't. This error was triggered when my other 11.2 i586 box attempted to connect (192.168.6.101). The disk basically exploded. I am willing to bet the connection from the i586 box was an attempt to connect from a dbus based app (remember the first console-kit/dbus crash) using fish or sftp which is NOT supported remotely, but instead of handling the error properly the 10.3 server (probably not all that dbus aware) tried to do something screwy that resulted in the disk error. What the issue may be in 11.2 is in the way it now handles remote filesystem calls from dbus apps that isn't doing what it should. I have inadvertently typed fish://servername or sftp://servername as save locations hundreds of times from 11.0 and from Arch Linux and never had any adverse effect. I wonder if 11.2 is handling this differently??
After the call from 192.168.6.101, this disk error continued until I figured out how to repair of the dmraid /home partitions (simple, but you have to boot to the rescue disk and then activate the raid sets before running fsck on the raid devices instead of the individual disks). Since the rescue, there are NO (zero) disk errors reported under heavy test load (300+ Gigabytes) as all the backup files are copied back to the server and moved around for good measure.
I think there is something here, but I cannot diagnose it further with my knowledge of dbus or filesystems. We will just have to keep an eye on it for now. If you want the complete syslog, I can send it to you off-list. (nothing special, there are just client fax numbers in it from hylafax that I just can't post to the list. If I provide it to you as my agent for the purposes of diagnostics, you (and anyone else) can freely use it within Novell/openSuSE and provide it to any specialists you may want to look at it as long as you advise them that it potentially contains privileged information and not to redisclose the #'s)(not a biggie, I just want to dot the i's and cross the t's)
Let me know.
--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org