[Bug 231234] New: Sometimes desktop hangs for some time, with an nfs mounted home
https://bugzilla.novell.com/show_bug.cgi?id=231234

           Summary: Sometimes desktop hangs for some time, with an nfs mounted home
           Product: openSUSE 10.2
           Version: Final
          Platform: 32bit
        OS/Version: Linux
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Network
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: richard.bos@xs4all.nl
         QAContact: qa@suse.de

Sometimes the desktop hangs for some time, with an nfs mounted home. This happens after an upgrade to opensuse-10.2. Before 10.2 this behaviour was not present, neither on 10.0 nor on 10.1.

Perhaps the problem has to do with these bugs too:

Bug 227645 - NFS has upload limit of 50 KB/s
https://bugzilla.novell.com/show_bug.cgi?id=227
Bug 229848 - unable to access certain web sites
https://bugzilla.novell.com/show_bug.cgi?id=229848

I have not been able to debug/benchmark my nfs setup as described in bug report 227645:

*Tested different block sizes (1024, 2048, 4096, 8192, 16384 and 32768) and benchmarked them with:
# mount files.first.com:/home /mnt -o rw,wsize=1024
# time dd if=/dev/zero of=/mnt/test bs=16k count=16k
Every benchmark had around the 25 MB/s

If I do that I get:

# time dd if=/dev/zero of=/mnt/test bs=16k count=16k
dd: opening `/mnt/test': Permission denied

real 0m0.003s
user 0m0.000s
sys 0m0.004s

I don't know the reason for that.

I have a wireshark dump of 55MB that shows some interesting output; perhaps you want to receive it? Before I attach it to the bug report, I'll provide an ASCII dump of the headers. A short dump of the output follows below:
grep " 430[89]." wireshark.out | cut -c65-
RPC Continuation
RPC Continuation
RPC Continuation
TCP [TCP Window Update] nfs > 972 [ACK] Seq=2447796 Ack=87180476 Win=8500 Len=0 TSV=43593242 TSER=10607958
RPC Continuation
RPC Continuation
RPC Continuation
RPC Continuation
RPC Continuation
RPC Continuation
RPC Continuation
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
TCP [TCP segment of a reassembled PDU]
On a correctly working system, I think that the "RPC Continuation" and "reassembled PDU" lines shouldn't be there. I can't remember seeing those on systems prior to 10.2.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #1 from richard.bos@xs4all.nl 2007-01-01 15:47 MST -------
Created an attachment (id=111238)
 --> (https://bugzilla.novell.com/attachment.cgi?id=111238&action=view)
Wireshark output during complete desktop blockage

The desktop is sometimes blocked for, I think, 20 seconds or more. Sometimes the hang is short. It may happen once an hour, but then it might happen several times directly after each other, so it may also happen several times an hour.
https://bugzilla.novell.com/show_bug.cgi?id=231234

chrubis@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-                   |kernel-maintainers@forge.provo.novell.com
                   |screening@forge.provo.novell|
                   |.com                        |
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #2 from richard.bos@xs4all.nl 2007-01-08 13:43 MST -------
It is very noticeable when, at the end of a session, kmenu (the new one from suse) is opened. When hovering over the tabs and the "quit/leave" tab is hovered, it takes a long time before the entries in that tab become accessible. Once the tab has been accessed, it remains responsive. It looks to me as if some config file has to be read first, which takes some time (now over nfs).

Also during start up, kontact takes long(er) to start (most likely because it reads many config files). But while kontact is slow, I can start wireshark without problems.

This bug report looks related:
https://bugzilla.novell.com/show_bug.cgi?id=232636 "NFS Server is slow"
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #3 from okir@novell.com 2007-01-09 03:15 MST -------
At least the RPC continuation packets are entirely fine. As a large read or write request doesn't fit into a single TCP segment, it gets sent out in chunks. In addition, the kernel no longer sets the PSH bit on the last segment of a request, so RPC packets no longer align with a segment start, and you end up with RPC packets starting in the middle of a TCP segment. That is perfectly fine, except that both tcpdump and ethereal are not smart enough to deal with this.
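To illustrate the point in comment #3, here is a minimal Python sketch of how a receiver finds RPC message boundaries in a TCP byte stream. It assumes the standard ONC RPC record marking (a 4-byte big-endian header per fragment: top bit marks the last fragment, low 31 bits give the length). The function names `split_rpc_records` and `make_record` are made up for this example; this is not the kernel's code.

```python
import struct

LAST_FRAGMENT = 0x80000000  # top bit of the record-marking header


def make_record(payload: bytes) -> bytes:
    """Wrap a payload as a single-fragment RPC record (hypothetical helper)."""
    return struct.pack(">I", LAST_FRAGMENT | len(payload)) + payload


def split_rpc_records(stream: bytes):
    """Split a byte stream into complete RPC records.

    Returns (records, leftover). Boundaries come only from the
    record-marking headers, never from where TCP segments start or end.
    """
    records = []
    current = b""
    pos = 0
    record_start = 0
    while pos + 4 <= len(stream):
        (header,) = struct.unpack_from(">I", stream, pos)
        length = header & 0x7FFFFFFF
        end = pos + 4 + length
        if end > len(stream):
            break  # fragment not fully received yet
        current += stream[pos + 4:end]
        pos = end
        if header & LAST_FRAGMENT:
            records.append(current)
            current = b""
            record_start = pos
    return records, stream[record_start:]


# Two records back to back; cutting the stream into 9-byte "segments"
# makes the second record start in the middle of a segment.
stream = make_record(b"A" * 10) + make_record(b"B" * 6)
segments = [stream[i:i + 9] for i in range(0, len(stream), 9)]
records, leftover = split_rpc_records(b"".join(segments))
```

A capture tool that assumes every segment begins with a record header would misread the second segment here, which is the kind of confusion the comment attributes to tcpdump and ethereal.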
https://bugzilla.novell.com/show_bug.cgi?id=231234

lmb@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |richard.bos@xs4all.nl

------- Comment #4 from lmb@novell.com 2007-01-09 05:46 MST -------
1. What are the mount options you're using on the /home on the client?

2. What are the export options on the server?

The inability to create files as root suggests root_squash, that's to be expected.

3. Are there any kernel messages on the client or on the server during your test attempts?
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #5 from richard.bos@xs4all.nl 2007-01-09 12:19 MST -------
> 1. What are the mount options you're using on the /home on the client?

$ grep /home /etc/fstab
eos:/home /home nfs

I can't check what I had on the suse installations prior to suse-10.2 (like 10.0 and 10.1), as 10.2 is now on those partitions.

> 2. What are the export options on the server?

-rw-r--r-- 1 root root 132 Oct 2 2005 /etc/exports
$ grep home /etc/exports
/home 192.168.4.0/255.255.255.0(rw,root_squash,sync)

$ uname -a
Linux eos 2.6.11.4-21.9-default #1 Fri Aug 19 11:58:59 UTC 2005 i686 i686 i386 GNU/Linux

> The inability to create files as root suggests root_squash, that's to be
> expected.

Indeed, thanks for pointing this out!

As you can see, the server did not change. It's the client that has changed (to opensuse-10.2).

> 3. Are there any kernel messages on the client or on the server during your
> test attempts?

No, /var/log/messages is just quiet (with the standard settings).
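A note on question 1: /etc/fstab only shows the options that were configured, while the options actually in effect (including defaults chosen by the client) are listed in /proc/mounts on Linux. The following Python sketch shows one way to pull them out; the function name `nfs_mount_options` is an assumption made for this example.

```python
def nfs_mount_options(mounts_text, mount_point):
    """Return the effective option list for an NFS mount point, or None.

    mounts_text is the content of /proc/mounts, whose lines have the form
    "device mountpoint fstype options dump pass".
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        _device, target, fstype, options = fields[:4]
        # fstype is "nfs" for NFSv2/v3 and "nfs4" for NFSv4
        if target == mount_point and fstype.startswith("nfs"):
            return options.split(",")
    return None
```

On the client this could be called as `nfs_mount_options(open("/proc/mounts").read(), "/home")`; whether "sync" appears in the returned list tells you if the mount itself is synchronous.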
https://bugzilla.novell.com/show_bug.cgi?id=231234

lmb@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|richard.bos@xs4all.nl       |
         Resolution|                            |INVALID

------- Comment #6 from lmb@novell.com 2007-01-09 12:28 MST -------
The filesystem is mounted "sync". That's bound to cause abysmal performance.

You're specifying "sync" on the server as an export option. Don't do that if you don't want that.
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #7 from richard.bos@xs4all.nl 2007-01-10 15:24 MST -------
I removed the sync option, but the desktop still hangs sometimes. Is it necessary to specify the "async" option, or is that the default?

Hmm, after reading the 'man exports' page, it seems that async has to be configured explicitly (sync is the default). I'll add async to the exports file then....
https://bugzilla.novell.com/show_bug.cgi?id=231234

richard.bos@xs4all.nl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

------- Comment #8 from richard.bos@xs4all.nl 2007-01-11 11:41 MST -------
The problem is still there! Hence the bug report should not be closed.
https://bugzilla.novell.com/show_bug.cgi?id=231234

lmb@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|kernel-                     |nfbrown@novell.com
                   |maintainers@forge.provo.nove|
                   |ll.com                      |
             Status|REOPENED                    |NEW
https://bugzilla.novell.com/show_bug.cgi?id=231234

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |richard.bos@xs4all.nl

------- Comment #9 from nfbrown@novell.com 2007-01-14 14:38 MST -------
Re comment #6 - the filesystem isn't mounted 'sync', it is exported 'sync', which is the default and preferred way to export. It does cause some performance loss, but not nearly as much as mounting 'sync'.

The most interesting artifact in the wireshark trace is that some file is being written to throughout the whole trace. Around 76Meg gets written over a 40 second period. Unfortunately I cannot tell which file or what is being written. I suspect this could be related to the slowdown somehow.

A raw tcpdump gathered with

  tcpdump -w /tmp/dump -s0 host CLIENT and host SERVER

might provide more useful information.
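The distinction in comment #9 is between the server-side export option and the client-side mount option of the same name. As a small sketch for the server side, the Python below parses one /etc/exports line and reports the write mode. It assumes, as the thread states, that an export is 'sync' unless 'async' is given explicitly; the helper names `export_clients` and `write_mode` are invented for this example.

```python
import re


def export_clients(line):
    """Parse one /etc/exports line into (path, {client: [options]})."""
    path, *specs = line.split()
    clients = {}
    for spec in specs:
        match = re.fullmatch(r"([^()]*)\((.*)\)", spec)
        if match:
            clients[match.group(1)] = match.group(2).split(",")
        else:
            clients[spec] = []  # client listed without explicit options
    return path, clients


def write_mode(options):
    """Return 'async' only when requested explicitly, else 'sync'."""
    return "async" if "async" in options else "sync"


# The export line quoted in comment #5.
path, clients = export_clients(
    "/home 192.168.4.0/255.255.255.0(rw,root_squash,sync)"
)
modes = {client: write_mode(opts) for client, opts in clients.items()}
```

The client-side counterpart would be a 'sync' entry among the mount options, which the fstab line in comment #5 does not show.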
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #10 from richard.bos@xs4all.nl 2007-01-15 14:01 MST -------
Hi Neil,

thanks for your support. I'll try to execute the command (I have it prepared and ready to run), but it is always difficult to catch the event, as the desktop is hanging. When that happens it is no longer possible to execute anything. Perhaps remotely, but I haven't tried that yet...
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #11 from richard.bos@xs4all.nl 2007-01-17 15:53 MST -------
Hello Neil,

I think that I caught a hangup (this time it was only kontact that did not react; the rest of the desktop was quite okay). However, the trace is about 227M (30 minutes of data). When I grep the file for ERR I get this:

# tcpdump -r /tmp/dump | grep "ERR " | grep -c 23:15:
reading from file /tmp/dump, link-type EN10MB (Ethernet)
12405

Copy and paste of a small part of the dump:

23:15:19.852726 IP www.radoeka.nl.nfs > med102.radoeka.nl.1953328128: reply ERR 1448
23:15:19.852851 IP www.radoeka.nl.nfs > med102.radoeka.nl.1932422764: reply ERR 1448
23:15:19.852921 IP www.radoeka.nl.nfs > med102.radoeka.nl.1915699043: reply ERR 836
23:15:19.853046 IP www.radoeka.nl.nfs > med102.radoeka.nl.1949249840: reply ERR 1448
23:15:19.853168 IP www.radoeka.nl.nfs > med102.radoeka.nl.1701540724: reply ERR 1448
23:15:19.853271 IP www.radoeka.nl.nfs > med102.radoeka.nl.1936994097: reply ERR 1200
23:15:19.853519 IP www.radoeka.nl.nfs > med102.radoeka.nl.1953328128: reply ERR 1448
23:15:19.853640 IP www.radoeka.nl.nfs > med102.radoeka.nl.1949249842: reply ERR 1448
23:15:19.853766 IP www.radoeka.nl.nfs > med102.radoeka.nl.1936993902: reply ERR 1448
23:15:19.853887 IP www.radoeka.nl.nfs > med102.radoeka.nl.1835794536: reply ERR 1448
23:15:19.853971 IP www.radoeka.nl.nfs > med102.radoeka.nl.1852583796: reply ERR 952
23:15:19.854094 IP www.radoeka.nl.nfs > med102.radoeka.nl.4294967040: reply ERR 1448
23:15:19.854218 IP www.radoeka.nl.nfs > med102.radoeka.nl.778988591: reply ERR 1448
23:15:19.854321 IP www.radoeka.nl.nfs > med102.radoeka.nl.1953511983: reply ERR 1200
23:15:19.854566 IP www.radoeka.nl.nfs > med102.radoeka.nl.1979727104: reply ERR 1448
23:15:19.854672 IP www.radoeka.nl.nfs > med102.radoeka.nl.6845556: reply ERR 1200
23:15:22.004620 IP www.radoeka.nl.nfs > med102.radoeka.nl.6845556: reply ERR 908
23:15:22.033312 IP www.radoeka.nl.nfs > med102.radoeka.nl.1999532140: reply ERR 1448
23:15:22.033410 IP www.radoeka.nl.nfs > med102.radoeka.nl.778268525: reply ERR 1152

If I compress the dump it is 65M. The file can be downloaded from:
http://linux01.gwdg.de/apt4rpm/dump.bz2
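One possible reading of the "reply ERR" lines, in line with comment #3 but not confirmed anywhere in this thread: tcpdump may be treating the first bytes of a continuation segment as an RPC header, so the number it prints as the transaction ID is really file data. The Python sketch below decodes that number back into its four bytes to check whether it looks like text. The function names and the regular expression are assumptions made for this example.

```python
import re

# Matches the number tcpdump prints after the client host name.
XID_RE = re.compile(r"\.(\d+): reply ERR")


def xid_bytes(line):
    """Return the 4 bytes tcpdump printed as the RPC XID, or None."""
    match = XID_RE.search(line)
    if not match:
        return None
    return int(match.group(1)).to_bytes(4, "big")


def looks_like_text(raw):
    """True if every byte is printable ASCII or NUL."""
    return all(b == 0 or 32 <= b < 127 for b in raw)


# One line copied from the dump excerpt in comment #11.
line = ("23:15:19.852851 IP www.radoeka.nl.nfs > "
        "med102.radoeka.nl.1932422764: reply ERR 1448")
raw = xid_bytes(line)
print(raw, looks_like_text(raw))  # prints: b's.nl' True
```

If most of the "XIDs" in a capture decode to printable fragments like this, the ERR count is more likely a dissection artifact than a sign of failing NFS replies.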
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #12 from richard.bos@xs4all.nl 2007-01-21 13:16 MST -------
I had a full desktop hangup recently. During the hangup tcpdump was running. The hangup started when I clicked on a link to intevation.de in an email. See below for a small part of the dump around that time. The full tcpdump can be obtained from http://linux01.gwdg.de/apt4rpm/dump-070119.bz2 (55MB).

There are a lot of "fsstat fh Unknown" messages; are these the cause?

22:16:10.288478 IP med102.radoeka.nl.3836003529 > www.radoeka.nl.nfs: 136 commit fh Unknown/0100000600FD000602000000174C0500A70D0000BFD169
00A70D0000700C0000 0 bytes @ 0
22:16:10.288763 IP www.radoeka.nl.nfs > med102.radoeka.nl.3836003529: reply ok 132 commit
22:16:10.288924 IP med102.radoeka.nl.3852780745 > www.radoeka.nl.nfs: 208 rename fh Unknown/0100000600FD000602000000A70D0000700C0000336600
00B30C0000700C0000 "konq_historyeKQibc.new" -> fh Unknown/0100000600FD000602000000A70D0000700C000033660000B30C0000700C0000 "konq_history"
22:16:10.290079 IP www.radoeka.nl.nfs > med102.radoeka.nl.3852780745: reply ok 264 rename
..........
22:16:24.623840 IP med102.radoeka.nl.36963 > www.radoeka.nl.domain: 48973+ AAAA? intevation.de.radoeka.nl. (42)
22:16:24.624671 IP www.radoeka.nl.domain > med102.radoeka.nl.36963: 48973 NXDomain* 0/1/0 (87)
22:16:24.693876 IP www.radoeka.nl.domain > med102.radoeka.nl.36962: 2449 2/7/8 CNAME 134.128/29.122.95.212.in-addr.arpa., PTR doto.inteva
tion.org. (443)
22:16:27.004167 IP med102.radoeka.nl.2057684169 > www.radoeka.nl.nfs: 100 fsstat fh Unknown/0100000000FD00060200000052494E034E455400C01700
0200010000C7EC000C
22:16:27.004542 IP www.radoeka.nl.nfs > med102.radoeka.nl.2057684169: reply ok 88 fsstat tbytes 42948358144 fbytes 16775835648 abytes 1677
5835648

Short summary:
- After upgrading suse-10.1 and suse-10.0 to opensuse-10.2 (2 different systems), the kde desktop hangs randomly for about 20 seconds (I guess it might be more).
- The nfs server on suse-9.3 is nfs-utils-1.0.7-3; the /home file system is exported with async.
- The problem started with opensuse-10.2.
- As the problem is seen on 2 different systems, I don't think that the network card or driver (2 different ones) is the cause of the problem.
- In suse-10.0 and 10.1 nfs-utils has version 1.0.7.
- The nfsserver at opensuse-10.2 is: nfs-utils-1.0.10-22
https://bugzilla.novell.com/show_bug.cgi?id=231234

richard.bos@xs4all.nl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|richard.bos@xs4all.nl       |
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #13 from richard.bos@xs4all.nl 2007-01-22 13:58 MST -------
Would it make sense to install the 10.1 nfs-utils rpm on opensuse-10.2? I don't think that it is easily doable if I look at the dependencies:

# rpm --test -Uvh nfs-utils-1.0.7-36.i586.rpm
error: Failed dependencies:
        libgssapi.so.1 is needed by nfs-utils-1.0.7-36.i586
        librpcsecgss.so.1 is needed by nfs-utils-1.0.7-36.i586

# locate libgssapi.so.
/usr/lib/libgssapi.so.2

# locate librpcsecgss.so.
/usr/lib/librpcsecgss.so.3

Perhaps I should compile the 10.1 src rpm on opensuse-10.2? What's your opinion?
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #14 from richard.bos@xs4all.nl 2007-01-22 15:19 MST -------
I just finished the build of nfs-utils-1.0.7 (from suse-10.1) on my opensuse-10.2 system. Would it be useful to replace nfs-utils-1.0.10 with the older one, and see whether the problem disappears?

The rpm seems installable:

# rpm -Uvh --test /usr/tmp/nfs/RPMS/i586/nfs-utils-1.0.7-36.i586.rpm
Preparing... ########################################### [100%]
        package nfs-utils-1.0.10-22 (which is newer than nfs-utils-1.0.7-36) is already installed
        file /etc/gssapi_mech.conf from install of nfs-utils-1.0.7-36 conflicts with file from package libgssapi-0.10-22

# /usr/tmp/nfs/RPMS/i586/files/etc> diff gssapi_mech.conf /etc/
0a1,2
> # Example /etc/gssapi_mech.conf file
> #
16c18
< /usr/lib/libgssapi_krb5.so mechglue_internal_krb5_init
---
> /usr/lib/libgssapi_krb5.so mechglue_internal_krb5_init
https://bugzilla.novell.com/show_bug.cgi?id=231234

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |richard.bos@xs4all.nl

------- Comment #15 from nfbrown@novell.com 2007-01-23 18:48 MST -------
I doubt a change in nfs-utils would have any effect on performance.

The trace in comment #12 is different. It doesn't show a large background write like the other trace. It doesn't really show anything interesting that I can see at all - unfortunately. There is no evidence of NFS requests taking a long time, or of an excessive number of requests at this time.

How long was the hang for this time?

Why do you think this might be NFS related?
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #16 from richard.bos@xs4all.nl 2007-01-24 15:10 MST -------
> I doubt a change in nfs-utils would have any effect on performance.

I changed nfs-utils to the 10.1 version today, before I logged in.

> The trace in comment #12 is different. It doesn't show a large background
> write like the other trace. It doesn't really show anything interesting
> that I can see at all - unfortunately. There is no evidence of NFS requests
> taking a long time, or of an excessive number of requests at this time.

> How long was the hang for this time?

Don't know. I think 30 seconds up to a minute or so. The thing is, it just breaks my work. I just want to finish something, and bang, the desktop hangs :(( Annoying, very annoying.

> Why do you think this might be NFS related?

Experience, gut feeling. I know that kde has had problems with nfs in the past.

And you know what, after changing back to nfs-utils-1.0.7 (suse-10.1), it looks like there are no hangs.... Let's leave it like this for a while and see how the desktop behaves in the coming days!
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #17 from richard.bos@xs4all.nl 2007-02-13 13:41 MST -------
The desktop hangs with the older nfs (1.0.7) as well. So it most likely has nothing to do with the new nfs (1.0.10) package. Now, what else is causing those desktop hangs? Lately I have been switching off beagle, and it looks like the desktop hangups are fewer...
https://bugzilla.novell.com/show_bug.cgi?id=231234

------- Comment #18 from nfbrown@novell.com 2007-02-25 23:11 MST -------
I wouldn't be surprised if beagle over NFS created some unpleasant load issues.

Any more news? Has removing beagle made a distinct improvement?
https://bugzilla.novell.com/show_bug.cgi?id=231234

richard.bos@xs4all.nl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|richard.bos@xs4all.nl       |

------- Comment #19 from richard.bos@xs4all.nl 2007-02-26 10:37 MST -------
Hi Neil,

good of you to ask for the status. Since I switched off beagle, the desktop hangup is gone. So it's not NFS that is causing the problem, and as NFS is the main subject of this ticket, the ticket can be closed.

Do you happen to know whether there is already a ticket open for beagle causing desktops to freeze?
https://bugzilla.novell.com/show_bug.cgi?id=231234

nfbrown@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID

------- Comment #20 from nfbrown@novell.com 2007-03-11 17:10 MST -------
No, I don't know of any open ticket about beagle. I suspect a search might find something.

Anyway, I'll close this one, thanks.