https://bugzilla.novell.com/show_bug.cgi?id=415640
User jimc@math.ucla.edu added comment
https://bugzilla.novell.com/show_bug.cgi?id=415640#c2
--- Comment #2 from James Carter 2008-08-11 12:59:48 MDT ---
As a negative test case :-( I copied the locking code (changing it as little as
possible) and called it in a simulation of the failing app, manipulating the
timing to cycle through three cases: reader has to wait, writer has to wait,
neither has to wait. It ran for about 4000 lock-unlock cycles on a
uniprocessor machine, and then for 17211 cycles (all weekend) on a server with
exactly the same configuration as the one where the failures happened (dual
Xeon with hyperthreading, etc.). Locking worked perfectly: no spurious locks by
kblockd or anything else, except when I locked the file myself to see if the
test program would detect it (which it did). Note that not much else was
happening on this server at the time. I was hoping for more lock-unlock cycles,
but Perl's Time::HiRes alarm() won't interrupt the fcntl system call, while the
standard alarm() will (with a minimum sleep of 1 sec).
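
For reference, the timed-lock technique described above (a blocking fcntl lock
request that a timer signal interrupts) can be sketched as follows. This is
Python, not the Perl of the actual test program, and it says nothing about how
Time::HiRes behaves. The names lock_with_timeout, LockTimeout and demo are
hypothetical, and the 0.2 sec timeout is an arbitrary choice for illustration.

```python
import fcntl
import os
import signal
import tempfile
import time


class LockTimeout(Exception):
    """Raised by the SIGALRM handler to abort a blocking lock request."""


def _on_alarm(signum, frame):
    # Python retries an interrupted fcntl call unless the handler raises.
    raise LockTimeout()


def lock_with_timeout(fd, timeout_s, exclusive=True):
    """Request a blocking whole-file lock; True if acquired, False on timeout."""
    op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout_s)  # sub-second timer
    try:
        fcntl.lockf(fd, op)  # F_SETLKW underneath: waits for the lock
        return True
    except LockTimeout:
        return False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, old_handler)


def demo():
    """A child process holds the lock; the parent times out, then succeeds."""
    with tempfile.NamedTemporaryFile() as tmp:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                os.close(r)
                cfd = os.open(tmp.name, os.O_RDWR)
                fcntl.lockf(cfd, fcntl.LOCK_EX)
                os.write(w, b"x")  # tell the parent the lock is held
                time.sleep(30)
            finally:
                os._exit(0)
        os.close(w)
        os.read(r, 1)  # wait until the child holds the lock
        os.close(r)
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            blocked = lock_with_timeout(fd, 0.2)
            os.kill(pid, signal.SIGKILL)  # killing the holder frees the lock
            os.waitpid(pid, 0)
            free = lock_with_timeout(fd, 0.2)
        finally:
            os.close(fd)
        return blocked, free


if __name__ == "__main__":
    print(demo())
```

The first attempt should time out while the child holds the lock, and the
second should succeed once the child is gone. There is a small window in which
the timer can fire just after the lock is granted; a real test harness would
have to handle that case.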
I've asked all the helpdesk people, if they see this failure again, to save
/proc/locks and output of "ps", and I'll post it.
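
A minimal sketch of how such a snapshot could be captured and read, assuming
the /proc/locks field layout documented in proc(5) (other kernel versions may
print extra fields, which this ignores). The function names, the output file
names and the "ps -ef" options are assumptions for illustration only.

```python
import subprocess
import time
from pathlib import Path


def parse_locks(text):
    """Parse /proc/locks text into a list of dicts, one per lock entry."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # A waiter blocked on the preceding lock is marked with "->".
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            del fields[1]
        if len(fields) < 8:
            continue  # not a lock line in the assumed layout
        entries.append({
            "id": fields[0].rstrip(":"),
            "blocked": blocked,
            "kind": fields[1],      # POSIX, FLOCK, ...
            "mode": fields[2],      # ADVISORY or MANDATORY
            "access": fields[3],    # READ or WRITE
            "pid": int(fields[4]),  # process holding (or waiting for) the lock
            "file": fields[5],      # major:minor:inode
            "start": fields[6],
            "end": fields[7],
        })
    return entries


def save_snapshot(directory="."):
    """Save /proc/locks and ps output side by side; return the parsed locks."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out = Path(directory)
    locks = Path("/proc/locks").read_text()
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True,
                        check=True).stdout
    (out / f"locks-{stamp}.txt").write_text(locks)
    (out / f"ps-{stamp}.txt").write_text(ps)
    return parse_locks(locks)
```

Matching the pid column against the saved ps output is what would identify the
process behind a spurious lock.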
I also went through the source and found lots of queue and sysfs locks but
nothing that seemed relevant to file locking. We have plenty of kernel modules
loaded; I'm posting the list. On this machine all of them are unmodified from
the distro kernel, kernel-default-2.6.22.18-0.2.
Hunting this bug is like trying to sic your guard dog on a ghost: the jaws
close on thin air. Sorry.