[opensuse] OpenSuse 10.2 x86_64 RAM disk & server lock ups
Hi guys,

How are you keeping?

I have three Opensuse 10.2 x86_64 installations running on separate HP-ML370G4/5 servers (Dual Xeon). Both systems have 4 gigs of RAM each. On these systems I'm running Asterisk, which provides telephony for a 100 seat call center.

The conversations that are recorded by Asterisk are recorded to /var/spool/asterisk/monitor, which is a tmpfs/shm file system and configured as follows in my /etc/fstab:

shm /var/spool/asterisk/monitor tmpfs defaults 0 0

On a daily basis each of these servers will 'dead lock', i.e. completely freeze. When this happens the servers do not respond to the keyboard or even 'ping'. I've also experienced the same problem on a totally different non-HP custom built server running the same OS (OpenSuse 10.2).

If I disable the RAM disk and record the conversation straight to hard disk then everything is fine - the servers do not lock up. Unfortunately I have to use the RAM disk due to performance issues with writing the recordings straight to disk.

I have a script that runs every minute that moves the completed recordings from RAM disk to hard disk, so it's not getting too full.

Any ideas where my problem could lie, or how I can enable some type of RAM disk debugging to find out where things are going wrong?

Thanks in advance!

Kind regards
David Wilson
D c D a t a
CNS, CLS, Linux+, LPIC1, LPIC2
Email/MSN: dave@dcdata.co.za
Phone: 0860-1-LINUX
Fax: 0866878971
Mobile: 0824147413
Skype: dave-wilson

--
This email and all contents are subject to the following disclaimer: http://www.dcdata.co.za/emaildisclaimer.html
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
On Tuesday 03 April 2007, David Wilson wrote:
If I disable the RAM disk and record the conversation straight to hard disk then everything is fine - the servers do not lock up. Unfortunately I have to use the RAM disk due to performance issues with writing the recordings straight to disk.
It would seem that a deadlock would itself qualify as a performance issue.

What is the underlying file structure? Could you not choose a different one with better performance? How about splitting the disk up so that not all recordings contend for the same disks/controllers?

When you write from ramdisk, you incur another massive demand for memory as the fast ramdisk dumps onto the file system infrastructure, which of course can't keep up and therefore starts demanding memory for buffers, etc. How much memory have you reserved for that?

Have you looked into a raw file system for disk writes? (There was a time when raw file systems were common, but disk performance has moved beyond that now for the most part.)

Have you looked at offloading disk writes to other machines via gigabit Ethernet or fiber?

--
John Andersen
Hi John,

Thanks for your reply. Basically there is no real structure under /var/spool/asterisk/monitor - it's one directory. While 60 or so agents are on calls there will be 60 .wav files being written simultaneously to the RAM disk. The filesystem was ext3 but of course we are now using tmpfs.

The recorded .wav files are moved out of RAM disk to the physical disk every minute. When this happens, the process only takes a second or two to write the files from RAM to the physical disk. Normally there is only about 80 MB or so at a time.
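The once-a-minute move described above might look roughly like the following. This is a minimal sketch, not David's actual script, which is not shown in the thread. It assumes that a recording is "completed" when its modification time is at least `min_idle_seconds` old; that rule, the function name and the parameter names are all hypothetical, and Asterisk itself is not consulted.

```python
import os
import shutil
import time


def move_completed_recordings(src_dir, dst_dir, min_idle_seconds=30, now=None):
    """Move idle .wav files from src_dir (RAM disk) to dst_dir (hard disk).

    Assumption: a file whose mtime is older than min_idle_seconds is a
    finished recording. Files modified more recently are left alone.
    Returns the sorted list of file names that were moved.
    """
    now = time.time() if now is None else now
    os.makedirs(dst_dir, exist_ok=True)
    moved = []
    # Snapshot the directory first so we do not modify it while iterating.
    for entry in list(os.scandir(src_dir)):
        if not entry.is_file() or not entry.name.endswith(".wav"):
            continue
        if now - entry.stat().st_mtime < min_idle_seconds:
            continue  # probably still being written by an active call
        # shutil.move copies and then deletes when crossing file systems,
        # which is the case for tmpfs -> disk.
        shutil.move(entry.path, os.path.join(dst_dir, entry.name))
        moved.append(entry.name)
    return sorted(moved)
```

Run from cron every minute, a call such as `move_completed_recordings("/var/spool/asterisk/monitor", "/some/disk/path")` would drain the RAM disk; the destination path here is a placeholder.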
Have you looked into a raw file system for disk writes? (There was a time when raw file systems were common, but disk performance has moved beyond that now for the most part.)
I haven't looked at this, but perhaps it's an option? Most people out there that run Asterisk call centers of this capacity and use call recording seem to have great success with using a RAM disk.
The problem I'm picking up right now is that Asterisk's active call audio quality degrades when we write to the hard disk directly and not to RAM disk first.
From what I can see on the web a RAM disk is the best way to go for this - do you know if there's a way of picking up what caused the deadlock ? I've looked through /var/log/messages but there's no mention of anything with regards to the lockup.
Thanks for your help so far.

Kind regards
David Wilson
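On the question of finding what led up to a freeze when /var/log/messages shows nothing: one low-tech approach is to record memory figures every few seconds and force each sample to disk, so that the last lines written before a lockup survive the reboot. The sketch below assumes a Linux /proc/meminfo and uses a hypothetical log path and field selection. As far as I know, tmpfs pages are usually counted under Cached, but treat that as an assumption to verify on the kernel in question.

```python
import os
import time

# Fields to record; names not present in /proc/meminfo are logged as "n/a".
FIELDS = ("MemFree", "Buffers", "Cached", "SwapFree")


def parse_meminfo(text):
    """Return {field_name: integer value} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def format_sample(meminfo, now):
    """Format one timestamped log line from a parsed meminfo dict."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{key}={meminfo.get(key, 'n/a')}" for key in FIELDS)
    return f"{stamp} {fields}"


def log_forever(path="/var/log/meminfo-trace.log", interval=5):
    """Append a sample every `interval` seconds (hypothetical path).

    Each line is flushed and fsync'ed so it reaches the disk before the
    next sleep; after a freeze, the tail of the file shows the last
    known memory state.
    """
    with open(path, "a") as log:
        while True:
            with open("/proc/meminfo") as source:
                sample = parse_meminfo(source.read())
            log.write(format_sample(sample, time.time()) + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

If MemFree and SwapFree both trend toward zero shortly before each freeze, that would point at memory exhaustion; if they stay flat, the cause is more likely elsewhere.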
On Tuesday 03 April 2007, David Wilson wrote:
do you know if there's a way of picking up what caused the deadlock ? I've looked through /var/log/messages but there's no mention of anything with regards to the lockup.
Well, I trimmed too much and do not have reference to your hardware any longer, but it seems to me that it was an SMP setup, no?

There is currently a huge thread on the VMware Forum about freezes on Core 2 Duo machines. One of the technically competent VMware programmers (Petr) suggests it's due to a horribly broken hpet timer routine used by Linux. Read through the thread:
http://www.vmware.com/community/thread.jspa?threadID=77895&tstart=0

Try booting with the kernel parameter:
hpet=disable

It did not solve the Core 2 Duo problem (at least not for me), but it didn't introduce any problems either, so since you are locking up and rebooting you should have a chance to try it.

Personally I have never had this freeze/lockup outside of VMware, so I think Petr is looking at a kernel problem when the problem is in VMware.

--
John Andersen
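After rebooting with the suggested parameter, it is worth confirming that it actually took effect. A small sketch, assuming the usual Linux locations /proc/cmdline and /sys/devices/system/clocksource/clocksource0/current_clocksource; the helper names are hypothetical and the sysfs file may not exist on every kernel, in which case the helper returns None.

```python
def has_kernel_param(cmdline, param):
    """True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def read_text(path):
    """Return the stripped contents of `path`, or None if it is unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def hpet_report():
    """Summarise whether hpet=disable is active and which clocksource is used."""
    cmdline = read_text("/proc/cmdline") or ""
    clocksource = read_text(
        "/sys/devices/system/clocksource/clocksource0/current_clocksource"
    )
    return {
        "hpet_disabled_on_cmdline": has_kernel_param(cmdline, "hpet=disable"),
        "current_clocksource": clocksource,
    }
```

Calling `hpet_report()` on the rebooted server should show `hpet_disabled_on_cmdline` as True. On GRUB legacy systems the parameter is usually appended to the kernel line in /boot/grub/menu.lst.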
Hi John,

Thanks for your reply. Yes, the servers are HP-ML370's. Dual Xeon with Hyperthreading, so /proc/cpuinfo shows 4 CPUs.

Kind regards
David Wilson
Anyone with any further ideas or input on this one?

Kind regards
David Wilson
Hi John,

Sorry to bug you about this one. You're the only one who's managed to provide some input so far. Any ideas where to from here?

Kind regards
David Wilson
On Tuesday 03 April 2007 09:05:24 David Wilson wrote:
The conversations that are recorded by Asterisk are recorded to /var/spool/asterisk/monitor which is a tmpfs/shm file system and configured as follows in my /etc/fstab:
shm /var/spool/asterisk/monitor tmpfs defaults 0 0
On a daily basis each of these servers will 'dead lock' i.e. completely freeze. When this happens the servers do not respond to the keyboard or even 'ping'.
You say that they deadlock on a daily basis. Have you correlated it to the number of calls taken by that server each day, or is it always at the same time?

I have experienced freezes when a rogue process consumes my memory so that I end up with no physical memory or swap free. You are using an unrestricted tmpfs; could the memory consumption be slowly growing?

Can I suggest you set a maximum size for your tmpfs on one machine and see if that helps.

Andrew
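Setting a maximum size means adding a `size=` option to the tmpfs entry in /etc/fstab. The sketch below checks an fstab line for that option and builds a limited variant. The 512m value in the usage note is purely illustrative, not a recommendation from the thread, and the helper names are hypothetical. Note also that, as far as I know, the kernel's tmpfs documentation gives half of physical RAM as the default when no `size=` is set, so "defaults" would allow roughly 2 GB on a 4 GB machine rather than being truly unlimited.

```python
def _tmpfs_fields(fstab_line):
    """Split an fstab line and check that it describes a tmpfs mount."""
    fields = fstab_line.split()
    if len(fields) < 4 or fields[2] != "tmpfs":
        raise ValueError("not a tmpfs fstab entry: %r" % fstab_line)
    return fields


def tmpfs_size_option(fstab_line):
    """Return the value of size= for a tmpfs entry, or None if it is unset."""
    fields = _tmpfs_fields(fstab_line)
    for option in fields[3].split(","):
        if option.startswith("size="):
            return option[len("size="):]
    return None


def with_size_limit(fstab_line, size):
    """Return the fstab line with size=<size> set, replacing any old value."""
    fields = _tmpfs_fields(fstab_line)
    options = [o for o in fields[3].split(",") if not o.startswith("size=")]
    options.append("size=" + size)
    fields[3] = ",".join(options)
    return " ".join(fields)
```

For the entry quoted in the original post, `with_size_limit("shm /var/spool/asterisk/monitor tmpfs defaults 0 0", "512m")` yields `shm /var/spool/asterisk/monitor tmpfs defaults,size=512m 0 0`. With a limit in place, a runaway fill shows up as write errors in Asterisk rather than as memory pressure on the whole machine, which makes the failure easier to diagnose.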
Hi Andrew,

Thanks for your reply. The deadlocks seem to happen at random times. Yesterday the servers did not deadlock at all, though today one server deadlocked three times. There seems to be no pattern.

From what I've seen, the space used in the tmpfs disk never exceeds 300 MB, though perhaps at some point something is going wrong and the tmpfs is filling up?

I will set a size limit tomorrow and will provide feedback. Thanks so much for your input so far!

Kind regards
David Wilson
participants (3)
- Andrew Colvin
- David Wilson
- John Andersen