[Bug 687505] New: By minimizing applications, these disappear - no reaction of system a few minutes after restart
https://bugzilla.novell.com/show_bug.cgi?id=687505
https://bugzilla.novell.com/show_bug.cgi?id=687505#c0

Summary: By minimizing applications, these disappear - no reaction of system a few minutes after restart
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: 64bit
OS/Version: openSUSE 11.3
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: antonio.liggieri@poyry.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)

What was changed before the bug appeared?
- We built a small cluster over a network for parallel computing and needed to activate the ssh service on several openSUSE machines. We defined one server and 3 clients.

What exactly are the symptoms?
- On the machine defined as the server, applications, e.g. a shell, disappear in the middle of the screen after being minimized. They can only be brought back on screen with the key combination Ctrl+F10.
- But after a while not even this is possible any more, because the whole system no longer reacts at all. You have to turn the computer off with the hardware power button and restart it!!

Reproducible: Always

Steps to Reproduce:
1. Set up a small cluster by connecting the machines over ssh (1 server and at least 1 client).
2. Check whether the server machine works fine or not.
3. Wait until the desktop no longer reacts.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=687505#c
zj jia
https://bugzilla.novell.com/show_bug.cgi?id=687505#c1
Petr Cerny
https://bugzilla.novell.com/show_bug.cgi?id=687505#c2
--- Comment #2 from Petr Cerny
Could you please provide a more detailed description? From the initial bug report it is hard to tell where the problem might be, and it is definitely not descriptive enough to try to reproduce it.
Basically, the problem can be split into two symptoms.
After setting up SSH for communication between 4 machines, where one of the machines is set up as the server and the others as clients, the machine set up as the server:
1. changes the desktop characteristics: when an application is minimized, it disappears from the screen. To make the application visible again, you must use the keyboard combination Ctrl+F10.
2. the shell stops reacting after a while. You can e.g. request a listing by typing "ll": it echoes "ll", but no listing is produced. If you close the shell in order to launch a new one, it starts to launch, but after about 30 seconds a notice is shown: Error launching /usr/share/applications/kde4/konsole.desktop. Either KLauncher is not running any more, or it failed to start the application
Are you using some special clustering framework or is it just your own set of scripts and applications scheduling the work?
The software run in parallel is called OpenFOAM, a free C++ library for computational fluid dynamics. It splits the computational domain into x parts, which are processed by x processors. In my case I have 8 parts and 4 dual-core processors. Communication at the network level is via ssh, but the communication between the processors is managed by another free software package called Open MPI.
Aren't some processes on the machine, by chance, eating up all resources? In that particular case, using cgroups might help.
In fact, during the calculation the whole CPU power of all the machines is required for it. But the strange thing is that the problem occurs only on the machine that is set up as the server. No problems at all with the clients.
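One quick way to check whether the server is simply overloaded is to compare its load average with the number of cores. The following is only a minimal sketch in Python; the function names and the threshold of 1.0 runnable tasks per core are assumptions made for illustration, not something taken from this report.

```python
import os


def load_per_core(load_1min: float, cores: int) -> float:
    """Return the 1-minute load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_1min / cores


def looks_overloaded(load_1min: float, cores: int, threshold: float = 1.0) -> bool:
    """True when the load per core exceeds the (assumed) threshold."""
    return load_per_core(load_1min, cores) > threshold


if __name__ == "__main__":
    one_min, _, _ = os.getloadavg()   # available on Unix-like systems only
    cores = os.cpu_count() or 1       # cpu_count() may return None
    print(f"load per core: {load_per_core(one_min, cores):.2f}")
    print(f"looks overloaded: {looks_overloaded(one_min, cores)}")
```

Running this on the server and on one of the clients during a calculation would show whether the server carries noticeably more runnable work per core than the clients do.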
https://bugzilla.novell.com/show_bug.cgi?id=687505#c3
--- Comment #3 from Petr Cerny
Basically, the problem can be split into two symptoms.
After setting up SSH for communication between 4 machines, where one of the machines is set up as the server and the others as clients, the machine set up as the server:
1. changes the desktop characteristics: when an application is minimized, it disappears from the screen.
It looks like part of the desktop environment (and I believe you are using KDE) gets killed.
2. the shell stops reacting after a while.
Ditto, only this time it might be KLauncher.
OpenFOAM @ 8 parts on 4 dualcore. The communication on network level is via MPI over ssh.
Cgroups In fact, during the calculation the whole CPU power of all the machines is required for it. But the strange thing is that the problem occurs only on the machine that is set up as the server. No problems at all with the clients.
Summing it up, I would say that the server machine is under far too heavy a load. Resource-hungry calculations like CFD, FEA, MHD etc. are usually run on almost bare metal whenever possible. Things that might help (from the easiest to, IMHO, the best):
1) Running the calculation on the server with a lower priority.
2) Switching off the GUI, at least on the server, would be the first thing to try (or at least changing to a less hungry one like XFCE or LXDE). Under some circumstances the OOM killer might destroy some applications unexpectedly. At least you'll get more memory for the essential tasks.
3) Splitting the calculation into 7 parts instead of 8 and reserving one processor on the server for MPI and sshd. If the server also stores all the input/output for the computation clients, you won't get 2 full computational units anyway. Alternatively, add an additional computer (it can probably be less powerful) to act as a coordination/data server for 4 clients doing the calculations. sshd might be resource-consuming when serving several heavily communicating clients (remember that the links are encrypted, and that comes at a cost).
4) I really suggest trying cgroups - they don't allocate the resources by default. Only when an application needs them does it get them, at the expense of others that would otherwise not give them away (i.e. even while doing the calculations you will be able to ssh into the machine).
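A rough sketch of suggestions 1) and 3), in Python: it builds an Open MPI hostfile that leaves one core on the server free and prefixes the `mpirun` call with `nice`. The host names, the hostfile path and the solver command are hypothetical placeholders, and the helper functions are made up for illustration. Only `nice -n`, `mpirun -np`, `mpirun --hostfile` and the `host slots=N` hostfile syntax are existing options. The case would also have to be decomposed into the same number of parts (7 here).

```python
def build_hostfile(hosts, cores_per_host, server, reserved_on_server=1):
    """Return (hostfile_text, total_slots), keeping some server cores free."""
    lines = []
    total = 0
    for host in hosts:
        slots = cores_per_host
        if host == server:
            slots -= reserved_on_server   # core kept free for MPI and sshd
        if slots < 0:
            raise ValueError("more cores reserved than the server has")
        if slots > 0:
            lines.append(f"{host} slots={slots}")
            total += slots
    return "\n".join(lines) + "\n", total


def niced_mpirun(solver_cmd, total_slots, hostfile_path, niceness=10):
    """Build an argument list that starts mpirun at a lower priority."""
    return ["nice", "-n", str(niceness),
            "mpirun", "-np", str(total_slots),
            "--hostfile", hostfile_path, *solver_cmd]


if __name__ == "__main__":
    # Hypothetical host names: one server and three clients, all dual-core.
    hosts = ["server", "client1", "client2", "client3"]
    text, total = build_hostfile(hosts, cores_per_host=2, server="server")
    print(text, end="")
    # "solver" stands in for whichever OpenFOAM solver is actually used.
    print(" ".join(niced_mpirun(["solver", "-parallel"], total, "hosts.txt")))
```

Note that `nice` only lowers the priority of processes started on the machine where the command is run, which matches the idea of deprioritizing the calculation on the server only.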
https://bugzilla.novell.com/show_bug.cgi?id=687505#c4
zj jia