[opensuse] reboot/shutdown occasionally fail
I have a problem that I occasionally see on all my systems, but it is particularly nasty on systems which I administer remotely and cannot easily get to physically. When I tell the system to reboot, something fails, and instead of rebooting it effectively does what an init 3 would do, i.e. it kills the KDE desktop and drops into the text console at a login prompt. There it waits for input instead of continuing the shutdown process.

I have fooled around with it at this point, logging in as root and then issuing a reboot or shutdown -r now command. I see a broadcast message, but at that point the system hangs completely and the only thing I can do is physically reset or power cycle the computer. Thinking that a shutdown process may already be active, I have also tried shutdown -c now to cancel it, but I always get a response saying that no shutdown pid could be found...

Has anyone else seen this problem and is there a solution to it? I searched bugzilla but could not find any complaints about this issue, and since I am seeing it on several different computers, I find it hard to believe that I am the only one seeing it. I am running openSuSE 11.2, x86_64 on all my systems. I also looked in /var/log/messages but see nothing that indicates a problem.

Marc...
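When a reboot ends at a text login prompt, one thing worth checking from that prompt is which runlevel the system actually recorded. A minimal sketch, assuming sysvinit, whose `runlevel` command prints the previous and current runlevel (for example `N 5`, where `N` means there was no previous level). The runlevel names and the helper `diagnose` are illustrative assumptions; check `/etc/inittab` on the machine.

```python
import shutil
import subprocess

# Conventional SysV runlevels on openSUSE 11.x (assumption; see /etc/inittab).
RUNLEVEL_NAMES = {
    "0": "halt",
    "3": "multi-user, text console",
    "5": "multi-user with X (KDE)",
    "6": "reboot",
}


def diagnose(runlevel_output: str) -> str:
    """Interpret the 'previous current' pair printed by `runlevel`."""
    parts = runlevel_output.split()
    if len(parts) != 2:
        # `runlevel` prints "unknown" when utmp holds no runlevel record.
        return "unknown"
    previous, current = parts
    if current in ("0", "6"):
        # init has recorded the new level; if the box is still up,
        # the stop scripts for that level have not finished.
        return f"stalled while entering {RUNLEVEL_NAMES[current]}"
    if current == "3" and previous == "5":
        return "switched to runlevel 3 instead of halting/rebooting"
    return f"currently in {RUNLEVEL_NAMES.get(current, current)}"


if __name__ == "__main__" and shutil.which("runlevel"):
    out = subprocess.run(["runlevel"], capture_output=True, text=True).stdout
    print(diagnose(out))
```

If it reports `5 6`, the reboot request was accepted and the stall is somewhere in the stop sequence; a genuine `5 3` would point at something switching runlevels instead.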
On Friday, 2010-05-14 at 12:13 -0700, Marc Chamberlin wrote:
Has anyone else seen this problem and is there a solution to it? ...
Yes, I have seen something similar to this, very occasionally, but as I'm always "local" it is not much of a problem, and I can't reproduce it at will to study it. However, I learned long ago not to issue a halt or reboot from inside the session. I first log out, then halt. Otherwise, sometimes it crashes.

--
Cheers,
Carlos E. R.
Marc Chamberlin wrote:
I have a problem that I occasionally see on all my systems, ...
I have a similar problem but didn't report it for two reasons: 1) I can't figure out where the problem is, and 2) (as a result of 1) I can't figure out whether it is related to some configuration problem I've created.

But in my case the outcome is worse: the system hangs when shutting down. It kills off all processes and goes into the final stages of shutdown, but hangs before it actually unmounts the disks. It has to be power-cycled, and the result is disk-thrashing. I've lost a software RAID5 array (one that contained backups) twice this way. Neither of my hardware RAID5s has been affected.

It does indicate to me that software RAID5 isn't very reliable and shouldn't be used for important data. I have experienced some data loss when local reconfigurations caused the loss of primary data whose backups had been on a software RAID5. I'm contemplating how to switch over to hardware RAID5 for my backups so they won't be destroyed so often. But had my systems not experienced the shutdown/reboot problems you mentioned, I might never have noticed a problem with software RAID.

FWIW, if files didn't get written out completely, that'd be 'fine', but having the arrays fail to assemble, even in degraded mode, is unacceptable from a reliability standpoint.

-linda
On 5/14/2010 4:38 PM, Linda Walsh wrote:
Marc Chamberlin wrote:
I have a problem that I occasionally see on all my systems,
...
----
I have a similar problem ...
But in my case the outcome is worse -- the system hangs when shutting down. It kills off all processes and goes into the final stages of shutdown -- but hangs before it actually unmounts the disks. ...
-linda
FWIW I have seen another aspect of what I consider a serious design flaw of the Linux kernel in the shutdown/reboot process. Some programs (MythTV comes immediately to mind) can actually intercept the shutdown/reboot request and prevent it! And man does that infuriate me when I see it happen!!! When the OS receives a shutdown or reboot request, NOTHING short of a serious hardware failure should prevent it, IMHO. And NO app or process should EVER be allowed to prevent it. A shutdown or reboot request may have been initiated in response to an emergency (especially in a situation like mine, where I am remotely administering a computer and cannot be there to hit the power down button), therefore the Linux kernel must absolutely honor the request and force the shutdown or reboot to take place, regardless of any protests/failures from some darn app or process!

I kind of suspect some creeping features are now allowing apps to intercept the shutdown/reboot request and, as in my situation, stop the process at a runlevel 3 terminal with a login prompt. This is DEAD WRONG; it strongly indicates a fundamental design flaw in the kernel, because it should never ever happen. Best efforts should be made to preserve data, but that does not override the need to shut down or restart a computer expediently when so requested.

In the scenario Linda points out, I too have seen a failure occur when unmounting devices, resulting in a hung system. Again that is just dead wrong, and the Linux kernel should prevent such an event by timing out the hung process and forcing the system to proceed with the shutdown/reboot process.

Marc...
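Timing out a hung process and then forcing the shutdown to proceed amounts to a TERM-then-KILL escalation with a grace period. A minimal sketch of that pattern for a single child process, using only Python's standard `subprocess` and `signal` modules; the helper name `stop_with_deadline` and the grace values are assumptions for illustration, not a description of how the openSUSE shutdown scripts are implemented.

```python
import signal
import subprocess
import sys


def stop_with_deadline(proc: subprocess.Popen, grace: float) -> str:
    """Ask a process to exit with SIGTERM; force SIGKILL after `grace` seconds."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM or is stuck: stop asking politely.
        proc.kill()  # SIGKILL on POSIX, which a process cannot catch or ignore
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # A stubborn child that ignores SIGTERM, standing in for a hung app.
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", code],
                             stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until the child has installed its handler
    print(stop_with_deadline(child, grace=1.0))  # the child ends up killed
```

The grace period is the trade-off described above: long enough for well-behaved programs to save their data, short enough that no single program can hold the shutdown hostage.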
On Sat, May 15, 2010 at 6:55 AM, Marc Chamberlin <marc@marcchamberlin.com> wrote:
On 5/14/2010 4:38 PM, Linda Walsh wrote:
Marc Chamberlin wrote:
I have a problem that I occasionally see on all my systems,
...
FWIW I have seen another aspect of what I consider is a serious design flaw of the Linux kernel, in the shutdown/reboot processes. ...
I'm also seeing this problem, with both 11.0 and 11.1 (32-bit). I was not able to find out what was causing it. One suspicious piece of software was the "innocuous" Webilder (a Python program that does something similar to Webshots Desktop on MS Windows: it periodically downloads photos from Webshots or Flickr and makes them the wallpaper). It definitely caused frequent failures on shutdown. Eventually I stopped using it, but the problem still happens from time to time. In my experience, after X is killed, if I log in as root and issue the "halt" command it works in most cases, but sometimes it gets stuck at some point after stopping the network services.

--
Mark Goldstein
On Friday, 2010-05-14 at 20:55 -0700, Marc Chamberlin wrote:
...
FWIW I have seen another aspect of what I consider is a serious design flaw of the Linux kernel, in the shutdown/reboot processes. ... therefore the Linux kernel must absolutely honor the request and force the shutdown or reboot to take place, regardless of any protests/failure from some darn app or process!
Are you sure it is the kernel that is at fault here, or is it the stop script (not the kernel) that is stopped/aborted? My guess is that it is the script, and that is not part of the kernel at all. It is part of the distribution, the base system packages. Probably initd or rc.

--
Cheers,
Carlos E. R.
On 5/15/2010 3:03 AM, Carlos E. R. wrote:
On Friday, 2010-05-14 at 20:55 -0700, Marc Chamberlin wrote:
...
FWIW I have seen another aspect of what I consider is a serious design flaw of the Linux kernel, in the shutdown/reboot processes. ... therefore the Linux kernel must absolutely honor the request and force the shutdown or reboot to take place, regardless of any protests/failure from some darn app or process!
Are you sure it is the kernel that is at fault here, or is it the stop script (not the kernel) that is stopped/aborted?
I guess it is the script, and this is not at all part of the kernel. It is part of the distribution, the basesystem packages. Probably initd or rc.
Interesting question, Carlos... Though I am not a guru on Linux per se, I am a computer scientist and have written enough operating systems in my career to know where responsibilities should be delegated. I guess I should find some of that spare time I keep losing and study Linux more deeply.. ;-)

Almost every operating system I have worked with places the task manager/scheduler inside the protected space of the operating system kernel. This is a critical region of code, and it is within the scheduler that a shutdown or reboot request should be acknowledged and ultimately handled, if need be. If the Linux designers want to spawn off a task to handle shutdown/reboot requests, or to signal running tasks that they need to prepare for the system shutting down, that is fine, BUT the task manager/scheduler has the responsibility to force the shutdown should some task or process fail, including any task spawned to process shutdown scripts. Any task or process can fail, for any number of reasons! Usually that is handled by setting an event timer of some kind, which can interrupt any task or process and return execution to the kernel task manager/scheduler. The scheduler in turn can kill the offending process and proceed with the shutdown/reboot.

If this is all done properly, then the code in kernel space is pretty well protected from any form of memory corruption that could cause the kernel to crash. And crashes do not normally lead to symptoms such as the ones I and others here are reporting, i.e. intercepting the shutdown/reboot request, or simply closing X, resetting the runlevel to 3 with a basic terminal, and asking the user to log in.... This has to have been done intentionally, and I am strongly arguing that this design is DEAD WRONG, period.
After a shutdown or reboot request has been made, the Linux OS/kernel must see to it that it happens expeditiously, and must immediately enter a state that prevents anything else from happening until the shutdown process is complete, or until the reboot process re-initializes the system for booting back up... That is a fundamental requirement of all computer operating systems, since the request to shut down or reboot may have been initiated by an unforeseeable emergency of some kind. (Though I admit Windoz violates this principle as well, with its insistence on installing updates after a shutdown/reboot request has been made. And that also infuriates me to no end!)

Marc..
On Saturday, 2010-05-15 at 15:48 -0700, Marc Chamberlin wrote:
On 5/15/2010 3:03 AM, Carlos E. R. wrote:
Are you sure it is the kernel that is at fault here, or is it the stop script (not the kernel) that is stopped/aborted?
I guess it is the script, and this is not at all part of the kernel. It is part of the distribution, the basesystem packages. Probably initd or rc.
Interesting question Carlos... Though I am not a guru on Linux, per se, I am a computer scientist and have written enough operating systems in my career to know where responsibilities should be delegated. Guess I probably should find some of that spare time I keep losing and study Linux deeper.. ;-) Almost every operating system I have worked with places the task manager/scheduler inside the protected space of the operating system kernel. This is a critical region of code and it is within the scheduler that a shutdown or reboot request should be acknowledged and ultimately handled, if need be.
I understand that, but... a lot of things are done via scripts in Linux. When you issue a "halt", a lot of things are done before the kernel is actually told to do the real "halt". Have a look at "/etc/inittab", "/etc/init.d/rc", and probably more. I think all methods sooner or later call "/etc/init.d/rc 0". I don't know which exact command finally tells the kernel to stop; maybe that's inside the calling "halt" binary. I haven't investigated enough.

And yes, I agree that halt should always succeed, no matter what. However, tasks may have a say in the process, delaying it a bit until they get a fair chance to save all their "absolutely must be saved" data. In the end, it must halt within a time limit. It could be configurable.

--
Cheers,
Carlos E. R.
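The idea of letting each task have its say but halting within a configurable time limit regardless can be sketched as a runner that executes stop steps in order, gives each one a deadline, kills any step that overruns, and always carries on. This is a hypothetical Python illustration, not a description of what `/etc/init.d/rc` actually does; the step commands and the name `run_stop_steps` are assumptions. Its report also records which step hung, information missing from the failures described in this thread.

```python
import subprocess
import sys


def run_stop_steps(steps, limit: float):
    """Run each command in order; kill any that exceeds `limit` seconds.

    Returns (command, outcome) pairs so the hung step is visible.
    The sequence always continues to the next step, whatever happens.
    """
    report = []
    for cmd in steps:
        try:
            result = subprocess.run(cmd, timeout=limit)
            outcome = "ok" if result.returncode == 0 else f"exit {result.returncode}"
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed and reaped the child here.
            outcome = "timed out, killed"
        except OSError as err:
            outcome = f"failed to start: {err.strerror}"
        report.append((cmd, outcome))
    return report


if __name__ == "__main__":
    py = sys.executable
    steps = [
        [py, "-c", "pass"],                         # a well-behaved stop step
        [py, "-c", "import time; time.sleep(60)"],  # one that hangs
        [py, "-c", "raise SystemExit(1)"],          # one that fails
    ]
    for cmd, outcome in run_stop_steps(steps, limit=1.0):
        print(cmd[-1], "->", outcome)
```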
On 5/14/2010 12:13 PM, Marc Chamberlin wrote:
I have a problem that I occasionally see on all my systems, but is particularly nasty on systems which I remotely administer and cannot get to physically, easily. When I tell the system to reboot, something fails and instead of rebooting it effectively does what an init 3 would do,
Perhaps related: I have my laptop set up to suspend when I close the lid. It used to be 100% reliable. Now it does not suspend the FIRST TIME. It just locks the screen and leaves the laptop running with the screen on and the lid closed. If I re-open the lid, log in, and close the lid again, it suspends to RAM as expected. So far I've not been able to solve this.

--
At one time I had a Real Sig. It's been downsized.
On Saturday, 2010-05-15 at 11:09 -0700, John Andersen wrote:
On 5/14/2010 12:13 PM, Marc Chamberlin wrote:
I have a problem that I occasionally see on all my systems, but is particularly nasty on systems which I remotely administer and cannot get to physically, easily. When I tell the system to reboot, something fails and instead of rebooting it effectively does what an init 3 would do,
Perhaps related, I have my laptop set up to suspend when I close the lid. It used to be 100% reliable.
It is not, unfortunately. It may work in one distro version, and crash the system in the next.
Now, it does not suspend the FIRST TIME. It just locks the screen and leaves the laptop running with screen on and lid closed. Re-open the lid, log in, close lid again and it will suspend to ram as expected.
So far I've not been able to solve this.
No log entries? You have to investigate after the first failed attempt and before the second, successful one.

--
Cheers,
Carlos E. R.
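One way to look between the failed first attempt and the successful second one is to note where /var/log/messages ends before closing the lid, then read only what was appended afterwards. A minimal sketch, assuming the log is a plain text file that is not rotated between the two reads (it falls back to reading from the start if the file shrank). The helper names and the `mark`/`show` command-line usage are hypothetical.

```python
import os
import sys


def log_mark(path: str) -> int:
    """Remember the current end of the log file as a byte offset."""
    return os.path.getsize(path)


def lines_since(path: str, mark: int) -> list:
    """Return the lines appended to `path` after byte offset `mark`."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size < mark:
            mark = 0  # file shrank: probably rotated, so read from the start
        f.seek(mark)
        data = f.read()
    return data.decode("utf-8", errors="replace").splitlines()


if __name__ == "__main__":
    # Hypothetical usage (reading the log usually requires root):
    #   python3 since_mark.py mark         -> prints an offset; now close the lid
    #   python3 since_mark.py show OFFSET  -> prints what was logged since then
    LOG = "/var/log/messages"
    if len(sys.argv) == 2 and sys.argv[1] == "mark":
        print(log_mark(LOG))
    elif len(sys.argv) == 3 and sys.argv[1] == "show":
        print("\n".join(lines_since(LOG, int(sys.argv[2]))))
```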
participants (5)
- Carlos E. R.
- John Andersen
- Linda Walsh
- Marc Chamberlin
- Mark Goldstein