[opensuse-factory] Changes with boot/init for 10.3 ? Kernel locks in 10.3, not in 10.2 ?
Strange issue, but I've been running into problems with 10.3 since updating from 10.2 to beta 1. The kernel on the install disc will not boot: it locks up at a random point within 10-15 seconds of selecting install. The safe kernel works, though, and I managed to install with it. The installed system would not boot with the 10.3 B1 kernel unless I used kernel parameters (noapic etc.), which lead to a host of stability problems on my laptop.

However, I was previously running 2.6.22 under 10.2 with no problems; it booted fine without any kernel parameters. I tried booting 10.3 B1 with that same kernel from 10.2 (it still existed after my upgrade), and it froze 15-20 seconds in unless I used noapic. I even recompiled that kernel under 10.3 using the same config file, and the same problem exists. I downloaded and compiled 2.6.22.2 and ran into the same problem: it freezes during boot and works with noapic.

I'm befuddled as to why this is an issue in 10.3 and not in 10.2. I suspect, but am not sure, that my issues are somehow related to the new modular mkinitrd process; it's more or less the only difference I can find in my boot logs compared to 10.2. The lockups occur almost immediately after the various xxx.sh scripts start running, but not always at the same point (which makes it harder to figure out where the problem is). It could be that there is a hardware/kernel incompatibility that never came to light under 10.2 and only shows up with whatever 10.3 does differently, but I want to track it down. I had similar issues in 10.2 initially, related to 2.6.18.2 not supporting my hardware properly, but they more or less disappeared with kernels >2.6.20 and I was kernel-parameter free. Now I'm back to square one.

Is there any source of docs on the specific changes made to mkinitrd for 10.3, as in how to configure which features and scripts are activated for boot? I went through the man pages, and there is info on how to enable additional settings (modules, features) but nothing on how to prevent things from being enabled, or whether there is a config file somewhere to control it all. Are there any other changes to the boot process I may have overlooked that could cause a previously hidden kernel problem to appear when it remained dormant in 10.2?

I'd like to roll up my sleeves (within reason ;) ) and figure out what's going on, maybe by trial and error, poking and peeking at various things, but I'm not sure where to start in this case. Sorry if this is a convoluted message. I'm not comfortable opening a bug report because I'm not sure I can reliably articulate where the actual problem is, so I'm looking for some diagnostic help before I do that.

For reference, the system is an HP dv9000 laptop with an AMD x2 1.8G and an nvidia MCP51 chipset.

Cheers, Kevin

---------------------------------------------------------------------
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-factory+help@opensuse.org
On Tuesday 14 August 2007 23:43:15 Kevin Valko wrote:
Strange issue, but I've been running into problems with 10.3 since updating to beta 1 from 10.2.
I suspect, but am not sure, that my issues are somehow related to the new modular mkinit process, it's more or less the only difference I can find in my boot logs as compared to 10.2. The lockups occur almost immediately after all the various xxx.sh scripts start running, but not always at the same point (which makes it harder to figure out where the problem is).
Try setting "PROMPT_FOR_CONFIRM=yes" in /etc/sysconfig/boot. You might want to set CONFIRM_PROMPT_TIMEOUT to something longer than 5 seconds for the first couple of boots.

I had similar problems on an HP dv6400 with some of the 10.3alpha kernels. In my case it turned out to be /etc/init.d/boot.clock, and I could lock the system at will by running hwclock --systohc or hwclock --hctosys. (I don't have the problem on the current 10.3beta1 kernel and could never quite nail down the root cause when I was seeing it on earlier kernels.)
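A minimal sketch of applying the two settings named above from a shell, assuming /etc/sysconfig/boot uses plain KEY="value" lines. The helper name set_sysconfig and the timeout value 15 are illustrative assumptions, not part of any SUSE tool; editing the file by hand works just as well.

```shell
#!/bin/sh
# set_sysconfig FILE KEY VALUE (hypothetical helper)
# Sets KEY="VALUE" in a sysconfig-style file: an existing assignment is
# replaced in place, otherwise a new line is appended.
# Assumes VALUE contains no '|', '&' or backslash characters.
set_sysconfig() {
    sc_file=$1 sc_key=$2 sc_value=$3
    if grep -q "^${sc_key}=" "$sc_file"; then
        sc_tmp=$(mktemp) || return 1
        # Rewrite through a temporary file, then copy back so the
        # original file keeps its owner and permissions.
        sed "s|^${sc_key}=.*|${sc_key}=\"${sc_value}\"|" "$sc_file" > "$sc_tmp" &&
            cat "$sc_tmp" > "$sc_file"
        rm -f "$sc_tmp"
    else
        printf '%s="%s"\n' "$sc_key" "$sc_value" >> "$sc_file"
    fi
}

# Example (run as root, after backing up the file):
#   set_sysconfig /etc/sysconfig/boot PROMPT_FOR_CONFIRM yes
#   set_sysconfig /etc/sysconfig/boot CONFIRM_PROMPT_TIMEOUT 15
```

Reverting is the same call with the previous value, which makes it easy to switch the prompt off again once the offending service is found.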
On Wednesday 15 August 2007 02:23:05 am Warren Stockton wrote:
Try setting "PROMPT_FOR_CONFIRM=yes" in /etc/sysconfig/boot. You might want to set CONFIRM_PROMPT_TIMEOUT to something longer than 5 seconds for the first couple of boots.
I had similar problems on a HP dv6400 with some of the 10.3alpha kernels. In my case it turned out to be /etc/init.d/boot.clock and I could lock the system at will by running hwclock --systohc or hwclock --hctosys (I don't have the problem on the current 10.3beta1 kernel and could never quite nail down the root cause when I was seeing the problem on earlier kernels.)
Thanks for the pointer, that worked, but not for the reasons I expected. Once I set PROMPT_FOR_CONFIRM, I was able to step through the service startup and boot properly with both the -default kernel and my own custom kernel. I didn't need to use noapic.

However, booting without PROMPT_FOR_CONFIRM and without noapic produces a total lockup at some point between .localfs and .udev-retry, with no error messages and nothing in the logs that would point to where the issue is. Booting with noapic and without PROMPT_FOR_CONFIRM also allows a normal boot, though with the stability issues my system experiences with noapic.

Bizarre. The only thing I can think of is that the parallel booting of services is somehow causing an error condition in the kernel that doesn't occur when boot prompting forces a delay between service starts, or something along those lines. I'll do a round of trial and error, disabling each boot.xxx service one by one to see if I can narrow it down, and maybe disabling parallel services as well.

As for the issue with the clock, I do remember running into that with earlier kernels. I think it first cropped up in 2.6.20; there was some change that involved ACPI, the clock and HPET, or something along those lines. I remember eliminating the problem by judiciously tweaking my .config settings, and it seemed to have disappeared for me in recent kernels.

Any other pointers appreciated...

Thanks, KV
On Wednesday 15 August 2007 12:54, Kevin Valko wrote:
and maybe disabling parallel services as well.
I would start with that. Then you have a better chance of finding the service that causes trouble.

--
Regards, Rajko.
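The thread does not say how parallel service start is switched off. On SUSE releases of this era it was, to the best of my knowledge, the RUN_PARALLEL variable in /etc/sysconfig/boot; treat that name as an assumption and check the comments in your own file. The sketch below only reads the current value; sysconfig_get is a hypothetical helper name.

```shell
#!/bin/sh
# sysconfig_get FILE KEY (hypothetical helper)
# Prints the value assigned to KEY in a shell-style config file.
# FILE should be given with a path (e.g. /etc/sysconfig/boot), because
# the "." builtin searches PATH for names that contain no slash.
sysconfig_get() {
    # Source the file in a subshell so its variables do not leak
    # into the calling shell.
    ( . "$1" >/dev/null 2>&1 && eval "printf '%s\n' \"\${$2:-}\"" )
}

# Example (RUN_PARALLEL is an assumed variable name, see above):
#   sysconfig_get /etc/sysconfig/boot RUN_PARALLEL
```

If the value printed is "yes", setting it to "no" and rebooting is the experiment suggested above.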
On Wednesday 15 August 2007 09:10:45 pm Rajko M. wrote:
On Wednesday 15 August 2007 12:54, Kevin Valko wrote:
and maybe disabling parallel services as well.
I would start with that. Then you have a better chance of finding the service that causes trouble.
Disabling parallel services did allow the system to boot normally without requiring interactive confirmations, so there is definitely a conflict somewhere among the boot services running in parallel.

Unfortunately I'm not sure which one. I tried disabling a number of them individually, but the boot still hard-locked when parallel services were enabled. I guess I can try changing the S/K sequences in boot.d, to see whether grouping the services (the culprit lies in S12) instead of running them all together can point out the issue. Grrrr.

On the plus side, disabling parallel loading didn't have too significant an impact on my boot time; it's still more or less in line with what I had in 10.2. So it's hardly the end of the world, but it is kind of a drag, since faster booting is one of the significant improvements in 10.3.

Cheers, KV
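To see which services share the S12 slot mentioned above, one could list the start links with that sequence number. This is a sketch assuming the usual SysV layout, where boot.d contains links named S<NN><service>; the function name list_start_group is hypothetical, and the directory path in the example is inferred from the thread's mention of boot.d and /etc/init.d.

```shell
#!/bin/sh
# list_start_group DIR SEQ (hypothetical helper)
# Prints the service names of all start links in DIR whose sequence
# number is SEQ, one per line, e.g. S12boot.foo -> boot.foo.
list_start_group() {
    dir=$1 seq=$2
    for link in "$dir"/S"$seq"*; do
        # Skip the unexpanded pattern when nothing matches; also accept
        # dangling symlinks, which -e alone would reject.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}
        printf '%s\n' "${name#S"$seq"}"
    done
}

# Example:
#   list_start_group /etc/init.d/boot.d 12
```

The resulting short list is the set worth disabling one at a time, or in halves, with parallel start left enabled.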
On Wednesday 15 August 2007 22:22, Kevin Valko wrote:
Disabling parallel services did allow the system to boot normally without requiring interactive confirmations, so there is definitely a conflict happening somewhere with the boot services running in parallel.
Unfortunately I'm not sure which one, I tried disabling a number of them individually but the boot still hard-locked when parallel services were enabled, I guess I can try to change the S/K sequences in boot.d, to see if grouping the services (the culprit lies in S12) instead of all together can point out the issue. Grrrr.
On the plus side, disabling parallel loading didn't have a too significant impact on my boot time, it's still more or less in line with what I had in 10.2, so it's hardly the end of the world, but it is kind of a drag since faster booting is one of the significant improvements for 10.3.
Cheers, KV
Hi Kevin,

If parallel booting causes the problem, then one of the scripts is the culprit: it doesn't wait for its dependencies to complete. Grouping services will not help much, as they are not run sequentially anyway. What may help to debug the issue is clearing the log files, booting, and after the lockup booting a Live CD and looking in the logs.

Another method to isolate the script would be to add an echo command to each script that records the script name. For instance:

echo $0 >> /tmp/startup.log

That will at least tell what was loaded before the lockup, and it will be preserved after a new boot.

--
Regards, Rajko.
On Wed, 15 Aug 2007, Rajko M. wrote:
Hi Kevin,
If parallel booting makes problem, than is some of scripts the culprit. It doesn't wait for it's dependencies to be performed. Grouping services will not help much as they are not ran sequentially anyway, but cleaning log files, booting and after lockup, booting Live CD and looking in logs may help to debug issue.
The other method to isolate script would be add echo command to scripts that will give on the screen script name. For instance echo $0 >> /tmp/startup.log
That will at least tell what was loaded before lockup and it will be preserved after new boot.

maybe a sync after the echo would be a good idea
--
Best regards, Andreas Vetter
Fakultaet fuer Physik und Astronomie, Universitaet Wuerzburg
Tel: +49 (0)931 888-5890
Fax: +49 (0)931 888-5508
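The echo-plus-sync idea can be sketched as a small function. The name trace_start and the STARTUP_LOG override are assumptions made for illustration; the default log path is the one used in the thread. In a real boot script the two commands could equally be written inline.

```shell
#!/bin/sh
# trace_start NAME (hypothetical helper)
# Appends NAME to the startup log, then asks the kernel to write
# buffered data to disk so the entry has a chance to survive a hard lockup.
trace_start() {
    echo "$1" >> "${STARTUP_LOG:-/tmp/startup.log}"
    sync
}

# Near the top of a boot script:
#   trace_start "$0"
```

The sync costs some boot time on every script, so this is something to remove again once the culprit is known.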
On Thursday 16 August 2007 06:04, Andreas Vetter wrote:
The other method to isolate script would be add echo command to scripts that will give on the screen script name. For instance echo $0 >> /tmp/startup.log That will at least tell what was loaded before lockup and it will be preserved after new boot.
maybe a sync after the echo would be a good idea
Yes, it would be. One more line marking the regular end of the script would be good too, but then one needs a script that changes all the scripts in /etc/init.d, or one changes the scripts manually and saves the set for the future. Then, on the next release, use diff and look only at the scripts that have changed.

I wonder what openSUSE developers use for debugging startup?

--
Regards, Rajko.
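With both a start and an end marker in the log, the scripts still running at the time of the lockup can be picked out mechanically. The sketch below assumes a log format of "start NAME" and "end NAME" lines, with no spaces in NAME; that format and the function name find_unfinished are illustrative choices, not something the thread specifies.

```shell
#!/bin/sh
# find_unfinished LOG (hypothetical helper)
# Prints every NAME that has a "start NAME" line without a later
# "end NAME" line, in the order in which the scripts started.
find_unfinished() {
    awk '$1 == "start" { started[$2] = NR }      # remember where it started
         $1 == "end"   { delete started[$2] }    # finished normally
         END { for (name in started) print started[name], name }' "$1" |
        sort -n | cut -d" " -f2-
}

# The markers themselves could be written by lines such as:
#   echo "start $0" >> /tmp/startup.log; sync
#   echo "end $0"   >> /tmp/startup.log; sync
# Example, after booting a Live CD and mounting the affected system:
#   find_unfinished /path/to/startup.log
```

With parallel start enabled several scripts may be unfinished at once, so the output narrows the search rather than naming a single culprit.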
2007/8/16, Kevin Valko
Disabling parallel services did allow the system to boot normally without requiring interactive confirmations, so there is definitely a conflict happening somewhere with the boot services running in parallel.
Unfortunately I'm not sure which one, I tried disabling a number of them individually but the boot still hard-locked when parallel services were enabled, I guess I can try to change the S/K sequences in boot.d, to see if grouping the services (the culprit lies in S12) instead of all together can point out the issue. Grrrr.
On the plus side, disabling parallel loading didn't have a too significant impact on my boot time, it's still more or less in line with what I had in 10.2, so it's hardly the end of the world, but it is kind of a drag since faster booting is one of the significant improvements for 10.3.
What are the "parallel services"?

Yesterday I installed 10.3 beta1 x86-64 on a four-core system without problems, and it works fine with the original kernel (H8DA8 SuperMicro motherboard with the AMD 8131/8111 chipset and two dual-core Opteron 270s). On another system I couldn't install it because of immature support for MSI and the ahci driver in the kernel, a problem that appears in most distros.

Regards
participants (5)
- Andreas Vetter
- Juan Erbes
- Kevin Valko
- Rajko M.
- Warren Stockton