Fwd: OpenSUSE 10.0 hangs at boot (from time to time - not always)
Hi all,

I am new to this mailing list, so please be patient ;-) OK, I have a boot problem with SUSE 10 on my computer.

The hardware: Dell Dimension 5000, BIOS version A07, P4 @ 3.0 GHz, 3 GB RAM, onboard LAN, ATI X300, Creative Soundblaster Live 24-bit, 2 x Seagate (80 GB + 300 GB)

The issue: I was able to install SUSE 10 from DVD (I bought the package). Even with that installation (online updates installed, no Packman or other additional software) the system hangs during boot every 5th or 6th time. It hangs while the KDE login screen is showing, or JUST before it is about to show (when you see the KDE clock), or immediately after login, while KDE is starting up and loading the user session.

I searched the web and found hints about ACPI problems, so I added the boot option ACPI=OFF. Then I found information about issues with NFS and HyperThreading. The NFS mounts have been changed to noauto to make sure they are not processed. The boot message says: "NFS unused". Finally I found postings talking about APIC and LAPIC, so I added the boot options NOAPIC and NOLAPIC. So I ended up with the following additional boot parameters (besides the SUSE 10 standard ones): ACPI=off, APM=power-off, NOAPIC, NOLAPIC.

Result: It still hangs on every 5th or 6th boot. That may be an immediate reboot, or one with a cold motherboard... there is just no pattern to see. WinXP works fine, and SUSE 9.3 did so as well.

I opened a service call with SUSE / Novell, but they do not seem to have an idea, and it seems as if they are still struggling with the reorganisation from SUSE to Novell: there are two different mail addresses, and the people working for SUSE 10 support still have the old mail address in their footer, linking to SUSE 9.3 support, who are not responsible and keep sending the mails back... Mails to the new address are not answered and are not reflected in the status of the web ticket.

So: Does anybody have an idea about where to look? Which logfile would be interesting? SUSE asked for /var/log/boot.msg but has not responded, and that was 5 weeks ago. I'm not sure which version of the log is the one to look for, because when I reboot a new file is written, isn't it? And the old one (from the boot where the system hung) is gone or reorganised?

I would really appreciate any new idea and I'm looking forward to hearing from you.

Kind regards,
Martin Soltau
-------------------------------------------------------
On Monday 09 January 2006 4:17 pm, Martin Soltau wrote:
Hi all,
I am new to this mailing-list so please be patient ;-) OK, I have a boot-problem with SUSE 10 on my computer:
The Hardware: Dell Dimension 5000, Bios-Version A07 P4@3.0 GHz, 3GB RAM, Onboard LAN, ATI X300, Creative Soundblaster Live 24bit 2* Seagate (80 GB + 300 GB)
The issue: I was able to install SUSE 10 from DVD (I bought the package). Even with that installation (online updates installed, no Packman or other additional software) the system hangs during boot every 5th or 6th time. It hangs while the KDE login screen is showing, or JUST before it is about to show (when you see the KDE clock), or immediately after login, while KDE is starting up and loading the user session.
I searched the web and found hints about ACPI-Problems. So I added the boot-option ACPI=OFF. <snip> So I ended up having the following additional boot parameters (besides the SUSE 10 standard ones): ACPI=off, APM=power-off, NOAPIC, NOLAPIC.
I believe the kernel boot command line parameters should all be lower case, no CAPS. I've always used lower case:

    acpi=off apm=off noapic nolapic
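One way to settle the upper/lower case question is to compare the parameters the running kernel actually received with the ones that were intended. The following is a minimal sketch, not part of the original thread: the function name check_boot_params is hypothetical, and the comparison is deliberately exact and case-sensitive, on the assumption (as Stan suspects) that "ACPI=OFF" is not treated the same as "acpi=off".

```shell
#!/bin/sh
# Report which expected boot parameters appear in a kernel command line.
# The comparison is exact and case-sensitive.
# The body runs in a subshell so that "set -f" and the variables stay local.
check_boot_params() (
    set -f                      # no filename expansion while splitting words
    cmdline=$1
    shift
    for want in "$@"; do
        found=no
        for have in $cmdline; do    # unquoted on purpose: split on whitespace
            if [ "$have" = "$want" ]; then
                found=yes
                break
            fi
        done
        echo "$want: $found"
    done
)
```

On a running Linux system the first argument would normally come from /proc/cmdline, for example: check_boot_params "$(cat /proc/cmdline)" acpi=off noapic nolapic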
Result: It still hangs on every 5th-6th boot. That may be an immediate reboot, or one with cold motherboard... just no rules to see. WinXP works fine, SUSE 9.3 did so as well.
IF (big if there) I'm right about case sensitivity, that would explain the lack of change in symptoms. Since it seems to be happening in a GUI-intense environment, I'd double check your video card driver setup. <snip>
So: Does anybody have an idea about where to look? Which logfile would be interesting? SUSE asked for /var/log/boot.msg but has not responded, and that was 5 weeks ago. I'm not sure which version of the log is the one to look for, because when I reboot a new file is written, isn't it? And the old one (from the boot where the system hung) is gone or reorganised?
/var/log/messages is cumulative. Also check dmesg and any other logs in the /var/log directory. You could also flip to Ctrl+Alt+F10 to monitor some system messages as the GUI is coming up. You won't see what it is freezing on, but you may see a pattern that indicates what the last thing it reported was before the lock-up, and then we may be able to help determine what the next step in the GUI startup should have been... Looking for clues.
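Because /var/log/messages is cumulative, the lines written just before each restart can be pulled out of it afterwards. This is a rough sketch only: the function name is hypothetical, and the marker text that identifies a restart is an assumption that depends on the syslog daemon in use, so it is passed in as a parameter. After a hard lock, only what reached the disk will be there.

```shell
#!/bin/sh
# Print the last N lines logged before each restart marker in a cumulative log.
# $1 = log file, $2 = marker text (assumption: written by the syslog daemon
# when it starts), $3 = number of lines to show (default 5).
last_lines_before_restart() {
    awk -v marker="$2" -v n="${3:-5}" '
        index($0, marker) && NR > 1 {
            print "--- before restart at line " NR " ---"
            start = NR - n
            if (start < 1) start = 1
            # buf is a ring buffer holding the previous n lines
            for (i = start; i < NR; i++) print buf[i % n]
        }
        { buf[NR % n] = $0 }
    ' "$1"
}
```

A call might look like: last_lines_before_restart /var/log/messages 'restart' 20. Comparing the output for boots that hung with boots that did not is one way to look for the pattern Stan describes.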
Kind regards, Martin Soltau
Stan
On Monday 09 January 2006 17:17, Martin Soltau wrote:
I would really appreciate any new idea and I'm looking forward to hearing from you.
Hi Martin,

Some quick ideas come to mind, in no particular order:

- Use the boot option splash=none so you can try scrolling back up with Shift+PgUp for error messages. (By the way, a complete list of these parameters is available at /usr/src/linux/Documentation/kernel-parameters.txt.)
- I don't know off the top of my head exactly when tty10 (Ctrl+Alt+F10) becomes available, but try it shortly before the hangs usually occur and watch for anything useful.
- Check the performance settings in your BIOS... look for something ranging from "bleeding edge fast needs laboratory certified parts" to "dull, slow but stable with off-the-shelf parts." Try stepping down the performance to whatever "stable" is and see if the problem goes away. If it does, you may need to take a second look at your memory and/or settings (you can sometimes step down the CAS rate from, say, 2.5 to 3.0 and test). If these tests eliminate the hangs, replace the modules with lifetime guarantee, advance replacement, name brand modules. Expensive? Not if you factor in the time you invest in troubleshooting these kinds of problems.
- Another option is to replace the installed memory with known good name brand modules (different manufacturer, preferably pulled from a working identical system).

I'll share more if they come to mind, but you've got some stuff to play with now. ;-)

regards,
- Carl
Carl, Stan,

thank you very much for your feedback.

Stan: I double-checked the boot options: they are written in lower case, so there shouldn't be a problem. In the default SUSE setup I used the driver "radeon" with no special settings. The X server started as configured during installation (1280 x 1024 @ 60). I have no clue where to look in that area...

Thanks for the Ctrl+Alt+F10 hint. It didn't hang during the last 2 boots, but I will check. The interesting part is: the F10 console comes up a couple of seconds before the X server starts. Then the system automatically switches to the F7 console (X). I will have to try whether I can manage to get back to F10 before it hangs...

Carl: I will try the splash=none option. Thank you. Sorry, in my Dell cut-down-to-basics BIOS there are only 2 performance-related settings: HyperThreading (it has been switched on and off several times, currently it's off, and it never had an effect) and HDD acoustic mode (fast and loud, or moderate and silent; usually it's silent). I have NO BIOS option to play with memory speed/latency. I have standard Infineon 400 MHz chips, and they DID work with 9.3 and DO work with WinXP. Any further ideas?

I will come back to you once I have had a hang while on the F10 console and report what I have seen there. I have switched to acpi=on, apm=on, no more apic options (so they should be activated) and splash=none. For now I hope it soon hangs again so I can check the F10 console ;-)

Thank you very much and kind regards,
Martin
Martin Soltau wrote:
Hi all,
I am new to this mailing-list so please be patient ;-) OK, I have a boot-problem with SUSE 10 on my computer:
The Hardware: Dell Dimension 5000, Bios-Version A07 P4@3.0 GHz, 3GB RAM, Onboard LAN, ATI X300, Creative Soundblaster Live 24bit 2* Seagate (80 GB + 300 GB)
The issue: I was able to install SUSE 10 from DVD (I bought the package). Even with that installation (online updates installed, no Packman or other additional software) the system hangs during boot every 5th or 6th time. It hangs while the KDE login screen is showing, or JUST before it is about to show (when you see the KDE clock), or immediately after login, while KDE is starting up and loading the user session.

Hi Martin,

I don't know whether your problem is the same as mine, but I have had the same hangs (just as you described: at first it hung when trying to switch to the login screen; the next times it would not boot at all and hung somewhere in the middle). In my case I just rebuilt the file system tree (I booted the Rescue System from CD) with "fsck.reiserfs --rebuild-tree /dev/hdaX" and the problem went away.

Note: I am a newbie in Linux, so I don't know whether it is a good thing to rebuild the file system (there were a lot of warnings that I noticed only after running fsck...), but it helped in my case...

sm
By any chance, are you running an NFS server? I have found that the amd_64 bit operating system hangs if someone tries to do an NFS mount. It seems the latest version of the kernel has cleaned this up.

Sergey Mkrtchyan wrote:
<snip>
-- Joseph Loo jloo@acm.org
Hello everyone,

I finally managed to hang the system while on the Ctrl+Alt+F10 console. Usually (when the system does not hang) the last messages deal with:

- shpchp: acpi_shpchprm evaluate BBN fail=0x5
- fglrx tainting kernel (OK, not nice, but the boot hangs also appear when the ATI driver is not installed)
- CPU-microcode upgrade: not upgrading to older version
- no config for sit0

That's it, with a regular boot. Now, with the last hang, there was a huge list of information. As far as I remember it was a dump of the processor registers and a call trace. The last entry in the call trace was:

unknown bootoption+0x0/0x1e0

followed by:

Kernel panic - not syncing: Fatal exception in interrupt.

Does anyone have any idea where I can find this in a /var/log file? It was a lot of information and I was not able to write it all down...

thanks and regards,
Martin
On Tuesday 10 January 2006 16:42, Martin Soltau wrote: <snip>
Does anyone have any idea where I can find this in a /var/log - file? It was a bunch of information and I was not able to write it all down...
Hi Martin,

Everybody has their own methods, but what I usually do after successfully booting the system again is (as root, of course):

    cd /var/log
    cp boot.omsg boot.omsg.chk    ... note: 'o' is for 'old' for previous boot
    cp messages messages.chk      ... note: new content is always appended
    cp warn warn.chk              ... note: ditto
    chown {myself}:users *.chk    ... note: I can now move the snapshot to my user home directory or another system to study and edit at my convenience.

On another note, I was thinking that, since the 'decision' to crash isn't consistent but the timing of each crash *is* (approaching/at the KDE greeter), it might be worth testing a two stage boot: change the default to run level 3 and, after it boots, log in as root and run init 5.

- If the frequency and timing of the crashes remains the same, i.e. approaching/at the KDE greeter, this points towards a problem in the X configuration or in the graphics subsystem itself (config, hardware).
- If it stops falling over, then the problem can be isolated to the boot process itself. It could be the module loading sequence... or a module selection/blacklisting 'tweak' is needed... or a timing conflict, as in some hardware sometimes being too slow to initialize... or marginal hardware that is 'landing' in one of two states when it is initialized *during boot*. (Um, how big is the power supply and have you done any tests to rule it out?)

That's all for now. I'll write more if something comes to mind.

Good luck and have fun!
- Carl
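The log snapshot Carl describes can be wrapped in a small helper so that it is done the same way after every hang. This is only a sketch under assumptions: the function name and the separate destination directory are not from the thread, and files that do not exist are skipped rather than treated as errors.

```shell
#!/bin/sh
# Copy the logs named in the thread from a log directory into a snapshot
# directory, adding the .chk suffix. $1 = source (e.g. /var/log),
# $2 = destination directory (created if missing).
snapshot_logs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for name in boot.omsg messages warn; do
        if [ -f "$src/$name" ]; then
            # -p keeps the timestamps, which helps match a file to a boot
            cp -p "$src/$name" "$dest/$name.chk" && echo "copied $name"
        else
            echo "skipped $name (not found)"
        fi
    done
}
```

Run as root right after the next successful boot, for example: snapshot_logs /var/log /root/hang-snapshot. The copies can then be handed to a normal user with chown, as in Carl's last step.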
Hi together, Hi Carl!
On another note, I was thinking that, since the 'decision' to crash isn't consistent but the timing of each crash *is* (approaching/at the KDE greeter) it might be worth testing a two stage boot: change the default to run level 3 and, after it boots, log in as root and run init 5.
I did that and - do you want to have a guess - the system does not crash anymore.
- If the frequency and timing of the crashes remains the same, i.e. approaching/at the KDE greeter, this points towards a problem in the X configuration or in the graphics subsystem, itself (config, hardware.)
So I think KDE / Graphic driver can be excluded from the list of possible problems...
- If it stops falling over, then the problem can be isolated to the boot process, itself. It could be the module loading sequence... or a module selection/blacklisting 'tweak' is needed... or a timing conflict, as in some hardware sometimes being too slow to initialize... or marginal hardware that is 'landing' in one of two states when it is initialized *during boot* (Um, how big is the power supply and have you done any tests to rule it out?)
So... does anyone have any hints about how to approach this issue? The funny part is: all was fine with SUSE 9.2 and 9.3... Nothing has changed since, except the SUSE release. So I would tend to put hardware issues at priority 2.

Let's start with the boot process and module order. How do I influence the boot/load order of modules? Do I have to tweak the init.rc scripts / links? I remember that there was a SUSE script that takes care of these... what was it called... I can't remember :-( Am I completely free to change the order? Do you have any suggestions about what to try first? I would assume it has to do with the last modules in runlevel 3 and the additional ones for level 5. Correct?

And finally: to me it still seems to be a race condition. Is there a way to prove this? Do you know of race conditions involving SUSE boot / KDE modules / runlevel 5?

thank you very much and kind regards,
Martin
On Wednesday 25 January 2006 16:42, Martin Soltau wrote:
Hi together, Hi Carl!
On another note, I was thinking that, since the 'decision' to crash isn't consistent but the timing of each crash *is* (approaching/at the KDE greeter) it might be worth testing a two stage boot: change the default to run level 3 and, after it boots, log in as root and run init 5.
I did that and - do you want to have a guess - the system does not crash anymore.
Hi Martin,

At this point, I think it's reasonable to conclude that the PSU {insert an underlying cause here} is unable to cope with the greatly accelerated boot process in SUSE 10.0. It could be getting tired from age... or newly added hardware is demanding too much at cold boot... or a marginal/slow device is sometimes pulling the system down. I think one of these scenarios is likely your problem.

In particular, optical drives (DVD/CD) are notoriously high power consumers after cold boots. When old and marginal, they can even pull enough extra current during extended initializations to 'starve' surrounding circuitry during POSTs. If you upgrade (oversize) the PSU to solve this symptom, what usually happens next is the marginal drive finally fails, so you end up spending money on two replacement parts instead of one (plus the time and trouble of a second repair.) :-/

Plan 'A':

1. Pull the DVD drive and test booting directly to run level 5. If the problem goes away, install a known good (borrowed) DVD drive and test again:
   --> If the problem stays away, replace your DVD drive.
   --> If the problem returns, reinstall your original DVD drive and:
2. Pull the CD drive and test booting directly to run level 5. If the problem goes away, install a known good (borrowed) CD drive and test again:
   --> If the problem stays away, replace your CD drive.
   --> If the problem returns, reinstall your original CD drive and:
3. Pull the high end graphics card, boot to run level 3 and, as root, run SaX2 to configure the on-board graphics. Test booting directly to run level 5. If the problem goes away, reinstall your graphics card and **upgrade the PSU.**

Plan 'B':

Given that the above testing can be a lot of work and your time may be more valuable, and since the parts aren't prohibitively expensive anymore, you might want to consider just upgrading the PSU and replacing the CD and DVD drives outright. This is not only faster, but you get the benefit of newer features and it removes any lingering doubts about these three items. The down side is the "real" problem could go undiagnosed (no testing) and rear its ugly head again.

Have fun and good luck!
- Carl
On Friday, 27 January 2006 01:47, Carl Hartung wrote:
Hi Martin,
Hi Carl,
At this point, I think it's reasonable to conclude that the PSU {insert an underlying cause here} is unable to cope with the greatly accelerated boot process in SUSE 10.0.
Well, currently I'm a little reluctant to change the PSU or look into hardware and I would like to know your thoughts about my theory:
It could be getting tired from age...
The machine is only 10 months old so I tend to exclude "tired from age" for now.
or newly added hardware is demanding too much at cold boot...
There has been no additional hardware since the Suse 9.3 times...
In particular, optical drives (DVD/CD) are notoriously high power consumers after cold boots. When old and marginal, they can even pull enough extra current during extended initializations to 'starve' surrounding circuitry during POSTs.
Well, that may be, BUT:

- The problem occurred during "restarts" (no cold boot) as well.
- The CD/DVD drives are also 10 months "young" (like the rest of the box).
- IF circuits are starved during POST, then there should (to my understanding) be NO difference in their behaviour, no matter which runlevel I boot to. Even when booting to level 3 and then switching to 5, the hardware should then be outside normal operation and cause a hang on KDE startup. Shouldn't it? I mean: POST is WAY before that, and whatever goes wrong there should stay wrong, regardless of the initial runlevel...
3: Pull the high end graphics card, boot to run level 3 and, as root, run SaX2 to configure the on-board graphics. Test booting directly to run level 5. If the problem goes away, reinstall your graphics card and **upgrade the PSU.**
The ATI X300 is a passively cooled, low-end entry-level card that should be one of the easiest for the PSU to handle, I think.

So I still think it is a race condition between KDE and modules that have not completely started, like ALSA, ACPI, the HAL daemon, keymap or something like these. Therefore I am still looking for a way to serialize the process of loading modules, making sure that even with default runlevel 5, KDE only starts when everything else has finished. Do you have an idea about how to achieve this?

Regards,
Martin
On Friday 27 January 2006 18:35, Martin Soltau wrote:
On Friday, 27 January 2006 01:47, Carl Hartung wrote:
Hi Martin,
Hi Carl,
At this point, I think it's reasonable to conclude that the PSU {insert an underlying cause here} is unable to cope with the greatly accelerated boot process in SUSE 10.0.
Well, currently I'm a little reluctant to change the PSU or look into hardware
and I would like to know your thoughts about my theory:
It could be getting tired from age...
The machine is only 10 months old so I tend to exclude "tired from age" for now.
[... etc ...]

My personal white box system locks up while cold booting about 1 out of 20 times if my USB mouse and keyboard are attached. When warm restarting the system, the likelihood of a lockup increases to 1 out of 3. When I switched the mouse/keyboard to regular PS/2 versions, the system never hung during startup. Also, my file server running the same SuSE version has a regular PS/2 keyboard and mouse and has never had problems during boot up.
On Friday 27 January 2006 18:35, Martin Soltau wrote:
Well, currently I'm a little reluctant to change the PSU or look into hardware
Even if the system were only 1 month old, it could still be the memory, the PSU or a flaky drive. You have to focus on the symptoms.
- The problem occurred during "restarts" (no cold boot) as well.
I thought it also locked up when you attempted disk to disk large file transfers? That stresses the memory, the hard drives *and* the PSU. Before you tinker with the software, you first need to rule out these possibilities. That is the logical troubleshooting sequence.
- The CD/DVD-Drives are as well 10 months "young" (as the rest of the box) - IF circuits are starved during POST, then there should (to my understanding) be NO difference regarding their behaviour, no matter which Runlevel I boot to.
Not true. There are many POSTs, not just the primary one built into the mainboard. Each device with built-in logic and a controller has its own capabilities and requirements for successfully initializing. If one detects insufficient voltage immediately after power-on (fails its internal POST) it can reset itself many times over until the problem it is detecting has passed. The question is, does it sometimes take too long and cause the device (or the whole sequence) to error out? This is one kind of scenario that could explain the *randomness* of the freezes after boot.
Even with booting to Lv 3 and then switching to 5 the hardware should then be out of normal operating and cause a hang on KDE startup. Shouldn't it? I mean: POST is WAY before and whatever would go wrong there should remain so, regardless of the initial runlevel...
10.0 uses a preloading technique that is designed to dramatically reduce boot times. When booting directly to run level 5, I suspect your primary system drive is undertaking a sustained high bandwidth transfer rate... one that is lasting long enough to freeze the system in *exactly the same way* that it is freezing when you test large file disk to disk transfers. I repeat: this can easily be the memory, the PSU or a flaky hard drive.
ATI x300 is a passively cooled low level entry card that should be one of the easiest to handle for the PSU - I think.
Agreed.
So I still think it is a race condition between KDE and modules that have not completely started, like ALSA, ACPI, the HAL daemon, keymap or something like these.
*Except* for two considerations: These kinds of modules are usually pretty flexible with respect to timing... they have to be, since they deploy into all kinds of environments (new and old hardware, all kinds of brands and quality.) Cooperating with each other during boot is a fundamental design consideration. Secondly, I would really expect a software problem to leave *some* kind of evidence in the logs. But there *is* no such evidence and what you describe is a sudden hard lock... this is the signature of a hardware problem.

FYI, Martin, I've been doing this a veeeery long time. I first went to Heald Engineering College to become an electronics engineering technician when we were transitioning from tubes to transistors... then, again, when we were transitioning from transistors to ICs (still pre-microprocessor.) I know my Ohm's Law and Thevenin's Theorem and Kirchhoff's Law circuit calculations... I run a mean oscilloscope and high impedance DVM, too. Take my word for it... this "smells" a great deal more like flaky hardware than a "race" condition. I could be wrong, but that's definitely how I'd approach it.
Therefore I am still looking for a way to serialize the process of loading modules, making sure that even with default runlevel 5, KDE only starts when everything else has finished. Do you have an idea about how to achieve this?
I've never needed to do this... not once... so, I have to put on my thinking cap and do some reading and research. It's pretty late at the moment, so I'll try to catch up with you sometime tomorrow. regards, - Carl
Hi Carl, Thank you for your patience and time up to now! I have to admit, you almost got me to start thinking about hardware. Nevertheless I would like to give a short update on what I did during the last days and some thoughts about your comments...
Even if the system were only 1 month old, it could still be the memory, the PSU or a flaky drive. You have to focus on the symptoms.
Right, I agree. What drives me nuts and leads me to my interpretation (-> Software-issue) is, that the system was running fine with 9.3 (for almost 5 months) and the problems started RIGHT AWAY with the update to 10.0. On that specific day of the upgrade with nothing done to the hardware.
I thought it also locked up when you attempted disk to disk large file transfers?
No. I cannot remember having said that. Anyway, it is not true. The problem occurs ONLY while booting, regardless of whether I do a cold boot or a reboot. Either is randomly affected. But if (big IF here) the system "survives" the boot, it runs stable and flawlessly for hours, even under heavy load. I have been playing OpenGL games, burning DVDs, and copying GBs of data from one internal disk to another. Some of these activities have been carried out in parallel. No issues.
Each device with built-in logic and a controller has its own capabilities and requirements for successfully initializing. If one detects insufficient voltage immediately after power-on (fails its internal POST) it can reset itself many times over until the problem it is detecting has passed. The question is, does it sometimes take too long and cause the device (or the whole sequence) to error out?
OK, but shouldn't that have happened under 9.3 as well? I do not think, that 9.3 and 10.0 are THAT different here. Except for preloading of course. See next comment.
10.0 uses a preloading technique that is designed to dramatically reduce boot times. When booting directly to run level 5, I suspect your primary system drive is undertaking a sustained high bandwidth transfer rate...
I have deactivated the init-scripts boot.earlypreload and boot.preload quite a while ago. I assume that this reduces system load during startup. And I assume that the hardware-load is then somewhat comparable between SUSE 9.3 and 10.0? And that under normal circumstances everything that worked under 9.3 should also work under 10.0? (from a hardware-load and power-supply point of view)...
one that is lasting long enough to freeze the system in *exactly the same way* that it is freezing when you test large file disk to disk transfers.
But (!) the system does not freeze with large file disk-to-disk transfers. That's what makes me wonder...

What I did in the meantime: I edited the /etc/init.d script of KDE and added some startup requirements so that "insserv kdm" puts the script at the very end of the startup process. I can see on console 1 that KDM is the last service to be started. I changed the system parameter "RUN_PARALLEL" to "no" to make sure all modules are loaded one after another... it seems to work. At least the system waits for the NFS client to time out if the NFS server is offline. With this parameter set to "yes", the system continued starting modules and KDE while NFS tried to import the remote filesystems from fstab...

The bad thing is: it didn't have any effect...
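The two changes Martin describes might look roughly like the fragment below. It is a sketch built on assumptions, not a copy of his files: RUN_PARALLEL is the setting he names, but the location /etc/sysconfig/boot, the script name under "Provides", and the service names alsasound and haldaemon in "Required-Start" are illustrative guesses. insserv reads the comment header of an init script to work out the start order.

```shell
# Fragment 1, assumed to live in /etc/sysconfig/boot:
# start the boot scripts one after another instead of in parallel.
RUN_PARALLEL="no"

# Fragment 2, the comment header of the display manager's init script
# (script and service names below are assumptions for illustration).
# Listing more services under Required-Start asks insserv to place
# this script after them.
### BEGIN INIT INFO
# Provides:          kdm
# Required-Start:    $remote_fs $syslog alsasound haldaemon
# Required-Stop:     $remote_fs $syslog
# Default-Start:     5
# Default-Stop:      0 1 2 6
# Description:       Display manager, started after the listed services
### END INIT INFO
```

After editing the header, insserv has to be run again for the script so that the start links are regenerated, which matches the "insserv kdm" step in the message above.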
so, I have to put on my thinking cap and do some reading and research. It's pretty late at the moment, so I'll try to catch up with you sometime tomorrow.
I am still hoping for your support. Thank you again for all the time you have spent!

As I mentioned above, I am just about to start playing around with hardware even though I am not fully convinced. The only reason is that another system installed from the very same DVD works fine. Of course that system has slightly different hardware and therefore uses other drivers, and it has no NFS client, no ATI driver and a different BIOS... But what is still true for my machine: with absolutely the same hardware, 9.3 worked great for months, Win XP still does, and 10.0 had startup problems beginning on the exact day of installation.

Kind regards,
A still confused Martin
On Sunday 29 January 2006 16:59, Martin Soltau wrote:
Thank you for your patience and time up to now!
You're welcome... just pass it along when you can. ;-)

OK, I'm convinced it's software. Good job!

Now some questions about the "upgrade" procedure you used:

Did you perform a "New Installation" of 10.0 or did you upgrade the existing 9.3 system?

If it was a "New Installation," did you erase the contents of 9.3 "/" beforehand? (This is called a "clean" installation.)

- Carl
On Monday, 30 January 2006 03:30, Carl Hartung wrote:
On Sunday 29 January 2006 16:59, Martin Soltau wrote:
Thank you for your patience and time up to now!
You're welcome... just pass it along when you can. ;-)
I'll do my very best!
OK, I'm convinced it's software. Good job!
So... I can only hope that we both are on the right track then ;-)
Now some questions about the "upgrade" procedure you used:
Did you perform a "New Installation" of 10.0 or did you upgrade the existing 9.3 system?

I moved /home to a separate partition as a preparation. Afterwards I did a new installation from scratch, including formatting all Linux partitions ("/", /opt, /usr). Because of the boot problems I did several installs (with and without immediate YOU, ATI drivers or multimedia packages from Packman; even a plain 10.0 DVD install without any updates or 3rd-party RPMs had the problem). With one installation I even deleted /home (most of the data was on an NFS-imported filesystem anyway) to be sure to get rid of all /home/.xyz files and directories.
If it was a "New Installation," did you erase the contents of 9.3 "/" beforehand? (This is called a "clean" installation.)
Well... I formatted the partition. Is that what you mean? Regards, Martin
On Monday 30 January 2006 01:34, Martin Soltau wrote:
So... I can only hope that we both are on the right track then ;-)
I'm reasonably confident that's the case. :-)
Well... I formatted the partition. Is that what you mean?
Hi Martin,

I just posted these instructions in another thread. They definitely apply to your situation, too:

Before you reinstall again, from scratch (which is needed now,) boot into rescue mode, reiserfsck the filesystem for as many iterations as are needed to ensure it is clean, then mount the partition and manually erase the contents.

This may seem hard to believe, at first, but the 'fast format' used during installation *does not* always prevent old data from 'bleeding through' from previously installed systems. When I moved to 10.0 from 9.3, I experienced this exact problem. It is a real "hair-puller" to diagnose. At the time, I 'met' others on this list (as well as on the opensuse list) who experienced the same problems.

Installing to a 'dirty' partition can interfere with the installation itself *or not* and can cause unexplained configuration changes and 'creeping' filesystem corruptions to 'magically' appear. The procedure outlined above eliminates this problem. YMMV, of course (standard disclaimer!)

regards,
- Carl
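The rescue-mode procedure above could be laid out as the sequence below. This is a hedged sketch, not Carl's exact commands: the function only prints the steps so that they can be reviewed before anything destructive is typed, the mount point is a hypothetical choice, the device stays elided as /dev/hdaX as elsewhere in the thread, and the find invocation assumes GNU find is available in the rescue system.

```shell
#!/bin/sh
# Print (do not run) a possible command sequence for checking and then
# emptying a ReiserFS partition from the rescue system.
# $1 = partition device, $2 = mount point (hypothetical, created if missing).
print_cleanup_plan() {
    dev=$1
    mnt=$2
    echo "reiserfsck --check $dev"              # repeat until it reports no problems
    echo "mkdir -p $mnt"
    echo "mount -t reiserfs $dev $mnt"
    echo "find $mnt -mindepth 1 -delete"        # removes everything, hidden files too
    echo "umount $mnt"
}
```

For example, print_cleanup_plan /dev/hdaX /mnt/target lists five commands. Printing instead of executing is a deliberate choice here, since a wrong device name in the erase step cannot be undone.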
Am Montag, 30. Januar 2006 14:44 schrieb Carl Hartung:
Hi Martin, Hi Carl.... nice to read you ;-)
Before you reinstall again, from scratch (which is needed now,) boot into rescue mode, reiserfsck the filesystem for as many iterations as are needed to ensure it is clean, then mount the partition and manually erase the contents. This may seem hard to believe, at first, but the 'fast format' used during installation *does not* always prevent old data from 'bleeding through' from previously installed systems. When I moved to 10.0 from 9.3, I experienced this exact problem. It is a real "hair-puller" to diagnose. At the time, I 'met' others on this list (as well as on the opensuse list) who experienced the same problems.
That's tough... well, OK. Just a couple of questions (as always ;-))
- When I start the system in rescue mode (from the install DVD, I assume), where do I mount the partitions? Are there any predefined nodes? Or can I just "mkdir xyz" and then mount there?
- My last install was a couple of weeks ago and I didn't check then, but isn't there an option for "low level format" as well?
- What does "manually erase" mean? "rm -R *" or do I have to mkfs?
And finally, just to get that straight: Are you telling me that data that is actually just lurking around on the drive and does not belong to any block linked via an inode (I assume that is what fast formatting does... deleting the directory structure without really touching the data blocks) suddenly comes alive? That can only mean that the filesystem / file structure is corrupt... can't it? I mean... in the end this should be what "rm -R *" does as well.... I know that it is possible to read data that has been deleted in that way. But I never heard of data that has come back to life unwanted... Shouldn't that then happen with regular filesystems in daily operation as well? Why doesn't deleted data show up in files I have in my /home dir? I am frequently deleting, creating and changing data there without doing low-level formats. But all the files are OK, regardless of whether another file has been written to that physical block or not. At least I never ever had any binary file showing up in a text file... Sorry, I just don't get it. Can you explain that effect in a short essay? ;-)
Installing to a 'dirty' partition can interfere with the installation itself *or not* and can cause unexplained configuration changes and 'creeping' filesystem corruptions to 'magically' appear. The procedure outlined above eliminates this problem. YMMV, of course (standard disclaimer!)
Well, given the effort of reinstalling, loading multimedia add-ons and ATI drivers ... I am really considering living with that system until 10.1 is finally available and then doing the upgrade in the right way. But I can tell you... it's not funny. Is this the same with ALL systems (Win, Linux (Debian, ...), Mac...) or just a SUSE kind of problem? Kind regards, Martin
On Monday 30 January 2006 16:20, Martin Soltau wrote:
Well, given the effort of reinstalling, loading multimedia add-ons and ATI drivers ... I am really considering living with that system until 10.1 is finally available and then doing the upgrade in the right way. But I can tell you... it's not funny. Is this the same with ALL systems (Win, Linux (Debian, ...), Mac...) or just a SUSE kind of problem?
Hi Martin, I learned a very long time ago to:
- low level format
- partition
- high level (filesystem) format
- activate
- install
I'd heard of the "ghost file" phenomenon before, but never experienced it... probably because I've habitually followed this procedure. In fact, I don't think I actually got "lazy" in this regard until I started using SUSE's GUI installer. :-) I can't explain the precise mechanics of it, but my guess is there is still an assumption built somewhere into the underlying software that the installation is occurring on a new (empty) or properly "low level" formatted drive. It *does* still make intuitive sense to me that skipping the first step is asking for trouble. And since the solution is familiar territory to me and has been foolproof, so far, I've had no need to dig deeper into the cause. As an alternative to reinstalling, if the two stage boot worked around the boot freeze and the system was running well, why not stick with that? If it eventually crashes and burns, I or someone else here can step you through the process at that time. regards, - Carl
On Tuesday 31 January 2006 00:30, Carl Hartung wrote:
On Monday 30 January 2006 16:20, Martin Soltau wrote: snip
SUSE kind of problem?
Hi Martin,
I learned a very long time ago to:
- low level format
- partition
- high level (filesystem) format
- activate
- install
snip
I can't explain the precise mechanics of it, but my guess is there is still an assumption built somewhere into the underlying software that the installation is occurring on a new (empty) or properly "low level" formatted drive.
regards,
- Carl
I may be getting this wrong, but I recall something from years back, when a 40 meg hard drive was as big as you could get for a home PC. In those days, I think they said that you weren't supposed to low level reformat a drive once it had been done. I could be mistaken, years of memory eroding away, or is it a case that it's OK to do that nowadays? If so, how do you do a low level format? It used to be an option in the BIOS all those years ago but I've not seen it for nearly 20 years. Peter C
Peter Collier wrote:
On Tuesday 31 January 2006 00:30, Carl Hartung wrote:
On Monday 30 January 2006 16:20, Martin Soltau wrote: snip
SUSE kind of problem? Hi Martin,
I learned a very long time ago to:
- low level format
- partition
- high level (filesystem) format
- activate
- install
snip
I can't explain the precise mechanics of it, but my guess is there is still an assumption built somewhere into the underlying software that the installation is occurring on a new (empty) or properly "low level" formatted drive.
regards,
- Carl
I may be getting this wrong, but I recall something from years back, when a 40 meg hard drive was as big as you could get for a home PC. In those days, I think they said that you weren't supposed to low level reformat a drive once it had been done. I could be mistaken, years of memory eroding away, or is it a case that it's OK to do that nowadays? If so, how do you do a low level format? It used to be an option in the BIOS all those years ago but I've not seen it for nearly 20 years.
Drive technology has advanced since then. Back in those days, the main benefit of reformatting was to identify bad spots on the disk and mark them unavailable. Newer drives do this automagically and then map a spare sector to replace the bad one. This also means that when you start seeing bad sectors, it's time to replace the drive.
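For readers who want to check whether a drive has already remapped sectors, here is a minimal sketch. It assumes the smartmontools package is installed and that the drive reports SMART attribute 5 under the name Reallocated_Sector_Ct (not every drive does). The helper name reallocated_sectors and the device /dev/hda are placeholders for illustration, not something taken from this thread.

```shell
#!/bin/sh
# Reads `smartctl -A` output on stdin and prints the raw value of
# attribute 5 (Reallocated_Sector_Ct), i.e. how many sectors the
# drive has remapped to spares. Prints nothing if the attribute
# is not reported under that name.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage (as root; /dev/hda is a placeholder device name):
#   smartctl -A /dev/hda | reallocated_sectors
```

A value that is non-zero and keeps growing is the "start seeing bad sectors" situation described above.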
On Tuesday 31 January 2006 04:35, Peter Collier wrote:
I may be getting this wrong, but I recall something from years back, when a 40 meg hard drive was as big as you could get for a home PC. In those days, I think they said that you weren't supposed to low level reformat a drive once it had been done. I could be mistaken, years of memory eroding away, or is it a case that it's OK to do that nowadays? If so, how do you do a low level format? It used to be an option in the BIOS all those years ago but I've not seen it for nearly 20 years.
Hi Peter, I'm sure you're aware that "low level formatting" is done only once at the factory today, but the term still seems to endure in the field -- which is why I keep throwing quotes around it. :-) In the context of this discussion, it really means properly erasing the contents of a partition before reformatting and using it as a system disk. - Carl
On Tuesday 31 January 2006 13:07, Carl Hartung wrote:
On Tuesday 31 January 2006 04:35, Peter Collier wrote:
If so, how do you do a low level format? It used to be an option in the bios all those years ago but I've not seen it for nearly 20 years.
Hi Peter,
I'm sure you're aware that "low level formatting" is done only once at the factory today, but the term still seems to endure in the field -- which is why I keep throwing quotes around it. :-)
In the context of this discussion, it really means properly erasing the contents of a partition before reformatting and using it as a system disk.
- Carl
OK Carl I think you are saying, ideally you would use a program like wipe to erase all directories/files and then use the basic format program. Or would a simple rm prior to format do? Peter C
On Tuesday 31 January 2006 10:19, Peter Collier wrote:
OK Carl I think you are saying, ideally you would use a program like wipe to erase all directories/files and then use the basic format program. Or would a simple rm prior to format do?
Hi Peter, I am able to overcome this problem by:
a. booting into rescue mode
b. running reiserfsck on the partition until no corruptions are reported
c. mounting the partition
d. manually deleting all the files and directories (rm)
e. running reiserfsck again to verify the fs is still 'clean'
f. performing a normal installation
FWIW, there is another 'flavor' of this problem that affects systems with multiple pre-existing partitions. The installer will become confused and 'land' at least /boot in the wrong partition. The only indication that there is a problem is the system fails first boot. You boot to rescue mode to investigate and find nothing in the target partition. This second 'flavor' requires wiping the drive before installation, period. My impression is this 'flavor' affects mostly drives that have been moved a lot from platform to platform, as happens frequently in testing labs. hth & regards, - Carl
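Steps b through e could be sketched roughly as below. This is a dry run only: the function prints the commands instead of executing them, because they are destructive. The device /dev/hda2, the mount point /mnt/target and the function name plan_cleanup are placeholders, not values from this thread; `find -mindepth 1 -delete` (GNU find assumed) stands in for the plain rm mentioned above so that dotfiles are removed as well.

```shell
#!/bin/sh
# Dry-run sketch: prints the rescue-mode commands, runs nothing.
# $1 = partition device, $2 = mount point (both chosen by the user).
plan_cleanup() {
    dev=$1
    mnt=$2
    echo "reiserfsck --check $dev"        # (b) repeat until no corruption is reported
    echo "mkdir -p $mnt"                  # any empty directory can serve as mount point
    echo "mount -t reiserfs $dev $mnt"    # (c)
    echo "find $mnt -mindepth 1 -delete"  # (d) erase everything, dotfiles included
    echo "umount $mnt"                    # unmount before checking again
    echo "reiserfsck --check $dev"        # (e) verify the fs is still clean
}

# Placeholder arguments; substitute the real root partition.
plan_cleanup /dev/hda2 /mnt/target
```

Review the printed commands against your own partition layout before typing any of them in the rescue shell; step f (the normal installation) follows afterwards.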
On Tuesday 31 January 2006 18:21, Carl Hartung wrote:
Hi Peter,
I am able to overcome this problem by:
a. booting into rescue mode
b. running reiserfsck on the partition until no corruptions are reported
c. mounting the partition
d. manually deleting all the files and directories (rm)
e. running reiserfsck again to verify the fs is still 'clean'
f. performing a normal installation
FWIW, there is another 'flavor' of this problem that affects systems with multiple pre-existing partitions. The installer will become confused and 'land' at least /boot in the wrong partition. The only indication that there is a problem is the system fails first boot. You boot to rescue mode to investigate and find nothing in the target partition.
This could explain why I get programs crashing and locking up X. Amarok did about 2 hours ago. KMail kept crashing yesterday when I tried to move an email to the wastebin, even after deleting that email with mc. OpenOffice used to, until recently, give a crash message when closing the application, and so on. The printer service (CUPS) needs restarting now and then.
This second 'flavor' requires wiping the drive before installation, period. My impression is this 'flavor' affects mostly drives that have been moved a lot from platform to platform, as happens frequently in testing labs.
I have two hard drives, hda and hdb. hda started as Win98, then RH 8.0, SUSE 9.0, 9.1, 9.3. Then I repartitioned 8 gig to allow Win98 to live on it for my wife's schooling. It now has Win 2000 together with SUSE 10.0. Not good, going by what you say. Win 2000 also complains at times, but I had put that down to trying to run Microsoft. On the other hand, hdb I picked up second hand with Win98 on it. I deleted the partition table using DOS fdisk and now it has Ubuntu on it. That one does not seem to give many problems, especially considering that's the one I let the grandchildren loose on.
hth & regards,
- Carl
Thanks for the info, Carl. When 10.1 is completed, I'll use your tips above. As a side note, I remember years ago having fun by not using the fdisk that belonged to the operating system on that partition. I had repartitioned a 500 meg drive that had Win 3.1 on it, using OS/2 fdisk so that I could put OS/2 on the other half. However, when booting into Win 3.1, it claimed it still had the whole disk. This also occurred when trying to install an early Linux version. Peter C
participants (8)
- Carl Hartung
- James Knott
- Joseph Loo
- Ken Jennings
- Martin Soltau
- Peter Collier
- Sergey Mkrtchyan
- Stan Glasoe