-----Original Message-----
From: Carl Hartung [mailto:suselinux@cehartung.com]
<snip>
As I'm sure you're aware ;-) roughly a minute after you posted this, Patrick wrote that he meant "sector 2082" and not "cylinder." (I'm not convinced, however, that /he's/ convinced, so I'm afraid we'll both have to stay tuned...)
Ok, so I looked -- that is cylinder 2082 (sorry about being a dolt, but I have a tendency to *fuzz* up that which I don't absolutely need to know after I've checked it out -- all I really knew was that there should be no issues with BIOS access to the disk). I appreciate your gentle nudges, however, instead of calling me an idiot (which might have been warranted here). I am very interested in what you said in another post about C,H,S being defaulted in the BIOS; I am going to look through the Intel docs on this to see if there is any reference. [After looking: this document: ftp://download.intel.com/support/motherboards/server/se7520bd2/sb/se7520bd2_server_board_tps_r23.pdf indicates on page 62 that LBA is the default in the BIOS for devices that support it.] <snip>
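(For the drive side of the same question, a minimal check, assuming hdparm is available and the disk is /dev/hda as elsewhere in this thread:

    # Ask the drive itself which addressing modes it supports;
    # look for "LBA" (and "LBA48" on drives over 137 GB) in the
    # capabilities section of the identify data.
    hdparm -I /dev/hda | grep -i lba

If the drive reports LBA and the BIOS defaults to it, the classic cylinder-limit problems shouldn't apply.)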
I don't know that Patrick was actually /using/ fdisk, only that sectors are a natural metric for partitioning drives that use LBA.
Yes, I was using fdisk, but I had long since memorized the number for scripts or hand editing, and hadn't looked at it in a while -- here is an 'fdisk -l /dev/hda' for the curious:

Disk /dev/hda: 40.0 GB, 40007761920 bytes
16 heads, 63 sectors/track, 77520 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1        2081     1048792+  82  Linux swap / Solaris
/dev/hda2   *        2082       77520    38021256   83  Linux
<snip>
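(A quick sanity check on those numbers, using only arithmetic from the fdisk output above -- cylinder numbers are 1-based:

    start of /dev/hda2 = (2082 - 1) cylinders * 1008 sectors/cylinder
                       = 2,097,648 sectors
                       = 2,097,648 * 512 bytes ~= 1.07 GB into the disk

So the boot/root partition begins only about 1 GB in, well within reach of any LBA-capable BIOS -- though of course the initrd itself can live anywhere inside that 38 GB partition.)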
- One, less frequent, is the failure of some of the cloned drives to boot immediately after they've been created and installed. He is presently overcoming this 'flavor' of boot failure by reinstalling Grub.
Correct.
- The second 'flavor,' which is occurring more frequently, is a failure to boot after modifying the normally running system and creating a new initrd (with mkinitrd). IOW, these cloned drives have /not/ failed to boot initially, and the systems have been running normally for some time. Then, after installing updates, he's run mkinitrd and the systems suddenly fail to boot. He is presently overcoming this 'flavor' by tarring up the drive contents, repartitioning the drive and restoring the contents.
Correct.
In both cases, the boot failures are occurring *only* on the drives that have been cloned. He is not experiencing either type of boot failure in those systems where the drives have been installed raw and the systems have been built and upgraded from scratch.
I'm sorry -- I haven't been clear enough. The second flavor occurs on *all* drives after updates (where *all* means the sample set can be of either type, fresh-install or imaged, and produces the same roughly 30% hang rate after mkinitrd).

I've been thinking out loud in these posts, and in the process have been a bit confusing -- I apologize. Some posts made me realize I hadn't thought about one thing or another, so I went back and looked. I think my earlier response was to the suggestion that there might be some kind of problem with the BIOS accessing the drive: I assumed that fresh installs would write near the front of the disk first and that subsequent updates would potentially get written elsewhere. Then I remembered that the partition started at cylinder 2082 (but confused cylinder and sector) and continued to be perplexed.

So my state at that point was: I thought the description of the problem (BIOS not able to read the part of the disk where the initrd resides) fit the behavior I was seeing very well, even though I *knew* that that particular problem *shouldn't* affect me. I thus came to the same hypothesis Carlos did: that some potentially *unknown* BIOS problem causing part of the disk to be inaccessible might be the issue.

The main behavior is intermittent, and it could be hardware related (BTW, WD recommended we not use this drive because it was designed for a desktop, not server, duty cycle). Subsequent updates on drives that hung previously sometimes hang and sometimes do not. In my swamped-ness, I have looked now and again at *what* I was doing and could find no *reason*. The most recent hang was due to an updated xfs driver -- a minor modification of the source that we made in house and compiled on a different machine. The module loads fine on a running system, and in this case was updated in the initrd without issue on 2 out of 5 machines. <snip>
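(A minimal way to check whether the module actually made it into a given image intact -- a sketch only, since the exact initrd format depends on the mkinitrd in use; SUSE images of this era may be either a gzipped cpio archive or a gzipped ext2 image:

    # If the initrd is a gzipped cpio archive (2.6-era initramfs):
    zcat /boot/initrd | cpio -itv | grep xfs

    # If it is a gzipped ext2 image instead, loop-mount it:
    zcat /boot/initrd > /tmp/initrd.img
    mount -o loop /tmp/initrd.img /mnt
    find /mnt -name 'xfs*'

Comparing the embedded copy of the module between a machine that hung and one that didn't might show whether mkinitrd wrote it out differently.)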
But he is building new systems using contemporary components that support Logical Block Addressing. As I understand it, these systems should have no difficulty booting from any location on a 40GB disk.
Exactly why I hadn't considered BIOS to be at issue.
I agree that the BIOS limitation you're alluding to is still a common problem, but I think it only concerns hardware older than what Patrick is dealing with. I am increasingly confident that the error lies somewhere in the realm of drive address calculations and/or translations.
What you state here (about calculations/translations), could explain the problem quite nicely (and actually, I think, is a great fit for what Carlos is saying).
That is /my/ educated guess. I am still thinking about ways to test the drives for proof, though. Any ideas?
I have the used blocks ( -D ) output from debugreiserfs from both a working and a hung system. I think it has enough information to tell what part of the filesystem the initrd is in -- I am not sure if it indicates the LBA sectors. If it does, I will be able to glean that in a *very* tedious process. But this still wouldn't answer your question, I think, since it wouldn't tell us if the BIOS can properly address and read those blocks. So, I am curious, too (but I am also very grateful for the help I have already received and do not want to appear greedy). <snip>

Thanks,
Patrick
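(Two possible shortcuts here, offered as untested suggestions. First, filefrag from e2fsprogs uses the FIBMAP ioctl and may work on reiserfs; it reports filesystem-relative block numbers, which become absolute LBA sectors once you add the partition's start sector:

    filefrag -v /boot/initrd
    # absolute LBA = partition start sector + block * (blocksize / 512)
    # for /dev/hda2 above, the partition starts at sector 2,097,648

Second, GRUB legacy reads files through the same BIOS INT 13h interface used at boot time, so reading the initrd from the GRUB command line exercises exactly the path that fails. Its testload command reads a file several different ways and compares the results:

    grub> root (hd0,1)
    grub> testload /boot/initrd

If testload succeeds on a hung system's drive, BIOS addressing is probably not the culprit.)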
The Wednesday 2005-12-28 at 04:50 -0500, Patrick Freeman wrote:
I have the used blocks ( -D ) output from debugreiserfs from both a working and a hung system. I think it has enough information to tell what part of the filesystem the initrd is in -- I am not sure if it indicates the LBA sectors. If it does, I will be able to glean that in a *very* tedious process. But this still wouldn't answer your question, I think, since it wouldn't tell us if the BIOS can properly address and read those blocks. So, I am curious, too (but I am also very grateful for the help I have already received and do not want to appear greedy).
I suppose it should give a partition-relative sector number, or something similar, but I have to say I haven't read the man page. Well, I had a brief look at it ;-)

--
Cheers,
Carlos Robinson
On Wednesday 28 December 2005 04:50, Patrick Freeman wrote:
Ok, so I looked -- that is cylinder 2082 (sorry about being a dolt, but I have a tendency to *fuzz* up that which I don't absolutely need to know after I've checked it out
Not a problem, Patrick... I tend to priority-focus on minute details, too.
I am very interested in what you said in another post about C,H,S being defaulted in the BIOS...
Just to clarify what you've paraphrased here: I actually only proposed it as a possibility worth investigating. From the spec sheet: "4MB Flash ROM with AMI* BIOS, Multiboot BBS (BIOS Boot Specification) [with] IDE drive auto-configure." I've been tripped up by these built-in 'auto' IDE configuration utilities before, specifically in the area of CHS<>LBA address translations. If it is suspect, it deserves looking at, if only to rule it out.
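(One low-effort way to see whether a translation mismatch is even plausible -- hypothetical commands for this particular setup:

    # Geometry the kernel ended up with for the disk:
    sfdisk -g /dev/hda

    # The partition table in sectors rather than cylinders, which
    # sidesteps CHS translation ambiguity entirely:
    fdisk -lu /dev/hda

If the sector-based start/end values are self-consistent but the cylinder-based ones disagree with what the BIOS setup screen shows, translation deserves a closer look.)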
I'm sorry -- I haven't been clear enough. The second flavor occurs on *all* drives after updates (where *all* means that the sample set can be of either type (fresh-install or imaged)
This contradicts facts that I believed we'd already established. It is a proverbial "monkey wrench" that fundamentally changes the equation. :-/

I *thought* the purpose of dividing up your test systems into "cloned" vs. "native installed" was to compare those susceptible to the 'flavor 2' failures against the "healthy." Now, there is no "healthy!" If *every* system, native installed and cloned, is susceptible to the post-mkinitrd boot failures, the fact that a drive started out as a clone might be *exacerbating* the problem somehow, but the cloning itself *can't* be the root cause or even a prerequisite. This has two ramifications:

- It brings back to life the possibility that these drives (or the entire IDE subsystems, for that matter) have an inherent but unidentified susceptibility. IOW, all of the hardware-related possibilities are back on the table unless and until specifically tested, vigorously, and ruled out.

- It also greatly increases the likelihood that the software you're compiling and installing... or the process you're using to install it... is at fault. The only obvious nexus I can see is your locally compiled driver. It *is* tied into the *storage* subsystem (the point of failure), isn't it?

Look, Patrick, I know it seems confusing when you can make the same changes on many systems and have some succumb and others not, but my previous point concerning "magnification" comes into play... these systems cannot be exactly identical, or they'd all fall over or all run. From that perspective, studying the differences between 'flavor 1' and 'flavor 2' boot failures is evasive because it leaves the major questions unanswered...

Is it possible for you to just rip out the locally compiled driver, substitute pieces of the storage subsystem as needed, and run some trials? If the boot failures disappear, you've at least isolated the problem. That is the first step in identifying and fixing it. Alternatively, you could dual-purpose these trials as preliminary work towards migrating to less problematic hardware... hopefully to components that won't need the custom driver. (A quick consistency check on the driver is sketched below.)
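(As a cheap first pass before ripping anything out, it may be worth confirming the locally built module is byte-identical everywhere -- the path below is illustrative, not known from this thread:

    # Run on each machine, hung and healthy, and compare:
    md5sum /lib/modules/`uname -r`/kernel/fs/xfs/xfs.ko

Identical checksums would rule out a corrupted or mismatched copy of the driver, leaving the install/mkinitrd process itself as the suspect.)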
I have the used blocks ( -D ) output from debugreiserfs from both a working and a hung system. ...
I think comparing a healthy 'native' drive to one drive of each failing type would provide the most fruitful forensic data. Time to recruit the assistance of a real filesystems expert, maybe even a hard drive engineer...
... But this still wouldn't answer your question, I think, since it wouldn't tell us if the BIOS can properly address and read those blocks.
In my mind, the likelihood that a BIOS setting or limitation is at fault has greatly diminished. It would be nice to rule it out, but that is easy enough to do with the right knowledge... :-) See my comment, above.

OK, Patrick, that's the extent of my brain capacity on this problem at this time. I'll keep abreast of your progress by following this thread, but unless I see some additional and definitive clues, or maybe some real test results, there isn't much more that I can add. Have fun and good luck!

regards,
- Carl