[opensuse] SSD storage and new Linux kernels [was: Re: IFUP (WAS: Re:knetworkmanager in 11)]
On Sat, Dec 27, 2008 at 8:23 PM, Randall R Schulz <rschulz@sonic.net> wrote:
On Saturday 27 December 2008 16:03, James Knott wrote:
...
With that, you raise the question of how often you write vs read.
Well, it is a pattern observed in all information processing that reads outnumber writes at all levels (CPU registers, level-1 and level-2 cache, RAM and secondary storage), but that does not change the fact that flash-based secondary storage cannot sustain as many write cycles as rotating magnetic media. To date, fancy redundancy schemes are required to balance the read / write cycle limits in flash-RAM devices when used as secondary storage devices.
Finally read this thread. The subject was left far behind.

First, the earlier parts of the discussion seemed to assume most hard drive failures are caused by head crashes. We do disk recovery as part of our services. Our experience is that most failures are in the electronics of the drive, not the mechanical part. If you read the Google whitepaper on disk drive reliability, they conclude that disk drives fail independently of disk usage, so I really think the conceptual idea of a limited number of disk writes is a very minor issue compared with drives failing just because electronics routinely fail after a period of time.

As to the SSD discussion: I find it really interesting and I do suspect it is where we are headed. The Linux kernel now has SSD support built into some of the filesystems (ext3? others?). The way it works is really cool to me.

First the hardware side: when a SSD is new, the onboard electronics know that none of the sectors are in use. When a sector is written to, the onboard electronics note the fact by removing it from the "free" list. Not only is that tracked, but the number of times a sector has been written is also tracked. So when you go to update a commonly used sector (e.g. the filesystem superblock), the wear-leveling algorithm says: I'm going to remap that sector to a lesser-used one; let's see what I have in the free list with the least number of writes. Thus when you write to a sector, you are always providing just a logical sector ID and have no idea which physical sectors are really being written.

That all happens in hardware and I don't think it requires any kernel support, but as you think about it, you start to wonder how sectors that are freed at the filesystem level get freed at the SSD hardware level for reuse. In particular I had considered the case of doing a disk wipe. Once you do that, the SSD should see all the sectors as used, so the wear-leveling algorithm should quit working from that point forward.
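[Editor's note] The remap-on-write behavior described above can be sketched as a toy model. This is my own illustration, not a real controller's algorithm: the class name, the free-list-as-heap policy, and the per-sector wear counters are all assumptions for the sake of the example.

```python
import heapq

class WearLevelingSSD:
    """Toy model (hypothetical) of wear leveling: every write to a logical
    sector is remapped onto the least-worn physical sector in the free list."""

    def __init__(self, num_sectors):
        self.mapping = {}                      # logical id -> physical id
        self.wear = [0] * num_sectors          # write count per physical sector
        # Free list ordered by wear count, least-worn first.
        self.free = [(0, p) for p in range(num_sectors)]
        heapq.heapify(self.free)

    def write(self, logical_id):
        # Return the previously mapped physical sector (if any) to the free list.
        old = self.mapping.get(logical_id)
        if old is not None:
            heapq.heappush(self.free, (self.wear[old], old))
        # Map the write onto the least-worn free physical sector.
        wear, phys = heapq.heappop(self.free)
        self.wear[phys] = wear + 1
        self.mapping[logical_id] = phys
        return phys
```

With this model, repeatedly rewriting the same logical sector (the superblock case above) rotates across all free physical sectors instead of hammering one of them.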
And that is where the kernel side kicks in: the ATA spec has been extended to support a "discard" command, and the Linux kernel is adding support in various filesystems to invoke discard on any sectors that become free due to file deletion, etc. The end result is that those limited write cycles get to be spread across all of the "free" SSD sectors.

That also means you will not want to use a SSD in such a way that it stays mostly full while a small part of the data is constantly changing. AIUI, that would cause the SSD to churn through the small number of free sectors very quickly and use up their limited write cycles. I don't know if strategies to address this are in place yet. And I don't think the actual wear-leveling algorithms are public knowledge, so it is just guesswork how this all works in detail.

FYI: for those asking what openSUSE 10.3 lacks that would justify an upgrade: if you want to use SSD storage, I suspect you will need a newer kernel than 10.3 offers. In reality you will need to research when SSD discard support was added to your filesystem of choice and then be sure you have a kernel new enough to have it.

Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
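[Editor's note] The discard flow described in the message above can be sketched as a toy model. This is hypothetical Python, not real kernel or firmware code: on file deletion the filesystem tells the device which logical sectors are now unused, and the device returns the matching physical sectors to its free pool.

```python
class DiscardAwareSSD:
    """Toy sketch (hypothetical) of the ATA 'discard' flow: without discard,
    a device that has seen every logical sector written once considers
    everything in use and the free pool never refills."""

    def __init__(self, num_sectors):
        self.free_pool = set(range(num_sectors))   # unused physical sectors
        self.mapping = {}                          # logical id -> physical id

    def write(self, logical_id):
        if not self.free_pool:
            raise RuntimeError("no free sectors: nothing left to wear-level over")
        phys = self.free_pool.pop()
        old = self.mapping.get(logical_id)
        if old is not None:
            self.free_pool.add(old)   # the superseded physical sector is free again
        self.mapping[logical_id] = phys

    def discard(self, logical_ids):
        # Invoked by the filesystem when a file is deleted.
        for lid in logical_ids:
            phys = self.mapping.pop(lid, None)
            if phys is not None:
                self.free_pool.add(phys)
```

This also illustrates the disk-wipe case from the previous message: writing every logical sector once empties the free pool, and only a discard can refill it.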
Greg Freemyer wrote:
We do disk recovery as part of our services. Our experience is that most failures are in the electronics of the drive, not the mechanical part.
I second that.
If you read the Google whitepaper on disk drive reliability, they conclude that disk drives fail independent of disk usage, so I really think the conceptual idea of having a limited number of disks writes is a very minor issue compared with drives failing just because electronics routinely fail after a period of time.
Hard drives, electronically and mechanically, have been designed with a certain lifetime in mind. Designing for a certain MTBF is a fairly exact science, much more so today than e.g. 20 years ago. Which is why my 40Gb IDE drives failed roughly when I expected them to, whereas my older 6.4Gb drives just keep going and going.

/Per
--
/Per Jessen, Zürich
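[Editor's note] A sketch of what MTBF arithmetic typically looks like, under the usual constant-failure-rate (exponential lifetime) simplification: the probability a unit survives t hours is exp(-t/MTBF). The MTBF figure below is made up for illustration, not a rating of any particular drive.

```python
import math

def survival_probability(hours, mtbf_hours):
    """Probability a unit survives 'hours' of operation, assuming a constant
    failure rate (exponential lifetime model)."""
    return math.exp(-hours / mtbf_hours)

# A drive rated at 500,000 hours MTBF, run 24/7 for 5 years (5 * 8760 hours):
p = survival_probability(5 * 8760, 500_000)   # roughly 0.92
```

Note the well-known caveat: a high MTBF does not mean an individual drive lasts that long, only that the fleet-wide failure rate during the design lifetime is low.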
On Tuesday, 2008-12-30 at 19:47 +0100, Per Jessen wrote:
Greg Freemyer wrote:
We do disk recovery as part of our services. Our experience is that most failures are in the electronics of the drive, not the mechanical part.
I second that.
Curiously, when I started studying electronics, there was the idea that the transistor was much better than valves because it was eternal (amongst other things, of course). Now that I come to think of it, I didn't see in my books a study of why electronics fail, component by component. Maybe they were still learning that themselves.

--
Cheers,
Carlos E. R.
On Tue, Dec 30, 2008 at 2:56 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On Tuesday, 2008-12-30 at 19:47 +0100, Per Jessen wrote:
Greg Freemyer wrote:
We do disk recovery as part of our services. Our experience is that most failures are in the electronics of the drive, not the mechanical part.
I second that.
Curiously, when I started studying electronics, there was the idea that the transistor was much better than valves because it was eternal (amongst other things, of course). Now that I come to think of it, I didn't see in my books a study of why electronics fail, component by component. Maybe they were still learning that themselves.
Most electronics fail because of stress on the wires inside the chips. As a chip heats and cools from normal power-on/power-off activity, the very small wires that connect the very small silicon die to the much larger pins at the edge of the plastic package get flexed, because the package itself slightly expands and contracts. Each time that happens, the little wires inside the chip get flexed. After enough flexes the wires break.

I believe that is one of the reasons the infamous freezer trick works on malfunctioning hard drives. By freezing the drive, you contract the plastic chips slightly and cause all the wires to make solid connections. As the drive warms up and starts expanding, the broken wires separate again and your drive stops working again.

Military-grade chips, CPUs, etc. that are worth spending extra money on are often made with a ceramic package. That is because ceramic expands less due to heating.

The problem must not be as bad now as 20 years ago. Back then it was always recommended to leave computer equipment on as much as possible in order to extend its life. I still tend to leave my PCs on all the time, but I don't see that advice given in general anymore.

FYI: I've heard that in some military tech schools they teach you how to attempt to repair a broken chip by opening it up and replacing the little wires if they break. Sounds like a real challenge to me, but I can see why it would be a useful skill in some situations.

Greg
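[Editor's note] The thermal-cycling fatigue described above is commonly modeled with the simplified Coffin-Manson relation, N_f ∝ ΔT^(-n): cycles to failure fall steeply as the temperature swing grows. The exponent below is an illustrative, typical-order value; real exponents are component- and material-specific.

```python
def cycles_to_failure_ratio(delta_t_a, delta_t_b, exponent=2.0):
    """How many times longer (in thermal cycles) a part lasts under swing
    delta_t_a than under swing delta_t_b, per the simplified Coffin-Manson
    relation N_f proportional to (delta T)^-n.  'exponent' is illustrative."""
    return (delta_t_b / delta_t_a) ** exponent

# With n = 2, halving the temperature swing (60 C -> 30 C per cycle)
# roughly quadruples the cycles before an interconnect fails:
ratio = cycles_to_failure_ratio(30, 60)
```

This is also consistent with the old "leave the machine on" advice: fewer power cycles means fewer large temperature swings, regardless of the exact exponent.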
On Tuesday, 2008-12-30 at 18:01 -0500, Greg Freemyer wrote:
Curiously, when I started studying electronics, there was the idea that the transistor was much better than valves because it was eternal (amongst other things, of course). Now that I come to think of it, I didn't see in my books a study of why electronics fail, component by component. Maybe they were still learning that themselves.
Most electronics fail because of stress on the wires inside the chips. ... After enough flexes the wires break.
Makes sense. But there are other auxiliary components that can fail earlier, like electrolytic capacitors.
I believe that is one of the reasons the infamous freezer trick works on malfunctioning hard drives. By freezing the drive, you compress the plastic chips slightly and cause all the wires to make solid connections. As the drive warms up and starts expanding the broken wires separate again and your drive stops working again.
Interesting. But it could also be the typical cold solder joint. That reminds me... I have an old computer with a HD that uses a stepper motor for the head. If I start it in winter, it fails to boot: read error. I have to wait 5..15 minutes till it warms up and, I guess, the position of the head shifts enough to match the expected position.
Military grade chips, CPUs, etc. that are worth spending extra money on are often made with a ceramic chip enclosure. That is because ceramic expands less due to heating.
Yes, and it is also harder.
The problem must not be as bad now as 20 years ago. Back then it was always recommended to leave computer equipment on as much as possible in order to extend the life of the equipment. I still tend to leave my PCs on all the time, but I don't see that advice given in general anymore.
Yes, I have seen that recommendation too.
FYI: I've heard that in some military tech schools they teach you how to attempt to repair a broken chip by opening it up and replacing the little wires if they break. Sounds like a real challenge to me, but I can see why it would be a useful skill in some situations.
Wow. I thought they only did that in movies. :-O

--
Cheers,
Carlos E. R.
Carlos E. R. wrote:
Most electronics fail because of stress on the wires inside the chips. ... After enough flexes the wires break.
Makes sense.
But there are other auxiliary components that can fail earlier, like electrolytic capacitors.
Each individual component carries a certain risk of breaking. Add to that the interconnections between components. Wrt electrolytic capacitors, I have had at least three motherboards fail due to those.

/Per
On Wednesday, 2008-12-31 at 11:18 +0100, Per Jessen wrote:
Carlos E. R. wrote:
But there are other auxiliary components that can fail earlier, like electrolytic capacitors.
Each individual component carries a certain risk of breaking. Add to that the interconnections between components.
I know. My small point was that I did study components in some detail, but the books and teachers I had said nothing about how each component fails.
Wrt electrolytic capacitors, I have had at least three motherboards fail due to those.
Yes, around 2001 many boards were sold that failed too early due to bad capacitors, all from the same manufacturer, I think. They wanted them cheaper, and it made big news :-(

--
Cheers,
Carlos E. R.
Carlos E. R. wrote:
But there are other auxiliary components that can fail earlier, like electrolytic capacitors.
Each individual component carries a certain risk of breaking. Add to that the interconnections between components.
I know. My small point was that I did study components in some detail, but the books and teachers I had said nothing about how each component fails.
When you're designing electronics, it's not so important _how_ a component will fail, but _when_ it will. I too studied electronics a long time ago, and I can't remember many lessons about _how_ components fail. However, when I subsequently joined the real world, MTBF suddenly popped up.
Wrt electrolytic capacitors, I have had at least three motherboards fail due to those.
Yes, around 2001 many boards were sold that failed too early due to bad capacitors, all from the same manufacturer, I think. They wanted them cheaper, and it made big news :-(
I know I had one ASUS board where I replaced a few of the capacitors, and I've also had two Gigabyte boards fail in the last 3-4 years. I know I could just have replaced the capacitors on those too, but I was lazy.

/Per
On 2008/12/31 13:30 (GMT+0100) Per Jessen composed:
Wrt electrolytic capacitors, I have had at least three motherboards fail due to those.
Yes, around 2001 many boards were sold that failed too early due to bad capacitors, all from the same manufacturer, I think. They wanted them cheaper, and it made big news :-(
I know I had one ASUS board where I replaced a few of the capacitors, and I've also had two Gigabyte boards fail in the last 3-4 years. I know I could just have replaced the capacitors on those too, but I was lazy.
I've had success replacing bad motherboard caps about 2/3 of the time, worse of late. I did 3 the week before last and only one succeeded. I guess sometimes cap failure can take out other components.
--
"Unless the Lord builds the house, its builders labor in vain." Psalm 127:1 NIV
Team OS/2 ** Reg. Linux User #211409
Felix Miata *** http://fm.no-ip.com/
On 2008-12-30T12:09:26, Greg Freemyer <greg.freemyer@gmail.com> wrote:
FYI: for those asking what openSUSE 10.3 lacks that would justify an upgrade: if you want to use SSD storage, I suspect you will need a newer kernel than 10.3 offers. In reality you will need to research when SSD discard support was added to your filesystem of choice and then be sure you have a kernel new enough to have it.
Well, there is that, but the Intel SSDs (X25-M) have been designed for ~5 GB/day of writes over 5 years, even without this. I assume they shuffle data around behind the back of the filesystem if one area becomes too used.

I've got one running on 11.0 (my DSL is broken so I can't upgrade to 11.1 yet), and combined with the x200s battery life, this really rocks ;-)

Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
participants (5)
- Carlos E. R.
- Felix Miata
- Greg Freemyer
- Lars Marowsky-Bree
- Per Jessen