Aw: Re: [opensuse-factory] SSD erased after fstrim
"Cristian Rodríguez" <crrodriguez@opensuse.org> wrote:
On Wed, Jan 6, 2016 at 4:47 PM, Frank Kunz <mailinglists@kunz-im-inter.net> wrote:
Hi,
I have been seeing some strange behavior while testing Tumbleweed over the last two weeks. I have seen it with different versions, including the latest, 20160105, a few minutes ago.
When I run 'fstrim -v /' and then reboot, I get 'Non-System disc or disk error,...'
I have the same problem with an oldish Samsung SSD. It is not that the disk is wiped (at least in my case), but the grub2 installation gets corrupted. Reinstalling grub2 from a rescue system "fixes" the problem. I opted to just "not do that then".
I had the same experience twice, 4-5 weeks ago, so I have no logs either. But I hope the following information is still useful. At first I thought it was BTRFS or my 6-year-old OCZ Agility SSD (SandForce controller), since 2 other Tumbleweed systems with newer SSDs (no SandForce), ext4 and the same repos/configs had no problems. I regularly invoke fstrim manually on all 3 machines, once a month or so, but I do not remember whether I trimmed right before GRUB2 disappeared on that BTRFS machine. Either way, I got a similar BIOS message, "no operating system found". So I put in my Tumbleweed USB stick and chose the upgrade option. The installer found my root system and the package database without issues, and I then chose to reinstall grub2 on root, where it was before (along with some base packages like the kernel and systemd).
That fixed it until it happened again a week later after a "normal" zypper dup. I am fairly sure that I did not trim that time. Maybe the kernel does this automatically these days, but I am not sure whether that needs the "discard" option in /etc/fstab, which I do NOT have. Again I fixed the missing grub with the above method, but this time installed it in the MBR AND root. No grub2 problems since then, but zypper problems instead, again only on this machine.
When I started using it again 1 week ago, "zypper dup" installed only half of the new packages, then gave up saying it could not extract any further packages. There were 5 GB of space left on the SSD, but I nonetheless tried emptying the trash and deleting .thumbnails. That helped zypper extract/install some more RPMs, but still not all, and that borked my system, so I have now reinstalled root with ext4.
I really thought my SSD was dying, although the logs showed no errors for /dev/sdb. But after your reports I really don't know what's going on.
- Which controllers do your SSDs use?
- Did you all trim manually or with the discard option?
- Did you all use BTRFS?
- How old are your SSDs?
- Do you see SSD-write-related errors in the logs?
-- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
07.01.2016 19:22, tomtomme wrote:
That fixed it until it happened again a week later after a "normal" zypper dup. I am fairly sure that I did not trim that time. Maybe the kernel does this automatically these days, but I am not sure whether that needs the "discard" option in /etc/fstab, which I do NOT have.
See the btrfsmaintenance package: it installs a cron job that trims btrfs.
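Whether anything on the affected machine trims automatically can be checked in two places: the mount options of the filesystem and the btrfsmaintenance settings. The following shell sketch does both. has_discard and show_trim_config are hypothetical helper names, and the sketch assumes that btrfsmaintenance keeps its schedule as the shell variables BTRFS_TRIM_PERIOD and BTRFS_TRIM_MOUNTPOINTS in /etc/sysconfig/btrfsmaintenance; verify that against the installed package.

```shell
#!/bin/sh
# Sketch: look for the two usual sources of "automatic" trimming.
# Assumption: btrfsmaintenance stores BTRFS_TRIM_PERIOD and
# BTRFS_TRIM_MOUNTPOINTS as shell assignments in
# /etc/sysconfig/btrfsmaintenance (check your installed version).
# has_discard and show_trim_config are hypothetical helper names.

# Print "discard" if MOUNTPOINT is mounted with a discard option
# (plain "discard" or "discard=..."), else "no discard".
# Reads /proc/self/mounts unless another file in that format is given.
has_discard() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^discard(=|$)/) found = 1
        }
        END { print (found ? "discard" : "no discard") }
    ' "${2:-/proc/self/mounts}"
}

# Print the trim schedule from a btrfsmaintenance-style config file.
show_trim_config() {
    conf=${1:-/etc/sysconfig/btrfsmaintenance}
    (
        # Source in a subshell so the settings do not leak into the caller.
        . "$conf"
        echo "period=${BTRFS_TRIM_PERIOD:-unset}"
        echo "mountpoints=${BTRFS_TRIM_MOUNTPOINTS:-unset}"
    )
}
```

On the affected machine, running has_discard / and then show_trim_config would show whether either mechanism could have issued a trim without a manual fstrim.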
Again I fixed the missing grub with the above method, but this time installed it in the MBR AND root. No grub2 problems since then, but zypper problems instead, again only on this machine.
Well, this indirectly confirms the hypothesis that btrfs trim erases the bootloader area. I'm still not sure where the error comes from. Could someone who can reproduce it record the state of the btrfs bootloader area (the first 64 KiB of the device) when booting fails?
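Recording that area could look roughly like the shell sketch below, run from a rescue system. It is only a sketch: save_boot_area and is_all_zero are hypothetical helper names, /dev/sdX is a placeholder for the affected SSD, and it relies on btrfs keeping its primary superblock at offset 64 KiB, so that the bytes before it are the bootloader area in question.

```shell
#!/bin/sh
# Sketch: capture the btrfs bootloader area (first 64 KiB) for inspection.
# save_boot_area and is_all_zero are hypothetical helper names.

BOOT_AREA_BYTES=65536   # btrfs puts its primary superblock at 64 KiB

# Copy the first 64 KiB of DEVICE (or an image file) into OUTFILE.
save_boot_area() {
    dd if="$1" of="$2" bs="$BOOT_AREA_BYTES" count=1 2>/dev/null
}

# Print "all zero" if FILE contains only 0x00 bytes, else "has data".
is_all_zero() {
    # Deleting every NUL byte leaves nothing if the area was wiped.
    if [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]; then
        echo "all zero"
    else
        echo "has data"
    fi
}
```

For example: save_boot_area /dev/sdX boot-area.bin, then is_all_zero boot-area.bin, then hexdump -C boot-area.bin | head for a closer look. An "all zero" result would be consistent with the trim hypothesis.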
On Thursday 2016-01-07 18:21, Andrei Borzenkov wrote:
Could someone who can reproduce it record the state of the btrfs bootloader area (the first 64 KiB of the device) when booting fails?
There is no need to mess with the bootloader just to try it. It is quite obvious that btrfs takes trim quite literally.

    # modprobe brd rd_size=$[4*1048576]
    # perl -e 'for(0..4*1048576){print "\xFF" x 1024}' >/dev/ram0
    # mkfs.btrfs -K /dev/ram0
    btrfs-progs v4.3+20151116
    See http://btrfs.wiki.kernel.org for more information.
    [...]
    SSD detected: no
    Incompat features: extref, skinny-metadata
    Number of devices: 1
    Devices:
    ID SIZE PATH
    1 4.00GiB /dev/ram0
    # mount /dev/ram0 /mnt
    # fstrim -v /mnt
    /mnt: 3.6 GiB (3865571328 bytes) trimmed
    # umount /mnt
    00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    *
    00010000 e5 75 d6 a4 00 00 00 00 00 00 00 00 00 00 00 00 |.u..............|
    00010010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    00010020 65 21 55 61 6b af 48 e4 95 8f 80 69 e6 45 1f be |e!Uak.H....i.E..|
    00010030 00 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
    00010040 5f 42 48 52 66 53 5f 4d 07 00 00 00 00 00 00 00 |_BHRfS_M........|
    [...]
    # hexdump -C /dev/ram0 | grep ff.ff.ff.ff.ff.ff.ff.ff
    02434000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    04001000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

Repeat the same with ext4 and you won't really see a difference:

    # perl -e 'for(0..4*1048576){print "\xFF" x 1024}' >/dev/ram0
    # mkfs.ext4 /dev/ram0 -E nodiscard
    mke2fs 1.42.12 (29-Aug-2014)
    Creating filesystem with 1048576 4k blocks and 262144 inodes
    Filesystem UUID: 4a9768ef-99e0-4ff8-8b92-c015d5bf82eb
    Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (32768 blocks): done
    Writing superblocks and filesystem accounting information: done
    # mount /dev/ram0 /mnt
    18:58 zap:/home/jengelh # fstrim -v /mnt
    /mnt: 3.8 GiB (4084932608 bytes) trimmed
    # umount /mnt
    # hexdump -C /dev/ram0|grep ff.ff.ff.ff.ff.ff.ff.ff
    00101000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    00102000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    00111400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    08000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    08002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    18000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    18002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    28000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    28002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    38000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    38002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    48000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    48002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    48002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    80000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    80001000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    80010000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    c8000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    c8002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    d8000400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
    d8002000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

(In other words, the longest ff.ff streak is 16 bytes, which could very well be four 32-bit ints, each carrying the value -1.)
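One caveat about the grep-based check: hexdump -C replaces repeated identical rows with a single "*" line unless -v is given, so a run of 0xFF longer than one 16-byte row could show up as only a single matching row. Counting the run length byte by byte sidesteps that. The shell sketch below does so with od and awk; longest_ff_run is a hypothetical helper name, and on a 4 GiB ram disk it will take a while because it walks every byte.

```shell
#!/bin/sh
# Sketch: report the longest run of consecutive 0xFF bytes in a file or
# device. longest_ff_run is a hypothetical helper name.
# Example (after umount): longest_ff_run /dev/ram0

longest_ff_run() {
    # od -v prints every byte as two hex digits, with no "*" squeezing;
    # awk keeps the run counter across output lines.
    od -An -v -tx1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "ff") { run++; if (run > max) max = run }
                else run = 0
            }
        }
        END { print max + 0 }'
}
```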
07.01.2016 21:00, Jan Engelhardt wrote:
On Thursday 2016-01-07 18:21, Andrei Borzenkov wrote:
Could someone who can reproduce it record the state of the btrfs bootloader area (the first 64 KiB of the device) when booting fails?
There is no need to mess with the bootloader just to try it. It is quite obvious that btrfs takes trim quite literally.

    # modprobe brd rd_size=$[4*1048576]
    # perl -e 'for(0..4*1048576){print "\xFF" x 1024}' >/dev/ram0
    # mkfs.btrfs -K /dev/ram0
This is not quite a clean test: mkfs itself could zero out this area. A clean test should modify the bootloader area after mkfs. But Frank confirmed that trim erases the beginning of the device.
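The cleaner variant suggested here could be sketched as follows: fill the bootloader area with a known pattern only after mkfs, then trim and check whether the pattern survived. fill_boot_area and check_boot_area are hypothetical helper names, GNU coreutils dd is assumed for iflag=fullblock, and the commented run assumes a throwaway brd ram disk as in Jan's test; do not point it at a disk that holds data.

```shell
#!/bin/sh
# Sketch of a cleaner trim test: mark the bootloader area AFTER mkfs,
# so mkfs cannot be what zeroed it, then trim and re-check.
# fill_boot_area/check_boot_area are hypothetical helper names.
# Assumes GNU coreutils dd (iflag=fullblock). Scratch devices only.

# Overwrite the first 64 KiB of DEVICE with 0xFF, leaving the rest as is.
fill_boot_area() {
    head -c 65536 /dev/zero | tr '\000' '\377' |
        dd of="$1" bs=65536 count=1 iflag=fullblock conv=notrunc 2>/dev/null
}

# Print "intact" if the first 64 KiB of DEVICE are still all 0xFF,
# otherwise "modified".
check_boot_area() {
    if [ "$(head -c 65536 "$1" | tr -d '\377' | wc -c)" -eq 0 ]; then
        echo "intact"
    else
        echo "modified"
    fi
}

# Example run (as root, on a throwaway ram disk):
#   modprobe brd rd_size=$((4*1048576))
#   mkfs.btrfs -K /dev/ram0
#   fill_boot_area /dev/ram0
#   mount /dev/ram0 /mnt && fstrim -v /mnt && umount /mnt
#   check_boot_area /dev/ram0
```

If check_boot_area reports "modified" after the fstrim step, that would point at the trim rather than mkfs as what cleared the area.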
Participants (3):
- Andrei Borzenkov
- Jan Engelhardt
- tomtomme