Hi, passing:

  btrfsck --check --force /dev/mapper/system-root

finds the following errors in the filesystem:

  ERROR: errors found in fs roots
  found 117460205568 bytes used, error(s) found
  total csum bytes: 110505304
  total tree bytes: 1844690944
  total fs tree bytes: 1645625344
  total extent tree bytes: 60227584
  btree space waste bytes: 440188035
  file data blocks allocated: 291454246912
   referenced 283273138176
  Thinkcentre-M57p:~ #

I am wondering how to proceed with this.
On 2024-02-03 22:21, -pj wrote:
Hi, passing: btrfsck --check --force /dev/mapper/system-root finds the following errors in the filesystem.
ERROR: errors found in fs roots
found 117460205568 bytes used, error(s) found
total csum bytes: 110505304
total tree bytes: 1844690944
total fs tree bytes: 1645625344
total extent tree bytes: 60227584
btree space waste bytes: 440188035
file data blocks allocated: 291454246912
 referenced 283273138176
Thinkcentre-M57p:~ #
I am wondering how to proceed with this.
Sorry, no idea how to repair btrfs. I would try to check and repair from a rescue system.

I suggest you download and raw copy this file to a USB stick:

<http://download.opensuse.org/distribution/leap/15.5/live/openSUSE-Leap-15.5-Rescue-CD-x86_64-Build13.15-Media.iso>

Then boot it.

--
Cheers / Saludos,
Carlos E. R.
(from 15.4 x86_64 at Telcontar)
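For reference, the "raw copy" step can be done with dd; a minimal sketch, assuming the stick appears as /dev/sdX (a placeholder; verify the real device name with lsblk before writing, since writing to the wrong device destroys its contents):

  # list block devices and identify the USB stick (placeholder: /dev/sdX)
  lsblk
  # raw copy the rescue ISO to the whole, unmounted stick
  dd if=openSUSE-Leap-15.5-Rescue-CD-x86_64-Build13.15-Media.iso of=/dev/sdX bs=4M status=progress conv=fsync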
I am wondering how to proceed with this.
Sorry, no idea how to repair btrfs.
I would try to check and repair from a rescue system.
I suggest you download and raw copy this file to an usb stick:
Then boot it.
I booted into a live session and opened the LUKS encrypted volume. Then:

  btrfsck --check /dev/mapper/system-root          <- Failed with core dump
  btrfsck --check --repair /dev/mapper/system-root <- Failed with core dump

Do you have a recommended place to report bugs for btrfs?
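As an aside, a minimal sketch of that live-session sequence, assuming the LUKS partition is /dev/sda2 and the LVM volume group is called "system" (both names are guesses inferred from the /dev/mapper/system-root path, not taken from the post):

  # unlock the LUKS container (partition name is a placeholder)
  cryptsetup open /dev/sda2 cr_root
  # activate the LVM volume group so /dev/mapper/system-root appears
  vgchange -ay system
  # read-only check of the unmounted filesystem (no --repair, no --force)
  btrfs check --readonly /dev/mapper/system-root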
On 2024-02-04 05:03, -pj wrote:
I am wondering how to proceed with this.
Sorry, no idea how to repair btrfs.
I would try to check and repair from a rescue system.
I suggest you download and raw copy this file to an usb stick:
Then boot it.
I booted into a live session and opened the luks encrypted volume. then
btrfsck --check /dev/mapper/system-root <- Failed with core dump
btrfsck --check --repair /dev/mapper/system-root <- Failed with core dump
Do you have a recommended place to report bugs for btrfs?
Always openSUSE bugzilla. Always, always, always. Don't ask, always. You downloaded from openSUSE, you report to openSUSE.

--
Cheers / Saludos,
Carlos E. R.
(from 15.4 x86_64 at Telcontar)
On Sat, 3 Feb 2024 22:03:12 -0600, -pj <pj.opensuse@gmx.com> wrote:
I am wondering how to proceed with this.
Sorry, no idea how to repair btrfs. I would try to check and repair from a rescue system. I suggest you download and raw copy this file to an usb stick:
Then boot it.
I booted into a live session and opened the luks encrypted volume. then
btrfsck --check /dev/mapper/system-root <- Failed with core dump
btrfsck --check --repair /dev/mapper/system-root <- Failed with core dump
After btrfsck showed that it can't run sanely under current conditions, by dumping core, you told it to go ahead and modify (--repair) your disk? Bad move.

BTW, 'man btrfsck' brings up the man page btrfs-check(8), which says: "btrfsck is an alias of btrfs check command and is now deprecated."

The man page also says:

  WARNING: Do not use --repair unless you are advised to do so by a
  developer or an experienced user, and then only after having accepted
  that no fsck can successfully repair all types of filesystem
  corruption. E.g. some other software or hardware bugs can fatally
  damage a volume.

You may have a hardware problem causing disk corruption. If so, trying to use software to fix your disk would probably be a mistake. Check your RAM by booting memtest86 (memtest86+). If, after sufficient time running the tests, there are no errors, boot your Rescue system and, with /dev/mapper/system-root unmounted, run 'btrfs check /dev/mapper/system-root'. Do not use the '--force' option.
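For illustration, a hedged sketch of that sequence from a rescue system; device names are placeholders, and the smartctl step is just an extra hardware sanity check, not part of Robert's advice:

  # confirm the filesystem is not mounted anywhere before checking it
  findmnt /dev/mapper/system-root
  # optionally review the drive's SMART health data (placeholder device name)
  smartctl -a /dev/sda
  # read-only check, without --repair and without --force
  btrfs check /dev/mapper/system-root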
Do you have a recommended place to report bugs for btrfs?
Follow Carlos' advice in his previous post for bug reporting.

--
Robert Webb
On Mon, Feb 05, 2024 at 01:18:22AM +0000, Robert Webb wrote:
You may have a hardware problem causing disk corruption. If so, trying to use software to fix your disk would be a mistake, probably. Check your ram by booting memtest86 (memtest86+). If, after sufficient time running the tests, there are no errors, boot your Rescue system and, with /dev/mapper/system-root un-mounted, run 'btrfs check /dev/mapper/system-root'. Do not use the '--force' option.
In addition to the warnings about not making things worse, assuming SATA, try substituting your SATA cable to the drive.

Anecdata: I went around the houses searching for the cause of a slew of filesystem errors (btrfs & xfs) a few years ago that even a motherboard switch didn't permanently fix. I had the luxury of all data being backed up and the system being a former workhorse due an upgrade.

Switching the disks to another machine showed no such underlying problems on the drives (checksumming file reads against backup versions of ~4TiB of slow-moving data). I assumed a bad controller on the motherboard (since it was now erroring on two of six drives).

After upgrading the motherboard and processor/memory the trouble paused, but then returned after a few days/power-ups.

A bit of Kepner-Tregoe "the trouble is, the trouble is not" and the common parts were the PSU, SATA cables, case, and keyboard/mouse. Since I had spare/new SATA cables, which didn't need a screwdriver and were the least effort to change... "Tada!" The problems went away (my instinct/prejudice had blamed the ten-year-old PSU up to that point).

I was surprised. The SATA cables had been connected and forgotten; there was no sign of strain on the connectors, pinched routing, etc.

In the years between, I heard a '2.5 Admin' podcast that slated a particular batch/vintage of SATA cables that were prone to deteriorating due to a reaction from one of the dyes used in the casings (ISTR).

A friend had similar mysterious filesystem problems on a Mint system last summer, and renewing the SATA cables made those go away too. We might even have bought similar cables from the same supplier many years before.

HTH,
Daniel
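Daniel's "checksumming file reads against backup versions" can be approximated with rsync's checksum dry-run mode; a rough sketch, assuming the live data lives under /data and the backup under /backup/data (both paths are placeholders, not from the thread):

  # dry run (-n) that checksums every file (-c), recursing (-r) and itemizing (-i)
  # any differences between the live tree and its backup; nothing is modified
  rsync -rcni /data/ /backup/data/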
On 2024-02-06 10:17, Daniel Morris wrote:
On Mon, Feb 05, 2024 at 01:18:22AM +0000, Robert Webb wrote:
You may have a hardware problem causing disk corruption. If so, trying to use software to fix your disk would be a mistake, probably. Check your ram by booting memtest86 (memtest86+). If, after sufficient time running the tests, there are no errors, boot your Rescue system and, with /dev/mapper/system-root un-mounted, run 'btrfs check /dev/mapper/system-root'. Do not use the '--force' option.
In addition to the warnings of not making things worse, assuming SATA, try substituting your SATA cable to the drive.
Anecdata: I went around the houses searching for the cause of a slew of filesystem errors (btrfs & xfs) a few years ago that even a motherboard switch didn't permanently fix. I had the luxury of all data being backed up and the system being a former workhorse due an upgrade.
Switching disks to another machine showed no such underlying problems on the drives (checksumming file reads to backup versions of ~4TiB slow-moving data). I assumed bad controller on the motherboard (since it was now erroring on two of six drives).
After upgrading the motherboard and processor/memory the trouble paused, but then returned after a few days/power-ups.
A bit of Kepner-Tregoe "the trouble is, the trouble is not" and the common parts were PSU, SATA cables, case, keyboard/mouse. Since I had spare/new SATA cables, which didn't need a screwdriver/were least effort to change....and "Tada!" the problems went away (my instinct/prejudice blamed ten year old PSU to that point).
I was surprised. The SATA cables had been connected and forgotten, there was no sign of strain on the connectors, pinched routing etc.
In the years between, I heard a '2.5 Admin' podcast that slated a particular batch/vintage of SATA cables that were prone to deteriorating due to a reaction from one of the dyes used in the casings (ISTR).
But what sort of damage can make a cable fail in that manner? I'd guess changed transmission properties. It cannot be continuity.

I had to change SATA cables on a computer, but the failure was much more evident, as in "not working", maybe "not working at all".
A friend had similar mysterious file systems problems on a Mint system last summer, and renewing SATA cables made those go away too. We might even have bought similar cables from the same supplier many years before.
HTH, Daniel
--
Cheers / Saludos,
Carlos E. R.
(from 15.4 x86_64 at Telcontar)
On Tue, Feb 06, 2024 at 01:20:45PM +0100, Carlos E. R. wrote:
In the years between, I heard a '2.5 Admin' podcast that slated a particular batch/vintage of SATA cables that were prone to deteriorating due to a reaction from one of the dyes used in the casings (ISTR).
But what sort of damage can make a cable fail in that manner?
Electrochemical migration (dendrites)? I don't know. Mine is anecdata, and it seems to be borne out by many others experiencing problems with relatively cheap SATA cables that carry gigabits of data for millions of seconds, who are then surprised that these things do fail.

NASA Goddard have some fascinating pages on the mysteries of tin whiskers (which aren't dendrites): https://nepp.nasa.gov/whisker/background/index.htm They had a list of confirmed/suspected losses on there too, which made for some very expensive failures. I'm in awe of the level of research they've detailed and collated, and the wonderful universe that we catch glimpses of.

Daniel
On 02-06-2024 03:17AM, Daniel Morris wrote:
On Mon, Feb 05, 2024 at 01:18:22AM +0000, Robert Webb wrote:
You may have a hardware problem causing disk corruption. If so, trying to use software to fix your disk would be a mistake, probably. Check your ram by booting memtest86 (memtest86+). If, after sufficient time running the tests, there are no errors, boot your Rescue system and, with /dev/mapper/system-root un-mounted, run 'btrfs check /dev/mapper/system-root'. Do not use the '--force' option.
In addition to the warnings of not making things worse, assuming SATA, try substituting your SATA cable to the drive.
Anecdata: I went around the houses searching for the cause of a slew of filesystem errors (btrfs & xfs) a few years ago that even a motherboard switch didn't permanently fix. I had the luxury of all data being backed up and the system being a former workhorse due an upgrade.
Switching disks to another machine showed no such underlying problems on the drives (checksumming file reads to backup versions of ~4TiB slow-moving data). I assumed bad controller on the motherboard (since it was now erroring on two of six drives).
After upgrading the motherboard and processor/memory the trouble paused, but then returned after a few days/power-ups.
A bit of Kepner-Tregoe "the trouble is, the trouble is not" and the common parts were PSU, SATA cables, case, keyboard/mouse. Since I had spare/new SATA cables, which didn't need a screwdriver/were least effort to change....and "Tada!" the problems went away (my instinct/prejudice blamed ten year old PSU to that point).
I was surprised. The SATA cables had been connected and forgotten, there was no sign of strain on the connectors, pinched routing etc.
In the years between, I heard a '2.5 Admin' podcast that slated a particular batch/vintage of SATA cables that were prone to deteriorating due to a reaction from one of the dyes used in the casings (ISTR).
A friend had similar mysterious file systems problems on a Mint system last summer, and renewing SATA cables made those go away too. We might even have bought similar cables from the same supplier many years before.
HTH, Daniel
Thank you for this excellent information and the time it took to write that out.

I opened the machine's case and looked at the SATA cables visually. I saw that the cable connected to the SATA SSD is a blue colored cable, looking very aftermarket and shorter. There is an extra, longer SATA cable that was not used, and it visually matched the SATA cable going to the CD/DVD drive. So I disconnected the blue SATA cable and am now using the other one, visually matching what I presume to be the default OEM cable. There are 4 SATA ports available on this particular mainboard: 3 are dark orange in color and 1 is black. Currently the machine is configured to use 2 of the 3 dark orange SATA ports, one for the CD/DVD drive and one (visually identical) for the SSD.

I tested the drive with (GUI) GSmartControl extended; results are (pass), 38,000 hours, on the Western Digital Blue SSD. I invoked scrub again today, with no errors found in the scrub's completed status.

I passed:

  btrfsck --check --force /dev/mapper/system-root

and Konsole is displaying many outputs like the ones shown below (with slight variations, because there are many):

  mirror 1 bytenr 97666785280 csum 0x04a8172b expected csum 0xf70d046d

  Counts for qgroup id: 0/1507 are different
  our:    referenced 14385676288 referenced compressed 14385676288
  disk:   referenced 423661568 referenced compressed 423661568
  diff:   referenced 13962014720 referenced compressed 13962014720
  our:    exclusive 16384 exclusive compressed 16384
  disk:   exclusive 0 exclusive compressed 0
  diff:   exclusive 16384 exclusive compressed 16384

Neither scrub *nor* --clean coredumped today.

My question for you (better yet, Bugzilla) is: could this have happened initially somehow when I cloned the initial 160 GB mechanical drive to the 1 terabyte SSD (installed now), then expanded the btrfs partition, extended the size of the Logical Volume, and then extended the btrfs filesystem? All in order to maximize the use of the 1 terabyte drive being used now.

I do have this drive which is being used now cloned to another 1 TB drive. I can believe the cloned drive tests the same as the drive in use now, though. I have not passed btrfs check --repair /dev/<device_name> on the cloned backup drive. I did try btrfs check --repair /dev/<device_name> on this drive. I wonder if I should revert to the backup (cloned drive)?

I understand now where to file a bug report on Bugzilla. I have entered a bug report on Bugzilla, bug #1219539.

Thank you again for your insight on this.

-pj
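For what it's worth, the qgroup count mismatches that btrfs check reports can usually be recalculated on a mounted filesystem with a quota rescan, and scrub results can be reviewed afterwards. A hedged sketch, assuming the filesystem is mounted at / (this is not advice from the thread, and it does not address the checksum errors themselves):

  # recompute qgroup accounting on the mounted filesystem and wait for it to finish
  btrfs quota rescan -w /
  # run a foreground scrub (-B), then show its statistics
  btrfs scrub start -B /
  btrfs scrub status /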
On Tue, Feb 06, 2024 at 05:43:26PM -0600, -pj wrote:
My question for you (better yet, Bugzilla) is: could this have happened initially somehow when I cloned the initial 160 GB mechanical drive to the 1 terabyte SSD (installed now), then expanded the btrfs partition, extended the size of the Logical Volume, and then extended the btrfs filesystem? All in order to maximize the use of the 1 terabyte drive being used now.
Sorry I can't offer further help; mine was just a warning not to break things while trying to fix them. Do heed the filesystem experts' advice and go slowly, making sure you document what you do, just in case you need to backtrack/unwind. That fear-of-data-loss feeling is sickening, and most of us have been there.

Daniel
participants (4):
- -pj
- Carlos E. R.
- Daniel Morris
- Robert Webb