On 02-06-2024 03:17AM, Daniel Morris wrote:
On Mon, Feb 05, 2024 at 01:18:22AM +0000, Robert Webb wrote:
You may have a hardware problem causing disk corruption. If so, trying to use software to fix your disk would probably be a mistake. Check your RAM by booting memtest86 (memtest86+). If there are no errors after the tests have run for sufficient time, boot your rescue system and, with /dev/mapper/system-root unmounted, run 'btrfs check /dev/mapper/system-root'. Do not use the '--force' option.
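For reference, that check sequence from a rescue shell would look roughly like the sketch below. It assumes an LVM setup with a volume group 'system' and a logical volume 'root' (inferred from the device path), and that memtest86+ has already run clean; the details are assumptions, not commands taken from this thread:

    # Confirm the root volume really is not mounted before checking it.
    findmnt /dev/mapper/system-root || echo "not mounted"

    # Activate the volume group if the rescue environment has not done so.
    vgchange -ay system

    # Read-only check only; avoid --force and --repair until the hardware
    # (RAM, cabling, controller) is trusted.
    btrfs check --readonly /dev/mapper/system-root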
In addition to the warnings about not making things worse: assuming the drive is SATA, try substituting the SATA cable to the drive.
Anecdata: I went around the houses searching for the cause of a slew of filesystem errors (btrfs & xfs) a few years ago that even a motherboard switch didn't permanently fix. I had the luxury of all data being backed up and the system being a former workhorse due for an upgrade.
Switching the disks to another machine showed no such underlying problems on the drives (checksumming file reads against backup versions of ~4 TiB of slow-moving data). I assumed a bad controller on the motherboard (since it was now erroring on two of six drives).
After upgrading the motherboard and processor/memory, the trouble paused, but then returned after a few days/power-ups.
A bit of Kepner-Tregoe "the trouble is, the trouble is not" analysis left the PSU, SATA cables, case, and keyboard/mouse as the common parts. Since I had spare/new SATA cables, and they needed no screwdriver and were the least effort to change, I swapped those first... and "Tada!" the problems went away (my instinct/prejudice had blamed the ten-year-old PSU up to that point).
I was surprised. The SATA cables had been connected and forgotten; there was no sign of strain on the connectors, pinched routing, etc.
In the years between, I heard a '2.5 Admins' podcast episode that slated a particular batch/vintage of SATA cables as prone to deteriorating due to a reaction from one of the dyes used in the casings (ISTR).
A friend had similar mysterious filesystem problems on a Mint system last summer, and renewing the SATA cables made those go away too. We might even have bought similar cables from the same supplier many years before.
HTH, Daniel
Thank you for this excellent information and the time it took to write it out.

I opened the machine's case and looked at the SATA cables. The cable connected to the SATA SSD is a blue one, shorter and looking very aftermarket. There was an extra, longer SATA cable that was unused and visually matched the cable going to the CD/DVD drive, so I disconnected the blue cable and am now using the matching one, which I presume to be the default OEM cable. There are 4 SATA ports on this particular mainboard, 3 dark orange and 1 black; the machine currently uses 2 of the 3 dark orange ports, one for the CD/DVD drive and one (with the visually identical cable) for the SSD.

I tested the drive with GSmartControl (GUI); the extended self-test passed, at 38,000 hours, on the Western Digital Blue SSD. I invoked scrub again today, with no errors found in the completed scrub status.

I ran btrfsck --check --force /dev/mapper/system-root and Konsole displayed many lines of output like the following (with slight variations, because there were many):

    mirror 1 bytenr 97666785280 csum 0x04a8172b expected csum 0xf70d046d

    Counts for qgroup id: 0/1507 are different
    our:    referenced 14385676288 referenced compressed 14385676288
    disk:   referenced 423661568 referenced compressed 423661568
    diff:   referenced 13962014720 referenced compressed 13962014720
    our:    exclusive 16384 exclusive compressed 16384
    disk:   exclusive 0 exclusive compressed 0
    diff:   exclusive 16384 exclusive compressed 16384

Neither scrub *nor* --clean coredumped today.

My question for you (or better yet, for Bugzilla): could this have happened initially somehow when I cloned the original 160 GB mechanical drive to the 1 TB SSD installed now, then expanded the partition, extended the Logical Volume, and finally extended the btrfs filesystem, all in order to make full use of the 1 TB drive? (A rough sketch of that resize sequence, and a note on the qgroup mismatches, is appended below.)

I do have the drive that is in use now cloned to another 1 TB drive, though I can believe the cloned drive would test the same as the drive in use now. I have not run btrfs check --repair /dev/<device_name> on the cloned backup drive; I did try btrfs check --repair /dev/<device_name> on this drive. I wonder if I should revert to the backup (cloned) drive?

I understand now where to file a bug report on Bugzilla, and I have entered one: bug #1219539.

Thank you again for your insight on this.
-pj
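A note on the qgroup messages above: the "Counts for qgroup id ... are different" reports are an accounting inconsistency, and a commonly suggested remedy (not something anyone in this thread has confirmed for this system) is to let btrfs rebuild the qgroup numbers with a quota rescan on the mounted filesystem. A minimal sketch, assuming the root filesystem is mounted at /:

    # Rebuild qgroup accounting; -w waits for the rescan to finish.
    btrfs quota rescan -w /
    # -s reports whether a rescan is still in progress.
    btrfs quota rescan -s /

    # Afterwards, a read-only check from the rescue system (volume unmounted)
    # shows whether the qgroup mismatch reports are gone.
    btrfs check --readonly /dev/mapper/system-root

The "csum ... expected csum ..." lines are a different matter: they indicate data blocks whose contents no longer match their stored checksums, which a qgroup rescan will not repair.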
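And for reference, the clone-and-grow sequence described in the question (grow the partition, grow the logical volume, grow btrfs) would typically look something like the sketch below. The device name /dev/sda, the partition number, and the volume group/logical volume names 'system'/'root' are assumptions inferred from /dev/mapper/system-root, not commands taken from this thread:

    # 1. Grow the partition holding the LVM physical volume
    #    (assumed here to be partition 2 on /dev/sda).
    parted /dev/sda resizepart 2 100%

    # 2. Tell LVM the physical volume grew, then extend the logical volume.
    pvresize /dev/sda2
    lvextend -l +100%FREE /dev/system/root

    # 3. Grow the btrfs filesystem to fill the logical volume
    #    (btrfs resizes online, against the mounted path).
    btrfs filesystem resize max /

Done in that order on a healthy source drive, this sequence is not expected to corrupt data by itself, which is part of why the earlier replies focus on hardware (RAM, cables, controller).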