On 02-06-2024 03:17AM, Daniel Morris wrote:
On Mon, Feb 05, 2024 at 01:18:22AM +0000, Robert Webb wrote:
You may have a hardware problem causing disk corruption. If so, trying to use software to fix your disk would probably be a mistake. Check your RAM by booting memtest86 (memtest86+). If there are no errors after the tests have run for sufficient time, boot your rescue system and, with /dev/mapper/system-root unmounted, run 'btrfs check /dev/mapper/system-root'. Do not use the '--force' option.
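For reference, that check sequence from a rescue shell would look roughly like the sketch below. It assumes an LVM setup with a volume group 'system' and a logical volume 'root' (inferred from the device path), and that memtest86+ has already run clean; the details are assumptions, not commands taken from this thread:

    # Confirm the root volume really is not mounted before checking it.
    findmnt /dev/mapper/system-root || echo "not mounted"

    # Activate the volume group if the rescue environment has not done so.
    vgchange -ay system

    # Read-only check only; avoid --force and --repair until the hardware
    # (RAM, cabling, controller) is trusted.
    btrfs check --readonly /dev/mapper/system-root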
In addition to the warnings about not making things worse: assuming the drive is SATA, try substituting the SATA cable to the drive.
Anecdata: I went around the houses searching for the cause of a slew of filesystem errors (btrfs & xfs) a few years ago that even a motherboard switch didn't permanently fix. I had the luxury of all data being backed up and the system being a former workhorse due for an upgrade.
Switching the disks to another machine showed no such underlying problems on the drives (checksumming file reads against backup versions of ~4 TiB of slow-moving data). I assumed a bad controller on the motherboard (since it was now erroring on two of six drives).
After upgrading the motherboard and processor/memory, the trouble paused, but then returned after a few days/power-ups.
A bit of Kepner-Tregoe "the trouble is, the trouble is not" analysis left the PSU, SATA cables, case, and keyboard/mouse as the common parts. Since I had spare/new SATA cables, and they needed no screwdriver and were the least effort to change, I swapped those first... and "Tada!" the problems went away (my instinct/prejudice had blamed the ten-year-old PSU up to that point).
I was surprised. The SATA cables had been connected and forgotten; there was no sign of strain on the connectors, pinched routing, etc.
In the years between, I heard a '2.5 Admins' podcast episode that slated a particular batch/vintage of SATA cables as prone to deteriorating due to a reaction from one of the dyes used in the casings (ISTR).
A friend had similar mysterious filesystem problems on a Mint system last summer, and renewing the SATA cables made those go away too. We might even have bought similar cables from the same supplier many years before.
HTH, Daniel
Thank you for this excellent information and the time it took to write it out.

I opened the machine's case and looked at the SATA cables. The cable connected to the SATA SSD is a blue one, shorter and looking very aftermarket. There was an extra, longer SATA cable that was unused and visually matched the cable going to the CD/DVD drive, so I disconnected the blue cable and am now using the matching one, which I presume to be the default OEM cable. There are 4 SATA ports on this particular mainboard, 3 dark orange and 1 black; the machine currently uses 2 of the 3 dark orange ports, one for the CD/DVD drive and one (with the visually identical cable) for the SSD.

I tested the drive with GSmartControl (GUI); the extended self-test passed, at 38,000 hours, on the Western Digital Blue SSD. I invoked scrub again today, with no errors found in the completed scrub status.

I ran btrfsck --check --force /dev/mapper/system-root and Konsole displayed many lines of output like the following (with slight variations, because there were many):

    mirror 1 bytenr 97666785280 csum 0x04a8172b expected csum 0xf70d046d

    Counts for qgroup id: 0/1507 are different
    our:    referenced 14385676288 referenced compressed 14385676288
    disk:   referenced 423661568 referenced compressed 423661568
    diff:   referenced 13962014720 referenced compressed 13962014720
    our:    exclusive 16384 exclusive compressed 16384
    disk:   exclusive 0 exclusive compressed 0
    diff:   exclusive 16384 exclusive compressed 16384

Neither scrub *nor* --clean coredumped today.

My question for you (or better yet, for Bugzilla): could this have happened initially somehow when I cloned the original 160 GB mechanical drive to the 1 TB SSD installed now, then expanded the partition, extended the Logical Volume, and finally extended the btrfs filesystem, all in order to make full use of the 1 TB drive? (A rough sketch of that resize sequence, and a note on the qgroup mismatches, is appended below.)

I do have the drive that is in use now cloned to another 1 TB drive, though I can believe the cloned drive would test the same as the drive in use now. I have not run btrfs check --repair /dev/<device_name> on the cloned backup drive; I did try btrfs check --repair /dev/<device_name> on this drive. I wonder if I should revert to the backup (cloned) drive?

I understand now where to file a bug report on Bugzilla, and I have entered one: bug #1219539.

Thank you again for your insight on this.
-pj
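A note on the qgroup messages above: the "Counts for qgroup id ... are different" reports are an accounting inconsistency, and a commonly suggested remedy (not something anyone in this thread has confirmed for this system) is to let btrfs rebuild the qgroup numbers with a quota rescan on the mounted filesystem. A minimal sketch, assuming the root filesystem is mounted at /:

    # Rebuild qgroup accounting; -w waits for the rescan to finish.
    btrfs quota rescan -w /
    # -s reports whether a rescan is still in progress.
    btrfs quota rescan -s /

    # Afterwards, a read-only check from the rescue system (volume unmounted)
    # shows whether the qgroup mismatch reports are gone.
    btrfs check --readonly /dev/mapper/system-root

The "csum ... expected csum ..." lines are a different matter: they indicate data blocks whose contents no longer match their stored checksums, which a qgroup rescan will not repair.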
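And for reference, the clone-and-grow sequence described in the question (grow the partition, grow the logical volume, grow btrfs) would typically look something like the sketch below. The device name /dev/sda, the partition number, and the volume group/logical volume names 'system'/'root' are assumptions inferred from /dev/mapper/system-root, not commands taken from this thread:

    # 1. Grow the partition holding the LVM physical volume
    #    (assumed here to be partition 2 on /dev/sda).
    parted /dev/sda resizepart 2 100%

    # 2. Tell LVM the physical volume grew, then extend the logical volume.
    pvresize /dev/sda2
    lvextend -l +100%FREE /dev/system/root

    # 3. Grow the btrfs filesystem to fill the logical volume
    #    (btrfs resizes online, against the mounted path).
    btrfs filesystem resize max /

Done in that order on a healthy source drive, this sequence is not expected to corrupt data by itself, which is part of why the earlier replies focus on hardware (RAM, cables, controller).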