[opensuse] Login weirdness

Michael Fischer

30 Oct 2018

21:51

Leap 42.3

Haven't encountered this on previous installations, nor prior to a few weeks ago on this box.

On a reboot, where it would normally prompt for login,

login: <typing anything here is not echoed back to the screen>
Give Root Password for Maintenance
(or press Control-D to continue)

Trying to log in normally just keeps the terminal behaving strangely, re-outputting some version of the above. Hitting CTRL-D just sends it into a shutdown sequence. Hitting ENTER many times eventually produces "Welcome To OpenSuse"... and a login prompt which works normally.

I find the following in journalctl:

Oct 30 15:19:42 blinkenlights login[946]: gkr-pam: error looking up user information
Oct 30 15:19:43 blinkenlights login[946]: pam_unix(login:auth): check pass; user unknown

which I think comes from the botched efforts while it was saying

"""
Give Root Password for Maintenance
(or press Control-D to continue)
"""

I recall seeing an error message about tty1 on the screen when I was in the midst of the failed logins and reboots, but I do not find it in journalctl.

Suggestions for fixing/debugging, etc.?

journalctl _does_ show:

Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Currently unreadable (pending) sectors
Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Offline uncorrectable sectors

Not sure if that is pertinent.

TIA

Michael

--
Michael Fischer
michael@visv.net
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org


Carlos E. R.

30 Oct

21:59

On 30/10/2018 22.51, Michael Fischer wrote:

...

Leap 42.3

Haven't encountered this on previous installations, nor prior to a few weeks ago on this box.

On a reboot, where it would normally prompt for login,

login: <typing anything here is not echoed back to the screen> Give Root Password for Maintenance (or press Control-D to continue)

At this point, you can only log in as root. You must log in as root and find out why the system is in emergency mode. There was probably some message before that.

...

journalctl _does_ show:

Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Currently unreadable (pending) sectors
Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Offline uncorrectable sectors

Not sure if that is pertinent.

Not necessarily, unless those bad sectors contain files needed for startup.

--
Cheers / Saludos,
Carlos E. R.
(from 42.3 x86_64 "Malachite" at Telcontar)

Michael Fischer

22:17

On Tue, Oct 30, Carlos E. R. wrote:

...

On 30/10/2018 22.51, Michael Fischer wrote:

...
Leap 42.3

Haven't encountered this on previous installations, nor prior to a few weeks ago on this box.

On a reboot, where it would normally prompt for login,

login: <typing anything here is not echoed back to the screen> Give Root Password for Maintenance (or press Control-D to continue)

At this point, you can only log in as root. You must log in as root and find out why the system is in emergency mode. There was probably some message before that.

The prompts from the system suggest that certainly, but while typing the root password _sometimes_ gets me a root prompt, doing anything, (e.g. `ls`) gets me another cascade of "login failed" and the above messages again, and again...

...

...
journalctl _does_ show:

Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Currently unreadable (pending) sectors
Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Offline uncorrectable sectors

Not sure if that is pertinent.

Not necessarily, unless those bad sectors contain files needed for startup.

Indeed... does a reinstall of the OS help this sort of thing, or does it more likely mean "new nvme drive wanted" + a reinstall of the OS?

(I mean, ordinarily, "bad disk" would mean "replace disk, etc." Just wondering if a reinstall can do differently in this case - root is on an SSD.)

Michael

--
Michael Fischer
michael@visv.net

Carlos E. R.

22:22

On 30/10/2018 23.17, Michael Fischer wrote:

...

On Tue, Oct 30, Carlos E. R. wrote:

...
On 30/10/2018 22.51, Michael Fischer wrote:

...
Leap 42.3

Haven't encountered this on previous installations, nor prior to a few weeks ago on this box.

On a reboot, where it would normally prompt for login,

login: <typing anything here is not echoed back to the screen> Give Root Password for Maintenance (or press Control-D to continue)

At this point, you can only log in as root. You must log in as root and find out why the system is in emergency mode. There was probably some message before that.

The prompts from the system suggest that certainly, but while typing the root password _sometimes_ gets me a root prompt, doing anything, (e.g. `ls`) gets me another cascade of "login failed" and the above messages again, and again...

Well, there is some kind of problem there. fsck? From rescue media.

...

...
...
journalctl _does_ show:

Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Currently unreadable (pending) sectors
Oct 30 16:50:25 blinkenlights smartd[929]: Device: /dev/sda [SAT], 16 Offline uncorrectable sectors

Not sure if that is pertinent.

Not necessarily, unless those bad sectors contain files needed for startup.

Indeed... does a reinstall of the OS help this sort of thing, or does it more likely mean "new nvme drive wanted" + a reinstall of the OS?

If the disk is really bad, nothing will help except replacing the disk. In fact, trying to reinstall can be a terrible idea.

--
Cheers / Saludos,
Carlos E. R.
(from 42.3 x86_64 "Malachite" at Telcontar)

Anton Aylward

31 Oct

10:52

On 30/10/18 06:22 PM, Carlos E. R. wrote:

...

If the disk is really bad, nothing will help except replacing the disk. In fact, trying to reinstall can be a terrible idea.

Therefore determining the reliability of the disk is a key step.

I always keep a rescue CD around; actually I also keep a couple of generations of Knoppix CDs around as well, as Knoppix seems able to boot when other CDs fail.

There are a number of tools for determining disk quality & reliability; they have appeared in other threads and can be found by googling.

--
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?

Carlos E. R.

11:10

On 31/10/2018 11.52, Anton Aylward wrote:

...

On 30/10/18 06:22 PM, Carlos E. R. wrote:

...
If the disk is really bad, nothing will help except replacing the disk. In fact, trying to reinstall can be a terrible idea.

Therefore determining the reliability of the disk is a key step.

I always keep a rescue CD around; actually I also keep a couple of generations of Knoppix CDs around as well as Knoppix seems able to boot when other CDs fail.

http://download.opensuse.org/distribution/leap/15.0/live/openSUSE-Leap-15.0-...

It is more familiar. If installed on a stick, zypper works and tools can be added.

...

There are a number of tools for determining disk quality & reliability and they have appeared in other threads and can be found by googling.

smartctl in this case.

--
Cheers / Saludos,
Carlos E. R.
(from 42.3 x86_64 "Malachite" at Telcontar)

James Knott

11:16

On 10/31/2018 06:52 AM, Anton Aylward wrote:

...

I always keep a rescue CD around; actually I also keep a couple of generations of Knoppix CDs around as well as Knoppix seems able to boot when other CDs fail.

You're still using CDs? I use USB sticks and have one just for "System Rescue", which is a Linux build just for working with systems. Works well. When a new version of openSUSE comes out, I install the new image on the stick I have for installing new versions.

Anton Aylward

11:56

On 31/10/18 07:10 AM, Carlos E. R. wrote:

...

On 31/10/2018 11.52, Anton Aylward wrote:

...

...
There are a number of tools for determining disk quality & reliability and they have appeared in other threads and can be found by googling.

smartctl in this case.

I was thinking of things like

BADBLOCKS(8) in particular BLKDISCARD(8) FSTRIM(8)

And of course there are the more aggressive forms of FSCK and some file-system specific checkers and analysers.

Anton Aylward

11:59

On 31/10/18 07:16 AM, James Knott wrote:

...

On 10/31/2018 06:52 AM, Anton Aylward wrote:

...
I always keep a rescue CD around; actually I also keep a couple of generations of Knoppix CDs around as well as Knoppix seems able to boot when other CDs fail.

You're still using CDs?

For various things. But in this case it is a LiveCD image of Knoppix that is then written to USB.

...

...
I use USB sticks and have one just for "System Rescue", which is a Linux build just for working with systems. Works well. When a new version of openSUSE comes out, I install the new image on the stick I have for installing new versions.

I found, for a long time, that it was easier to write in indelible ink on the CDs/DVDs. Now I attach luggage tags to the USB sticks and write on them.

James Knott

12:46

On 10/31/2018 07:59 AM, Anton Aylward wrote:

...

Now I attach luggage tags to the USB sticks and write on them.

I use key tags.

Anton Aylward

12:49

On 31/10/18 08:46 AM, James Knott wrote:

...

On 10/31/2018 07:59 AM, Anton Aylward wrote:

...
Now I attach luggage tags to the USB sticks and write on them.

I use key tags.

I tried those for a while. I came to dislike the 'split ring'. The luggage tags give more space to write on :-)

Carlos E. R.

12:52

On 31/10/2018 12.56, Anton Aylward wrote:

...

On 31/10/18 07:10 AM, Carlos E. R. wrote:

...
On 31/10/2018 11.52, Anton Aylward wrote:

...
...
There are a number of tools for determining disk quality & reliability and they have appeared in other threads and can be found by googling.

smartctl in this case.

I was thinking of things like

BADBLOCKS(8) in particular BLKDISCARD(8) FSTRIM(8)

And of course there are the more aggressive forms of FSCK and some file-system specific checkers and analysers.

But smartctl runs the test designed by the disk manufacturer.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.0 (Legolas))

Carlos E. R.

12:53

On 31/10/2018 13.49, Anton Aylward wrote:

...

On 31/10/18 08:46 AM, James Knott wrote:

...
On 10/31/2018 07:59 AM, Anton Aylward wrote:

...
Now I attach luggage tags to the USB sticks and write on them.

I use key tags.

I tried those for a while. I came to dislike the 'split ring'. The luggage tags give more space to write on :-)

I use card labels with a string attached, bought by the hundred and cheap.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.0 (Legolas))

Michael Fischer

1 Nov

15:07

On Wed, Oct 31, Carlos E. R. wrote:

...

On 31/10/2018 11.52, Anton Aylward wrote:

...
On 30/10/18 06:22 PM, Carlos E. R. wrote:

...
If the disk is really bad, nothing will help except replacing the disk. In fact, trying to reinstall can be a terrible idea.

Therefore determining the reliability of the disk is a key step.

I always keep a rescue CD around; actually I also keep a couple of generations of Knoppix CDs around as well as Knoppix seems able to boot when other CDs fail.

http://download.opensuse.org/distribution/leap/15.0/live/openSUSE-Leap-15.0-...

It is more familiar. If installed on a stick, zypper works and tools can be added.

I grabbed that last night (notes from a previous Carlos post!) Need to buy usb sticks today

...

...
There are a number of tools for determining disk quality & reliability and they have appeared in other threads and can be found by googling.

smartctl in this case.

Any particular suggestions with this one? Anything which can be run from the NOT-rescue disk?

TIA

Michael

--
Michael Fischer
michael@visv.net

Carlos E. R.

15:19

On 01/11/2018 16.07, Michael Fischer wrote:

...

...
smartctl in this case.

Any particular suggestions with this one? Anything which can be run from the NOT-rescue disk?

smartctl --test=short /dev/sdX

Replace X with the proper letter.

smartctl -a /dev/sdX

and read. Then do

smartctl --test=long /dev/sdX

and wait some hours as told. Then read the results as before.

If in doubt about the meaning, post the output here.

--
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.0 (Legolas))

Michael Fischer

2 Nov

14:27

On Thu, Nov 01, Carlos E. R. wrote:

...

On 01/11/2018 16.07, Michael Fischer wrote:

...
...
smartctl in this case.

Any particular suggestions with this one? Anything which can be run from the NOT-rescue disk?

smartctl --test=short /dev/sdX

Replace X with the proper letter.

smartctl -a /dev/sdX

and read. Then do

smartctl --test=long /dev/sdX

and wait some hours as told. Then read the results as before.

If in doubt about the meaning, post the output here.

Thanks much.

I append the output of `smartctl -a /dev/sda`. I meant to run the --test=long last night, but fell asleep without triggering it... (bah)

I note that `smartctl -a` basically said "PASSED" but that there were a few read errors. I've no idea what (more) to make of them.

FWIW, I realized that /dev/sda is my /home and /tmp on rust, not my / on an SSD, so it is perhaps _less_ likely to be the cause of my login weirdness (I hope), and more amenable to a clean reinstall/upgrade.

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.6-1.g45f120a-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1SB102
Serial Number:    Z9ACC0MX
LU WWN Device Id: 5 000c50 0a28684ad
Firmware Version: CC43
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Nov 1 19:26:17 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  38) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 106) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   074   063   006    Pre-fail  Always       -       26493824
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       57728258
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7752
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       41
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       1 1 1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   060   050   040    Old_age   Always       -       40 (Min/Max 30/41)
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       363
194 Temperature_Celsius     0x0022   040   016   000    Old_age   Always       -       40 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   008   004   000    Old_age   Always       -       26493824
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       7716h+03m+29.370s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6351787553
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       352717284

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 7752 hours (323 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 00 00 00 00 00 ff   2d+03:57:39.774  NOP [Abort queued commands]
  b0 d4 00 81 4f c2 00 00   2d+03:57:18.107  SMART EXECUTE OFF-LINE IMMEDIATE
  b0 d0 01 00 4f c2 00 00   2d+03:57:18.000  SMART READ DATA
  ec 00 01 00 00 00 00 00   2d+03:57:17.991  IDENTIFY DEVICE
  ec 00 01 00 00 00 00 00   2d+03:57:17.990  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                       Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Interrupted (host reset)        60%       7752           -
# 2  Short offline       Completed without error         00%       7752           -
# 3  Extended offline    Completed: read failure         50%       7136           1001593016
# 4  Extended offline    Completed: read failure         50%       6296           1001593016
# 5  Extended offline    Completed: read failure         50%       5624           1001593016
# 6  Extended offline    Completed: read failure         50%       4784           1001593016
# 7  Extended offline    Completed: read failure         50%       4112           1001593016
# 8  Extended offline    Completed: read failure         50%       3440           1001593016
# 9  Extended offline    Completed: read failure         50%       2600           1001593016
#10  Extended offline    Completed: read failure         50%       1929           1001593016
#11  Extended offline    Completed: read failure         50%       1257           1021004240
#12  Extended offline    Completed: read failure         50%        585           1001593008
#13  Short offline       Completed without error         00%          0           -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Michael

--
Michael Fischer
michael@visv.net
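[Editor's sketch] A report this long is easier to follow if the few attributes of interest are pulled out by ID. The snippet below is only an illustration, not part of anyone's post: `smart_raw` is a hypothetical helper name, and it assumes the ten-column attribute layout that `smartctl -A` prints (RAW_VALUE in column 10, so a multi-word raw value yields only its first word). The sample rows are copied from the report in this message; 197 and 198 are the counters that the smartd lines in the first post refer to.

```shell
# Print the RAW_VALUE column (10th field) of one SMART attribute,
# reading `smartctl -A`-style text on stdin. The helper name is hypothetical.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Three rows copied from the report in this thread.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16'

for id in 5 197 198; do
    printf 'attribute %s raw value: %s\n' "$id" \
        "$(printf '%s\n' "$sample" | smart_raw "$id")"
done
```

On a live system the same function could presumably be fed directly, for example `smartctl -A /dev/sdX | smart_raw 197`, with X replaced as appropriate.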

Carlos E. R.

14:57

On 02/11/2018 15.27, Michael Fischer wrote:

...

On Thu, Nov 01, Carlos E. R. wrote:

...
On 01/11/2018 16.07, Michael Fischer wrote:

...
...
smartctl in this case.

...

I append the output of `smartctl -a /dev/sda`. I meant to run the --test=long last night, but fell asleep without triggering it... (bah)

It happens :-)

...

I note that `smartctl -a` basically said "PASSED" but that there were a few read errors. I've no idea what (more) to make of them.

I'll look.

...

FWIW, I realized that /dev/sda is my /home and /tmp on rust, not my / on an SSD, so it is perhaps _less_ likely to be the cause of my login weirdness (I hope), and more amenable to a clean reinstall/upgrade.

Ah, yes. Better run the test on all disks.

...

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.6-1.g45f120a-default] (SUSE RPM) Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION === ...

...

=== START OF READ SMART DATA SECTION ===
...
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   074   063   006    Pre-fail  Always       -       26493824
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

Ok, but watch this parameter.

...

  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       57728258
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7752

Not an old disk.

...

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       41
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       1 1 1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   060   050   040    Old_age   Always       -       40 (Min/Max 30/41)
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       363
194 Temperature_Celsius     0x0022   040   016   000    Old_age   Always       -       40 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   008   004   000    Old_age   Always       -       26493824
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16

Ah. Yes, this is important.

...

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       7716h+03m+29.370s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6351787553
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       352717284

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 7752 hours (323 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

This section exceeds my skills, sorry. They are internal errors (to the disk firmware). And it is very recent, at 7752 hours. ...

...

SMART Self-test log structure revision number 1
Num  Test_Description    Status                       Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Interrupted (host reset)        60%       7752           -
# 2  Short offline       Completed without error         00%       7752           -
# 3  Extended offline    Completed: read failure         50%       7136           1001593016
# 4  Extended offline    Completed: read failure         50%       6296           1001593016
# 5  Extended offline    Completed: read failure         50%       5624           1001593016
# 6  Extended offline    Completed: read failure         50%       4784           1001593016
# 7  Extended offline    Completed: read failure         50%       4112           1001593016
# 8  Extended offline    Completed: read failure         50%       3440           1001593016
# 9  Extended offline    Completed: read failure         50%       2600           1001593016
#10  Extended offline    Completed: read failure         50%       1929           1001593016
#11  Extended offline    Completed: read failure         50%       1257           1021004240
#12  Extended offline    Completed: read failure         50%        585           1001593008
#13  Short offline       Completed without error         00%          0           -

Well, you have to do the long test to be sure. Notice that you can do the testing while you use the computer: it just may become sluggish or not respond. Do not power it off if that happens. Of course, the test will take longer if the computer is busy.

Parameter 197.

All hard disks develop errors. Operating systems know that, and can mark the bad sectors in order to just not use them. Modern disks (for years now) can remap bad sectors to other sectors that have been reserved for that purpose since manufacture. This is done automatically by the firmware when writing to that bad sector.

This parameter says that there are a number of sectors that have not been remapped.

Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA.

You could try to find out which file that LBA belongs to, recover the file if possible or replace it with a backup copy, and write to that LBA. Not trivial. The write operation should trigger the remap.

Then run again the long test to see if it stops at another LBA, then repeat till none appears.

You can also run "badblocks" on that disk. This test takes many hours (even days) and has to be done while unmounted, thus from rescue media. Sometimes this is enough to clear those bad sectors, sometimes they appear again days later. If the command produces a list of bad sectors, then write to them to force a remap.

One method is to overwrite the entire partition with zeros or whatever, then recover the data from backup.

Eventually, if you see the number of bad sectors grow (seen on parameter 5), the only solution is to replace the disk.

Some people "panic" at the first bad sector and replace the disk. I have had disks in use that got some bad sectors very soon, did as above, and heard of no more bad sectors for years, till I replaced them with bigger disks because I wanted more space.

The first hard disk I bought came with a bad sector list printed on paper by the manufacturer. All disks came with that at that time. It was 30 megabytes big, a huge disk at that time.

--
Cheers / Saludos,
Carlos E. R.
(from 42.3 x86_64 "Malachite" at Telcontar)
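[Editor's sketch] Before rewriting anything it may help to tally which LBAs the failed extended tests actually stopped at. The snippet below is only an illustration: `failing_lbas` is a hypothetical helper name, and it assumes self-test log rows in the format `smartctl -l selftest` prints, with the LBA as the last field. The rows are copied from the log quoted above.

```shell
# Count "read failure" self-test entries per LBA (last field of each row),
# most frequent first. The helper name is hypothetical.
failing_lbas() {
    awk '/read failure/ { n[$NF]++ } END { for (lba in n) print n[lba], lba }' |
        sort -rn
}

# Failed rows copied from the self-test log in this thread.
selftest_log='# 3  Extended offline    Completed: read failure  50%  7136  1001593016
# 4  Extended offline    Completed: read failure  50%  6296  1001593016
# 5  Extended offline    Completed: read failure  50%  5624  1001593016
# 6  Extended offline    Completed: read failure  50%  4784  1001593016
# 7  Extended offline    Completed: read failure  50%  4112  1001593016
# 8  Extended offline    Completed: read failure  50%  3440  1001593016
# 9  Extended offline    Completed: read failure  50%  2600  1001593016
#10  Extended offline    Completed: read failure  50%  1929  1001593016
#11  Extended offline    Completed: read failure  50%  1257  1021004240
#12  Extended offline    Completed: read failure  50%   585  1001593008
#13  Short offline       Completed without error  00%     0  -'

printf '%s\n' "$selftest_log" | failing_lbas
```

For this log the tally shows eight failures at 1001593016 and one each at 1021004240 and 1001593008, so there appear to be three distinct addresses to deal with rather than one.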

Liam Proven

17:40

On 02/11/2018 15:57, Carlos E. R. wrote:

...

All hard disks develop errors. Operating systems know that, and can mark the bad sectors in order to just not use them. Modern disks (for years now) can remap bad sectors to other sectors that have been reserved for that purpose since manufacture. This is done automatically by the firmware when writing to that bad sector.

This is true. All SATA hard disks do this, and all later *E*IDE hard disks did too. (SSDs are different and weird.)

...

This parameter says that there are a number of sectors that have not been remapped.

To me, that is a danger sign. I don't know exactly what it means or why but it's worrying.

...

Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA. [...]

I am afraid I must disagree.

...

You can also run "badblocks" on that disk[...]

OK, I must disagree more.

...

The first hard disk I bought came with a bad sector list printed on paper by the manufacturer. All disks came with that at that time. It was 30 megabytes big, a huge disk at that time.

Well, the first one I used at work, yes. 20 MB and I added a 2nd 15MB drive to the machine for SCO Xenix 286.

But times have changed a lot.

The last new disk I bought was a 1TB notebook hard disk, a quarter the size of a deck of cards. This would need a pile of those Conner 225 ST-506 interface drives that I mentioned just now the size of (and a *lot* heavier than) a Space Shuttle to store an equivalent amount of data. This shook me at the time.

All hard drives have some bad sectors, it's true. Most develop more during their operational life, also true.

But they have a pretty large reserved area (maybe 10-15%; it varies a lot with model and makers don't like to disclose it. A >1% number of blocks, anyway) and failed blocks are replaced from the spare blocks.

This remapping is normal and invisible. The OS never knows there was a read error, it's just switched on the fly.

If the OS can see errors, that means that either

[a] the disk's replacement blocks are used up, meaning it has millions of bad blocks, or
[b] the disk is defective in some other way.

In either instance, I would regard that as a failing drive and replace it immediately.

Don't waste time trying to rescue it. Get any remaining data off it, ASAP. Return it for warranty replacement, if possible. If not, send it for recycling, or take it to bits if you're curious.

Do not waste time trying to fix it, and never use it for anything other than test purposes again.

--
Liam Proven - Technical Writer, SUSE Linux s.r.o.
Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia
Email: lproven@suse.com - Office telephone: +420 284 241 084

Michael Fischer

17:57

On Fri, Nov 02, Carlos E. R. wrote:

...

On 02/11/2018 15.27, Michael Fischer wrote:

...

Ah, yes. Better run the test on all disks.

The SSD produced much less output from `smartctl -a` but also nothing which suggested errors (good, as that is /).

I've got 2 external (USB-attached) drives which are my backups. smartctl needs a `-d sat` to produce output from one of them (happy) and `-d scsi` for the other, which insisted that

SMART support is: Available - device has SMART capability.
SMART support is: Disabled

I did

`$ sudo smartctl -d scsi -s on /dev/sdb`

but to no effect in the output of

`$ sudo smartctl -i -d scsi /dev/sdb`

Go figure.

AFAIK, both those external disks are fine, but I'm running badblocks on them now for "grins".

...

...
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0

Ok, but watch this parameter.

...
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       57728258
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7752

Not an old disk.

...
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16

Ah. Yes, this is important.

...
Error 1 occurred at disk power-on lifetime: 7752 hours (323 days + 0 hours) When the command that caused the error occurred, the device was in an unknown state.

This section exceeds my skills, sorry. They are internal errors (to the disk firmware). And it is very recent, at 7752 hours.

Had a couple of "push button" forced restarts, and one complete power outage recently.

...

...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Interrupted (host reset)     60%        7752          -
# 2  Short offline       Completed without error      00%        7752          -
# 3  Extended offline    Completed: read failure      50%        7136          1001593016
# 4  Extended offline    Completed: read failure      50%        6296          1001593016
# 5  Extended offline    Completed: read failure      50%        5624          1001593016
# 6  Extended offline    Completed: read failure      50%        4784          1001593016
# 7  Extended offline    Completed: read failure      50%        4112          1001593016
# 8  Extended offline    Completed: read failure      50%        3440          1001593016
# 9  Extended offline    Completed: read failure      50%        2600          1001593016
#10  Extended offline    Completed: read failure      50%        1929          1001593016
#11  Extended offline    Completed: read failure      50%        1257          1021004240
#12  Extended offline    Completed: read failure      50%         585          1001593008
#13  Short offline       Completed without error      00%           0          -

Well, you have to do the long test to be sure. Notice that you can do the testing while you use the computer: it just may become sluggish or not respond. Do not power it off if it happens. Of course, the test will take longer if the computer is busy.

Parameter 197.

[snip]

...

Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA.
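A quick way to check whether the failed tests really all stop at the same place is to tally the LBA_of_first_error column. What follows is only a minimal Python sketch: it assumes the self-test log has been pasted in with one entry per line (the entries are copied from the log quoted above), and `failing_lbas` is an illustrative helper name, not something smartmontools provides.

```python
from collections import Counter

# Entries copied from the self-test log quoted above, one per line.
SELFTEST_LOG = """\
# 1 Short captive Interrupted (host reset) 60% 7752 -
# 2 Short offline Completed without error 00% 7752 -
# 3 Extended offline Completed: read failure 50% 7136 1001593016
# 4 Extended offline Completed: read failure 50% 6296 1001593016
# 5 Extended offline Completed: read failure 50% 5624 1001593016
# 6 Extended offline Completed: read failure 50% 4784 1001593016
# 7 Extended offline Completed: read failure 50% 4112 1001593016
# 8 Extended offline Completed: read failure 50% 3440 1001593016
# 9 Extended offline Completed: read failure 50% 2600 1001593016
#10 Extended offline Completed: read failure 50% 1929 1001593016
#11 Extended offline Completed: read failure 50% 1257 1021004240
#12 Extended offline Completed: read failure 50% 585 1001593008
#13 Short offline Completed without error 00% 0 -
"""


def failing_lbas(log_text):
    """Count how often each LBA_of_first_error value appears in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.startswith("#"):
            continue
        # The last three columns are Remaining, LifeTime(hours) and
        # LBA_of_first_error; everything before them is number/description/status.
        _head, _remaining, _lifetime, lba = line.rsplit(None, 3)
        if lba != "-":
            counts[int(lba)] += 1
    return counts


if __name__ == "__main__":
    for lba, hits in failing_lbas(SELFTEST_LOG).most_common():
        print(lba, hits)
```

For this particular log, eight of the ten failed runs stop at 1001593016 and the other two stop at 1021004240 and 1001593008, so there is more than one LBA to look at.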

You could try to find out which file that LBA belongs to, recover the file if possible or replace it with a backup copy, and write to that LBA. Not trivial. The write operation should trigger the remap.

Google-fu failing me as to how to go from LBA -> fs file(s). Suggestions?

...

Then run again the long test to see if it stops at another LBA, then repeat till none appears.

You can also run "badblocks" on that disk. This test takes many hours (even days), has to be done while umounted, thus from rescue media. Sometimes this is enough to clear those bad sectors, sometimes they appear again days later. If the command produces a list of bad sectors, then write to them to force a remap.

One method is to rewrite to the entire partition with zeros or whatever, then recover the data from backup.

Thanks much Carlos for the detailed response. Much appreciated.

Will try the --test=long tonight and report back.

Michael

-- Michael Fischer michael@visv.net

Felix Miata

18:12

New subject: [opensuse] antique HDs (was: Login weirdness)

Liam Proven composed on 2018-11-02 18:40 (UTC+0100):

...

Carlos E. R. wrote: ...

...
The first hard disk I bought came with a bad sector list printed in paper by the manufacturer. All disks came with that at that time. It was 30 megabytes big, a huge disk at that time.

...

Well, the first one I used at work, yes. 20 MB and I added a 2nd 15MB drive to the machine for SCO Xenix 286.

...

But times have changed a lot. The last new disk I bought was a 1TB notebook hard disk, a quarter the size of a deck of cards.

This would need a pile of those Conner 225 ST-506 interface drives that I mentioned just now the size of (and a *lot* heavier than) a Space Shuttle to store an equivalent amount of data.

I still have the 5.8lb ST-506 30ms 1024 cyl 8 hd 17 sect Micropolis 85MB model 1335 from the external storage addon for the Altos Xenix 286 where I worked, thinking someday I'd run across a controller to match it to try to retrieve the scripts I wrote. -- Evolution as taught in public schools is religion, not science. Team OS/2 ** Reg. Linux User #211409 ** a11y rocks! Felix Miata *** http://fm.no-ip.com/

James Knott

18:20

New subject: [opensuse] antique HDs (was: Login weirdness)

On 11/02/2018 02:12 PM, Felix Miata wrote:

...

...
This would need a pile of those Conner 225 ST-506 interface drives that I mentioned just now the size of (and a *lot* heavier than) a Space Shuttle to store an equivalent amount of data. I still have the 5.8lb ST-506 30ms 1024 cyl 8 hd 17 sect Micropolis 85MB model 1335 from the external storage addon for the Altos Xenix 286 where I worked, thinking someday I'd run across a controller to match it to try to retrieve the scripts I wrote.

My first hard drive was a 30 MB Seagate ST-238R RLL drive that cost, with controller, $500 back in the '80s. I installed it in my XT clone. It was a lot faster than 5.25" floppies! ;-) That drive & controller had 2 cables, one for controlling the drive and the other for data.

Felix Miata

18:57

New subject: [opensuse] antique HDs

James Knott composed on 2018-11-02 14:20 (UTC-0400):

...

Felix Miata wrote:

...

...
...
This would need a pile of those Conner 225 ST-506 interface drives that I mentioned just now the size of (and a *lot* heavier than) a Space Shuttle to store an equivalent amount of data.

...

...
I still have the 5.8lb ST-506 30ms 1024 cyl 8 hd 17 sect Micropolis 85MB model 1335 from the external storage addon for the Altos Xenix 286 where I worked, thinking someday I'd run across a controller to match it to try to retrieve the scripts I wrote.

...

My first hard drive was a 30 MB Seagate ST-238R RLL drive that cost, with controller, $500 back in the '80s. I installed it in my XT clone. It was a lot faster than 5.25" floppies! ;-)

...

That drive & controller had 2 cables, one for controlling the drive and the other for data.

Like the infamous 20MB CMI[1] HD in the PC AT I had before the Altos. I still have the two 5.25" platters (barely over 5.0" actually, roughly 0.5" larger than a DVD) from it.

[1] https://www.gillware.com/blog/articles/hdd-burial-sea-davy-jones-locker/

-- Evolution as taught in public schools is religion, not science. Team OS/2 ** Reg. Linux User #211409 ** a11y rocks! Felix Miata *** http://fm.no-ip.com/

Michael Fischer

20:38

On Fri, Nov 02, Liam Proven wrote:

...

If the OS can see errors, that means that either [a] the disk's replacement blocks are used up, meaning it has millions of bad blocks, or [b] the disk is defective in some other way.

In either instance, I would regard that as a failing drive and replace it immediately.

I'm kind of inclined to go this way, ultimately. I'm coming up on where it is time to upgrade 42.3 to 15 (or maybe Ubuntu 18.04 - been using Ubuntu for everything at work). So this might fit well with "replace the /dev/sda that /home is sitting on".

*sigh* Stuff happens.

Michael

-- Michael Fischer michael@visv.net

Patrick Shanahan

21:55

New subject: [opensuse] antique HDs (was: Login weirdness)

* Felix Miata [11-02-18 14:15]:

...

Liam Proven composed on 2018-11-02 18:40 (UTC+0100):

...
Carlos E. R. wrote: ...

...
The first hard disk I bought came with a bad sector list printed in paper by the manufacturer. All disks came with that at that time. It was 30 megabytes big, a huge disk at that time.

...
Well, the first one I used at work, yes. 20 MB and I added a 2nd 15MB drive to the machine for SCO Xenix 286.

...
But times have changed a lot. The last new disk I bought was a 1TB notebook hard disk, a quarter the size of a deck of cards.

This would need a pile of those Conner 225 ST-506 interface drives that I mentioned just now the size of (and a *lot* heavier than) a Space Shuttle to store an equivalent amount of data.

I still have the 5.8lb ST-506 30ms 1024 cyl 8 hd 17 sect Micropolis 85MB model 1335 from the external storage addon for the Altos Xenix 286 where I worked, thinking someday I'd run across a controller to match it to try to retrieve the scripts I wrote. -- Evolution as taught in public schools is religion, not science.

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata *** http://fm.no-ip.com/

this thread change belongs in offtopic. it contributes nothing to the proclaimed subject this form is to address. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Registered Linux User #207535 @ http://linuxcounter.net Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet freenode

Anton Aylward

3 Nov 3 Nov

00:59

New subject: [opensuse] USB tags

On 31/10/18 08:53 AM, Carlos E. R. wrote:

...

On 31/10/2018 13.49, Anton Aylward wrote:

...
On 31/10/18 08:46 AM, James Knott wrote:

...
On 10/31/2018 07:59 AM, Anton Aylward wrote:

...
Now I attach luggage tags to the USB sticks and write on them.

I use key tags.

I tried those for a while. I came to dislike the 'split ring'. The luggage tags give more space to write on :-)

I use card labels with a string attached, bought by the hundred and cheap.

Perhaps we're talking of the same thing. I'll need to get a photo to send you... -- A: Yes. > Q: Are you sure? >> A: Because it reverses the logical flow of conversation. >>> Q: Why is top posting frowned upon?

David Haller

05:25

Hello, On Fri, 02 Nov 2018, Liam Proven wrote:

...

On 02/11/2018 15:57, Carlos E. R. wrote: [..]

...
This parameter says that there are a number of sectors that have not been remapped.

To me, that is a danger sign. I don't know exactly what it means or why but it's worrying.

It means that the drive was at least _once_ not able to read from that sector (see the list at the end of 'smartctl -a' output). Thus, that sector is *pending* to be reallocated, but is not yet. That happens when you write to that sector. That can be done with hdparm.

==== man hdparm ====
--write-sector
    Writes zeros to the specified sector number. VERY DANGEROUS. The sector number must be given (base10) after this option. hdparm will issue a low-level write (completely bypassing the usual block layer read/write mechanisms) to the specified sector. This can be used to force a drive to repair a bad sector (media error).
====

AND YES, ALL DATA ON THAT SECTOR WILL BE GONE!

But that sector will disappear from the "pending/offline uncorrectable" count but appear as "reallocated" instead (as long as there are sectors left to be remapped to).

So you can write to that sector number, but it will (physically) be another sector than originally.

E.g.: sector 10 is reported as bad, you have "pending: 1, offline uncorrectable: 1, reallocated: 0" and the test-log (from a smartctl long test IIRC) shows "LBA_of_first_error: 10". Say then, you use hdparm to write to that sector. Then you'll get:

"pending: 0, offline unc.: 0, reallocated: 1".

The error-log does not change, but a further test will not fail at sector 10 again, as that has now been reallocated.

Besides the reallocated-count going up, the error seems to disappear. Until you run out of sectors to reallocate to. And of course, the data that was on the unreadable sectors is gone.

It might be that the disc can scrape the data from the sector by reading multiple times, but will still mark it as "pending" and reallocate it when written to.

...

...
Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA. [...]

I am afraid I must disagree.

...
You can also run "badblocks" on that disk[...]

OK, I must disagree more.

Care to elaborate as to why? The following? [..]

...

All hard drives have some bad sectors, it's true. Most develop more during their operational life, also true.

But they have a pretty large reserved area(maybe 10-15%, it varies a lot with model and makers don't like to disclose it. An >1% number of blocks, anyway.) and failed blocks are replaced from the spare blocks.

I doubt that it's that much. Might have been in days long gone.

...

This remapping is normal and invisible. The OS never knows there was a read error, it's just switched on the fly.

See above. Years ago, you could even _hear_ (esp. Seagates) trying to re-re-re-re-re-read a sector.... *gnuuiii*gnuuiii*gnuuiii*gnuuiii*schloink* Taking _minutes_...

...

If the OS can see errors, that means that either [a] the disk's replacement blocks are used up, meaning it has millions of bad blocks, or [b] the disk is defective in some other way.

smartctl != the OS ;)

...

In either instance, I would regard that as a failing drive and replace it immediately.

Agreed. Esp. if the drive is not brand new. Cue the bathtub curve. Some drives may come/develop some badblocks real fast but then be stable over years. But if the drive is older, developing badblocks is a sure warning sign.

...

Don't waste time trying to rescue it. Get any remaining data off it, ASAP. Return it for warranty replacement, if possible. If not, send it for recycling, or take it to bits if you're curious.

Do not waste time trying to fix it, and never use it for anything other than test purposes again.

Or use it for ephemeral stuff, news-spool, DL-caches, whatnot where you don't really care if you lose it and/or need to reget it.

HTH,
-dnh

-- Once upon a time there was a DOS user who saw Unix, and saw that it was good. After typing cp on his DOS machine at home, he downloaded GNU's unix tools ported to DOS and installed them. He rm'd, cp'd, and mv'd happily for many days, and upon finding elvis, he vi'd and was happy. After a long day at work (on a Unix box) he came home, started editing a file, and couldn't figure out why he couldn't suspend vi (w/ ctrl-z) to do a compile. (By ewt@tipper.oit.unc.edu (Erik Troan)

David Haller

05:29

New subject: [opensuse] antique HDs (was: Login weirdness)

Hello, On Fri, 02 Nov 2018, Patrick Shanahan wrote:

...

...
Liam Proven composed on 2018-11-02 18:40 (UTC+0100): [..]

* Felix Miata [11-02-18 14:15]: this thread change belongs in offtopic. it contributes nothing to the proclaimed subject this form is to address.

Calm down, Patrick. Liam just added a nostalgia tidbit to a long mail... And we're way downthread.

-dnh

-- panic("CPU too expensive - making holiday in the ANDES!"); linux-2.2.16/arch/mips/kernel/traps.c

Andrei Borzenkov

05:40

02.11.2018 20:40, Liam Proven wrote: ...

...

All hard drives have some bad sectors, it's true. Most develop more during their operational life, also true.

But they have a pretty large reserved area(maybe 10-15%, it varies a lot with model and makers don't like to disclose it. An >1% number of blocks, anyway.) and failed blocks are replaced from the spare blocks.

This remapping is normal and invisible. The OS never knows there was a read error, it's just switched on the fly.

There is no way to correct read errors without active OS involvement. The OS must reconstruct the failed sector and rewrite it; where would the drive get the correct content to write into the remapped sector if it could not read the original sector in the first place?

Carlos E. R.

12:18

On 02/11/2018 18.40, Liam Proven wrote:

...

On 02/11/2018 15:57, Carlos E. R. wrote:

...
All hard disks develop errors. Operating systems know that, and can mark the bad sectors in order to just not use them. Modern (since years) disks can remap bad sectors to other sectors that are reserved for that purpose since manufacture date. This is done automatically by the firmware when writing to that bad sector.

This is true. All SATA hard disks do this, and all later *E*IDE hard disks did too.

(SSDs are different and weird.)

...
This parameter says that there are a number of sectors that have not been remapped.

To me, that is a danger sign. I don't know exactly what it means or why but it's worrying.

...
Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA. [...]

I am afraid I must disagree.

...
You can also run "badblocks" on that disk[...]

OK, I must disagree more.

...
The first hard disk I bought came with a bad sector list printed in paper by the manufacturer. All disks came with that at that time. It was 30 megabytes big, a huge disk at that time.

Well, the first one I used at work, yes. 20 MB and I added a 2nd 15MB drive to the machine for SCO Xenix 286.

But times have changed a lot. The last new disk I bought was a 1TB notebook hard disk, a quarter the size of a deck of cards.

This would need a pile of those Conner 225 ST-506 interface drives that I mentioned just now the size of (and a *lot* heavier than) a Space Shuttle to store an equivalent amount of data.

This shook me at the time.

All hard drives have some bad sectors, it's true. Most develop more during their operational life, also true.

But they have a pretty large reserved area(maybe 10-15%, it varies a lot with model and makers don't like to disclose it. An >1% number of blocks, anyway.) and failed blocks are replaced from the spare blocks.

This remapping is normal and invisible. The OS never knows there was a read error, it's just switched on the fly.

This is what I have been saying :-)

...

If the OS can see errors, that means that either [a] the disk's replacement blocks are used up, meaning it has millions of bad blocks, or [b] the disk is defective in some other way.

Well, no. :-) This parameter:

...

5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0

is precisely how much of that area has been used. "100" means empty, "10" means "emergency", replace ASAP. The numbering in SMART goes down. The "raw" number is "0", which is, we suppose, the normal number humans would use.

*The remapping only happens during sector write.*

If the sector belongs to a file which is never written, the error remains for ever, not mapped out. That is the main reason that the count "Current_Pending_Sector" goes up.

Which is why the user must force a write to the bad LBA to get it mapped out.

-- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)
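The "numbers go down" rule can be illustrated with a few of the attribute rows posted earlier in the thread. This is only a rough Python sketch: the column order is the one shown in the `smartctl` attribute table, `parse_attr` and `needs_attention` are made-up helper names, and treating any nonzero raw count on attributes 197/198 as a warning is an assumption of the sketch, not a rule taken from smartmontools.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized value, counts down as things get worse
    worst: int
    thresh: int
    raw: str


def parse_attr(line: str) -> SmartAttr:
    """Parse one row of the smartctl attribute table."""
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    f = line.split(None, 9)
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[9].strip())


def needs_attention(attr: SmartAttr) -> bool:
    # An attribute counts as failed once VALUE has dropped to THRESH or below.
    if attr.thresh > 0 and attr.value <= attr.thresh:
        return True
    # Assumption for this sketch: any nonzero raw count of pending (197) or
    # offline-uncorrectable (198) sectors deserves a look, whatever VALUE says.
    if attr.attr_id in (197, 198) and int(attr.raw.split()[0]) > 0:
        return True
    return False


# Rows copied from the smartctl output posted in this thread.
ROWS = [
    "5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0",
    "7 Seek_Error_Rate 0x000f 077 060 045 Pre-fail Always - 57728258",
    "197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 16",
    "198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 16",
]

if __name__ == "__main__":
    for row in ROWS:
        attr = parse_attr(row)
        flag = "CHECK" if needs_attention(attr) else "ok"
        print(f"{attr.attr_id:>3} {attr.name:<24} value={attr.value} thresh={attr.thresh} raw={attr.raw} {flag}")
```

With these rows, Reallocated_Sector_Ct still sits at 100 against a threshold of 10 with a raw count of 0, while the two rows with a raw count of 16 are the ones that get flagged.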

Carlos E. R.

12:38

On 03/11/2018 06.25, David Haller wrote:

...

Hello,

On Fri, 02 Nov 2018, Liam Proven wrote:

...
On 02/11/2018 15:57, Carlos E. R. wrote: [..]

...
This parameter says that there are a number of sectors that have not been remapped.

To me, that is a danger sign. I don't know exactly what it means or why but it's worrying.

It means, that the drive was at least _once_ not able to read from that sector (see the list at the end of 'smartctl -a' output). Thus, that sector is *pending* to be reallocated, but is not yet. That happens when you write to that sector. That can be done with hdparm.

==== man hdparm ====
--write-sector
    Writes zeros to the specified sector number. VERY DANGEROUS. The sector number must be given (base10) after this option. hdparm will issue a low-level write (completely bypassing the usual block layer read/write mechanisms) to the specified sector. This can be used to force a drive to repair a bad sector (media error).
====

AND YES, ALL DATA ON THAT SECTOR WILL BE GONE!

But that sector will disappear from the "pending/offline uncorrectable" count but appear as "reallocated" instead (as long as there are sectors left to be remapped to).

So you can write to that sector number, but it will (physically) be another sector than originally.

E.g.: sector 10 is reported as bad, you have "pending: 1, offline uncorrectable: 1, reallocated: 0" and the test-log (from a smartctl long test IIRC) shows "LBA_of_first_error: 10". Say then, you use hdparm to write to that sector. Then you'll get:

"pending: 0, offline unc.: 0, reallocated: 1".

Yes, this is what I was suggesting.

...

The error-log does not change, but a further test will not fail at sector 10 again, as that has now been reallocated.

Besides the reallocated-count going up, the error seems to disappear. Until you run out of sectors to reallocate to. And of course, the data that was on the unreadable sectors is gone.

It might be that the disc can scrape the data from the sector by reading multiple times, but will still mark it as "pending" and reallocate it when written to.

I have a disk (replaced) that had such bad sectors, but SMART would not say which LBA. So I had to run badblocks... which somehow found nothing. And a run of the long smart test would then also find nothing, for some days... then they would appear again. So I wrote the entire disk with zeroes. Some time later other sectors would appear. Repeat. Then I noticed the remapped count going up and up on each test... and I finally decided to replace the disk.

...

...
...
Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA. [...]

I am afraid I must disagree.

...
You can also run "badblocks" on that disk[...]

OK, I must disagree more.

Care to elaborate as to why? The following?

[..]

...
All hard drives have some bad sectors, it's true. Most develop more during their operational life, also true.

But they have a pretty large reserved area(maybe 10-15%, it varies a lot with model and makers don't like to disclose it. An >1% number of blocks, anyway.) and failed blocks are replaced from the spare blocks.

I doubt that it's that much. Might have been in days long gone.

...
This remapping is normal and invisible. The OS never knows there was a read error, it's just switched on the fly.

See above.

Years ago, you could even _hear_ (esp. Seagates) trying to re-re-re-re-re-read a sector.... *gnuuiii*gnuuiii*gnuuiii*gnuuiii*schloink*

Taking _minutes_...

Yes. But if it was doing a "write", after the firmware (not the os) decides the sector is bad, it remaps it, and writes the waiting data on the new sector instead. The operating system knows nothing, only that the disk took way longer than usual. From that point on, writes to that sector will be fast as usual - except that the block is not contiguous with the rest, needing one head movement and one back. Taking milliseconds. The "minutes" part was because the operating system tried 10 times on disk error. I think it was 10.

...

...
If the OS can see errors, that means that either [a] the disk's replacement blocks are used up, meaning it has millions of bad blocks, or [b] the disk is defective in some other way.

smartctl != the OS ;)

...
In either instance, I would regard that as a failing drive and replace it immediately.

Agreed. Esp. if the drive is not brand new. Cue the bathtub curve.

Some drives may come/develop some badblocks real fast but then be stable over years. But if the drive is older, developing badblocks is a sure warning sign.

...
Don't waste time trying to rescue it. Get any remaining data off it, ASAP. Return it for warranty replacement, if possible. If not, send it for recycling, or take it to bits if you're curious.

Do not waste time trying to fix it, and never use it for anything other than test purposes again.

Or use it for ephemeral stuff, news-spool, DL-caches, whatnot where you don't really care if you lose it and/or need to reget it.

I never discard a disk "fast". I give them at least a second chance. If they don't develop more bad sectors, they stay in place. So far, I have not lost data that way, in some decades... :-) Of course, that was luck. A single transient bad sector may destroy an important file. -- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Carlos E. R.

13:13

On 02/11/2018 18.57, Michael Fischer wrote:

...

On Fri, Nov 02, Carlos E. R. wrote:

...
On 02/11/2018 15.27, Michael Fischer wrote:

...
Ah, yes. Better run the test on all disks.

The ssd produced much less output from `smartctl -a` but also nothing which suggested errors (good, as that is /)

I've got 2 external (usb-attached) drives which are my backups.

smartctl needs a `-d sat` to produce output from one of them (happy) and `-d scsi` for the other, which insisted that

SMART support is: Available - device has SMART capability.
SMART support is: Disabled

I did `$ sudo smartctl -d scsi -s on /dev/sdb` but to no effect in the output of `$ sudo smartctl -i -d scsi /dev/sdb`

Go figure. AFAIK, both those external disks are fine, but running badblocks on them now for "grins".

USB disks are problematic with smart, the box firmware interferes. If they are recent, the program doesn't always know how to access them. I use "-d sat,12" on mine.

...

...
Concurrent to this, notice that there are several "extended offline" tests that did not complete, all at the same LBA. I would rewrite that LBA.

You could try to find out which file that LBA belongs to, recover the file if possible or replace it with a backup copy, and write to that LBA. Not trivial. The write operation should trigger the remap.

Google-fu failing me as to how to go from LBA -> fs file(s). Suggestions?

Not trivial was an understatement on my part :-(

It is filesystem dependent. I don't have a rule of thumb to do it always.

From the LBA and the partition table you can find out the partition involved. The next step is to find out the sector inside that partition, doing some math, and then find out the file, which usually requires going through the entire list of files, the location of each file, and comparing with the target sector. Hopefully there is a tool, specific to the filesystem, that does it.

Yes, there are google articles on it I found at some point, I should have taken notes. Hum... where...

Sometimes I'm fortunate. I have a note I wrote describing the procedure, but the LBA was on the SWAP, so I overwrote it entirely and was done.

...

<3.2> 2016-09-19 13:16:21 Telcontar smartd 1161 - - Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
<3.2> 2016-09-19 13:46:21 Telcontar smartd 1161 - - Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors
<3.2> 2016-09-19 13:46:21 Telcontar smartd 1161 - - Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
<3.2> 2016-09-19 13:46:21 Telcontar smartd 1161 - - Device: /dev/sda [SAT], previous self-test completed with error (read test element)
<3.2> 2016-09-19 13:46:21 Telcontar smartd 1161 - - Device: /dev/sda [SAT], Self-Test Log error count increased from 0 to 1

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure      90%        2116          47894552
# 2  Short offline       Completed without error      00%        2115          -
# 3  Short offline       Completed without error      00%        2108          -

...

Telcontar:/etc # fdisk -l /dev/sda
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1         2048        16383      7M  BIOS boot parti primary
 2        16384     41961471     20G  Microsoft basic primary
 3     41961472     73416703     15G  Microsoft basic primary   <====
 4     73416704     75522047      1G  Microsoft basic primary
 5     75522048     77625343      1G  Microsoft basic primary

...

Telcontar:/etc # lsblk --output NAME,KNAME,RA,RM,RO,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT,UUID,PARTUUID,WWN,MODEL,ALIGNMENT /dev/sda | grep sda3
├─sda3 sda3 512 0 0 15G part swap Swap_0 primary [SWAP] 1cb5f0b4-d92a-4248-926c-0828c1f7eb48 d67674b0-b4d1-4adf-8b3e-e7cdb00703cf 0
Telcontar:/etc #

So swap_0, sda3. Here is an article for reiserfs, taken from another of my notes: http://smartmontools.sourceforge.net/badblockhowto.html#reiserfs_ex There must be more info in that howto, have a look at it.
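The "doing some math" step can be sketched in a few lines of Python. This is a rough sketch and not the procedure from the howto: the partition boundaries are copied from the fdisk listing above, `locate_lba` is a made-up helper name, and the 4096-byte filesystem block size is only an assumed example value (sda3 here is swap, so the block number is purely illustrative).

```python
# (partition number, first sector, last sector), copied from the fdisk output above.
PARTITIONS = [
    (1, 2048, 16383),
    (2, 16384, 41961471),
    (3, 41961472, 73416703),
    (4, 73416704, 75522047),
    (5, 75522048, 77625343),
]


def locate_lba(lba, partitions, sector_size=512, fs_block_size=4096):
    """Return (partition, sector offset inside it, filesystem block), or None.

    fs_block_size is an assumption; use the real block size of the
    filesystem on the partition in question.
    """
    for number, start, end in partitions:
        if start <= lba <= end:
            offset = lba - start
            fs_block = offset * sector_size // fs_block_size
            return number, offset, fs_block
    return None  # the LBA is outside every listed partition


if __name__ == "__main__":
    # LBA_of_first_error from the self-test log above.
    print(locate_lba(47894552, PARTITIONS))
```

For LBA 47894552 this gives partition 3, sector offset 5933080 inside the partition, and block 741635 when counting in 4096-byte blocks. Going from that block number to a file still needs a filesystem-specific tool, as said above.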

...

...
Then run again the long test to see if it stops at another LBA, then repeat till none appears.

You can also run "badblocks" on that disk. This test takes many hours (even days), has to be done while umounted, thus from rescue media. Sometimes this is enough to clear those bad sectors, sometimes they appear again days later. If the command produces a list of bad sectors, then write to them to force a remap.

One method is to rewrite to the entire partition with zeros or whatever, then recover the data from backup.

Thanks much Carlos for the detailed response. Much appreciated.

Will try the --test=long tonight and report back.

Welcome :-) -- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Carlos E. R.

13:21

New subject: [opensuse] USB tags

On 03/11/2018 01.59, Anton Aylward wrote:

...

On 31/10/18 08:53 AM, Carlos E. R. wrote:

...
On 31/10/2018 13.49, Anton Aylward wrote:

...
On 31/10/18 08:46 AM, James Knott wrote:

...
On 10/31/2018 07:59 AM, Anton Aylward wrote:

...
Now I attach luggage tags to the USB sticks and write on them.

I use key tags.

I tried those for a while. I came to dislike the 'split ring'. The luggage tags give more space to write on :-)

I use card labels with a string attached, bought by the hundred and cheap.

Perhaps we're talking of the same thing. I'll need to get a photo to send you...

https://www.amazon.es/APLI-390-etiquetas-colgantes-blanco/dp/B001AO2OVI/ref=sr_1_1?ie=UTF8&qid=1541251064&sr=8-1&keywords=etiquetas+colgantes+apli Only that my pack has (had) 100 items, not 500. Seems an older reference (ref 7011). -- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Michael Fischer

4 Nov 4 Nov

16:37

On Sat, Nov 03, Carlos E. R. wrote:

...

From the LBA and the partition table you can find out the partition involved. The next step is to find out the sector inside that partition, doing some math, and then, find out the file, which usually requires going through the entire list of files, the location of each file, and compare with the target sector. Hopefully there is a tool, specific to the filesystem, that does it.

Yes, there are google articles on it I found at some point, I should have taken notes. Hum... where...

Sometimes I'm fortunate. I have a note I wrote describing the procedure, but the LBA was on the SWAP, so I overwrote it entirely and done.

...

...
Telcontar:/etc # fdisk -l /dev/sda
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes, 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1         2048        16383      7M  BIOS boot parti primary
 2        16384     41961471     20G  Microsoft basic primary
 3     41961472     73416703     15G  Microsoft basic primary   <====
 4     73416704     75522047      1G  Microsoft basic primary
 5     75522048     77625343      1G  Microsoft basic primary

...
Telcontar:/etc # lsblk --output NAME,KNAME,RA,RM,RO,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT,UUID,PARTUUID,WWN,MODEL,ALIGNMENT /dev/sda | grep sda3
├─sda3 sda3 512 0 0 15G part swap Swap_0 primary [SWAP] 1cb5f0b4-d92a-4248-926c-0828c1f7eb48 d67674b0-b4d1-4adf-8b3e-e7cdb00703cf 0
Telcontar:/etc #

I repeated the method you describe above (amusingly, also ended up being /dev/sda3) ...which unfortunately turned out to be /home... also made a notes-to-self of the above routine (h/t).

In the meantime, I decided (perhaps wisely, given what I just found ^^^ here), to purchase a replacement (EVO 860 500G) and just install that as a new /home.

Interestingly, I've not hit any READ|WRITE errors on any files in /home in daily usage, so ... who knows which files may be affected.

Perhaps I can keep the old disk in there as some `data` partition(s), and it might make the restore of /home easier (though I run double backup external drives... *shrug*)

In the interest of completeness: `smartctl --test=long` results below.

(Thanks again for all the tips)

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.6-1.g45f120a-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1SB102
Serial Number:    Z9ACC0MX
LU WWN Device Id: 5 000c50 0a28684ad
Firmware Version: CC43
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Nov 4 11:20:46 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
                                       Auto Offline Data Collection: Disabled.
Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 106) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x1085) SCT Status supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 079 063 006 Pre-fail Always - 89280056 3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 46 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 078 060 045 Pre-fail Always - 59171948 9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 7817 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 41 183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 188 Command_Timeout 0x0032 100 099 000 Old_age Always - 2 2 2 189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1 190 Airflow_Temperature_Cel 0x0022 063 050 040 Old_age Always - 37 (Min/Max 30/41) 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 365 194 Temperature_Celsius 0x0022 037 016 000 Old_age Always - 37 (0 16 0 0 0) 195 Hardware_ECC_Recovered 0x001a 009 004 000 Old_age Always - 89280056 
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 16 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 16 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 7780h+54m+58.740s 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 6394009801 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 373281268 SMART Error Log Version: 1 ATA Error Count: 2 CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 2 occurred at disk power-on lifetime: 7815 hours (325 days + 15 hours) When the command that caused the error occurred, the device was in an unknown state. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 04 51 00 00 00 00 00 Error: ABRT Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 00 00 00 00 00 00 00 ff 4d+19:04:57.711 NOP [Abort queued commands] b0 d4 00 82 4f c2 00 00 4d+19:04:36.901 SMART EXECUTE OFF-LINE IMMEDIATE b0 d0 01 00 4f c2 00 00 4d+19:04:36.805 SMART READ DATA ec 00 01 00 00 00 00 00 4d+19:04:36.800 IDENTIFY DEVICE ec 00 01 00 00 00 00 00 4d+19:04:36.798 IDENTIFY DEVICE Error 1 occurred at disk power-on lifetime: 7752 hours (323 days + 0 hours) When the command that caused the error occurred, the device was in an unknown state. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 04 51 00 00 00 00 00 Error: ABRT Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 00 00 00 00 00 00 00 ff 2d+03:57:39.774 NOP [Abort queued commands] b0 d4 00 81 4f c2 00 00 2d+03:57:18.107 SMART EXECUTE OFF-LINE IMMEDIATE b0 d0 01 00 4f c2 00 00 2d+03:57:18.000 SMART READ DATA ec 00 01 00 00 00 00 00 2d+03:57:17.991 IDENTIFY DEVICE ec 00 01 00 00 00 00 00 2d+03:57:17.990 IDENTIFY DEVICE SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended captive Interrupted (host reset) 90% 7815 - # 2 Extended offline Completed: read failure 50% 7808 1001593016 # 3 Extended offline Completed: read failure 50% 7807 1001593016 # 4 Extended offline Completed: read failure 50% 7804 1001593016 # 5 Extended offline Aborted by host 90% 7803 - # 6 Short captive Interrupted (host reset) 60% 7752 - # 7 Short offline Completed without error 00% 7752 - # 8 Extended offline Completed: read failure 50% 7136 1001593016 # 9 Extended offline Completed: read failure 50% 6296 1001593016 #10 Extended offline Completed: read failure 50% 5624 1001593016 #11 Extended offline Completed: read failure 50% 4784 1001593016 #12 Extended offline Completed: read failure 50% 4112 1001593016 #13 Extended offline Completed: read failure 50% 3440 1001593016 #14 Extended offline Completed: read failure 50% 2600 1001593016 #15 Extended offline Completed: read failure 50% 1929 1001593016 #16 Extended offline Completed: read failure 50% 1257 1021004240 #17 Extended offline Completed: read failure 50% 585 1001593008 #18 Short offline Completed without error 00% 0 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 
Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Michael -- Michael Fischer michael@visv.net -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org

Liam Proven

5 Nov

10:10

New subject: [opensuse] antique HDs (was: Login weirdness)

On 02/11/2018 19:12, Felix Miata wrote:

...

I still have the 5.8lb ST-506 30ms 1024 cyl 8 hd 17 sect Micropolis 85MB model 1335 from the external storage addon for the Altos Xenix 286 where I worked, thinking someday I'd run across a controller to match it to try to retrieve the scripts I wrote.

Wow!

I managed to sell my old ESDI-interface 5¼" FH drives, about a decade back. I think they went into a DEC PDP-11.

I might be able to connect you to some people who could help you to read that, if you like. Mail me offlist?

-- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Liam Proven

11:09

New subject: [opensuse] antique HDs (was: Login weirdness)

On 02/11/2018 19:20, James Knott wrote:

...

That drive & controller had 2 cables, one for controlling the drive and the other for data.

Yes, that was normal back then. The evolution was something like:

ST-506 (5 Mb/s; 34-pin control cable, 20-pin data cable, MFM encoding)

The biggest drive I ever heard of was a 3½" Conner unit that did 88 MB. I coveted one for my Acorn Archimedes but couldn't afford it.

→ "RLL" (7½ Mb/s; same interface, higher-density encoding, not compatible but *might* work, sometimes briefly)

→ ESDI (10/15/20 Mb/s; same cabling, different signals -- more "brains" on the drive, less on the controller card)

→ IDE (3.3 M*B*/s; single 40-pin cable; all controller electronics on the drive; max drive size 2GB.)

Then there were various iterations of IDE until SATA came along.

IDE-2 (or ATA-1; 1993) allowed DMA for 13.3 MB/s and 128 GB drives - 8 GB without LBA.

This is when non-hard-disks first got attached to the interface. CD-ROMs at first, mainly. Later, tape drives, very high-density "superfloppy" drives.

EIDE (or ATA-2 or Ultra-ATA; 1994) 16.6 MB/s
ATA-3 brought in notebook-sized drives.
ATA-4 (Ultra ATA/33, 1997) 33 MB/s
ATA-5 (Ultra ATA/66, 1999) 66 MB/s over 80-wire cables
ATA-6 (Ultra ATA/100, 2000) 100 MB/s

The last I saw was Ultra ATA/133, but apparently there was an Ultra ATA/167 standard as well.

In parallel (ha ha) to all this was SCSI development, always a bit faster and always quite a lot more expensive. SCSI allowed 8 devices per controller, rather than 2, needed termination, and was a nightmare to troubleshoot, but far quicker and more versatile when it worked.

-- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Carlos E. R.

14:26

On 04/11/2018 17.37, Michael Fischer wrote:

...

On Sat, Nov 03, Carlos E. R. wrote:

...

I repeated the method you describe above (amusingly, also ended up being /dev/sda3) ...which unfortunately turned out to be /home... also made a notes-to-self of the above routine (h/t).

Yes, /home is unfortunate.

...

In the meantime, I decided (perhaps wisely, given what I just found ^^^ here), to purchase a replacement (EVO 860 500G) and just install that as a new /home.

Yes, that is sensible

...

Interestingly, I've not hit any READ|WRITE errors on any files in /home in daily usage, so ... who knows which files may be affected.

Maybe none.

...

Perhaps I can keep the old disk in there as some `data` partition(s), and it might make the restore of /home easier (though I run double backup external drives... *shrug*)

see later.

...

In the interest of completeness: `smartctl --test=long` results below.

(Thanks again for all the tips)

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.6-1.g45f120a-default] (SUSE RPM) Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1SB102
Serial Number:    Z9ACC0MX
LU WWN Device Id: 5 000c50 0a28684ad
Firmware Version: CC43
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Nov 4 11:20:46 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...

...

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
...
197 Current_Pending_Sector  0x0012  100   100   000    Old_age  Always  -           16
198 Offline_Uncorrectable   0x0010  100   100   000    Old_age  Offline -           16
199 UDMA_CRC_Error_Count    0x003e  200   200   000    Old_age  Always  -           0
240 Head_Flying_Hours       0x0000  100   253   000    Old_age  Offline -           7780h+54m+58.740s
...

...

SMART Self-test log structure revision number 1
Num  Test_Description  Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended captive  Interrupted (host reset)  90%        7815             -
# 2  Extended offline  Completed: read failure   50%        7808             1001593016
# 3  Extended offline  Completed: read failure   50%        7807             1001593016
# 4  Extended offline  Completed: read failure   50%        7804             1001593016
# 5  Extended offline  Aborted by host           90%        7803             -
# 6  Short captive     Interrupted (host reset)  60%        7752             -
# 7  Short offline     Completed without error   00%        7752             -
# 8  Extended offline  Completed: read failure   50%        7136             1001593016
# 9  Extended offline  Completed: read failure   50%        6296             1001593016
#10  Extended offline  Completed: read failure   50%        5624             1001593016
#11  Extended offline  Completed: read failure   50%        4784             1001593016
#12  Extended offline  Completed: read failure   50%        4112             1001593016
#13  Extended offline  Completed: read failure   50%        3440             1001593016
#14  Extended offline  Completed: read failure   50%        2600             1001593016
#15  Extended offline  Completed: read failure   50%        1929             1001593016
#16  Extended offline  Completed: read failure   50%        1257             1021004240
#17  Extended offline  Completed: read failure   50%        585              1001593008
#18  Short offline     Completed without error   00%        0                -

Well, you see that there is nearly always an error at the same LBA, and that the test stops there and never completes.

I would back up the files in that partition, then write over that LBA sector, as I think the howto I linked explains. Then retest. If another bad sector then appears, you can repeat the procedure, or overwrite the entire partition.

-- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))
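For anyone who wants to check the "same LBA every time" observation without reading the table by eye, here is a small Python sketch. It is not part of smartmontools; the function name `failing_lbas` and the embedded sample are illustrative only, and the sample is a subset of the rows quoted above. With real data the text would come from the output of `smartctl -l selftest` for the drive.

```python
import re
from collections import Counter

# A subset of the self-test log rows quoted in this thread (not the full log).
SELF_TEST_LOG = """\
# 1  Extended captive  Interrupted (host reset)  90%  7815  -
# 2  Extended offline  Completed: read failure   50%  7808  1001593016
# 8  Extended offline  Completed: read failure   50%  7136  1001593016
#16  Extended offline  Completed: read failure   50%  1257  1021004240
#17  Extended offline  Completed: read failure   50%   585  1001593008
#18  Short offline     Completed without error   00%     0  -
"""

# A failed row ends with: "read failure <remaining>% <hours> <LBA>".
ROW = re.compile(r"read failure\s+(\d+)%\s+(\d+)\s+(\d+)\s*$")


def failing_lbas(log_text):
    """Count how often each LBA is reported as the first read error."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ROW.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


if __name__ == "__main__":
    for lba, hits in failing_lbas(SELF_TEST_LOG).most_common():
        print(f"LBA {lba}: {hits} failed test(s)")
```

An LBA that shows up in most of the failed runs is the one to rewrite first. The less frequent LBAs are candidates for the "repeat the procedure" step after retesting.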

Liam Proven

17:28

On 04/11/2018 17:37, Michael Fischer wrote:

...

In the meantime, I decided (perhaps wisely, given what I just found ^^^ here), to purchase a replacement (EVO 860 500G) and just install that as a new /home.

Question.

That replacement model number looks like an SSD.

Is the old drive an SSD?

This is just IMHO and doubtless others will disagree, but on my own machines, if I have space for both, I put the OS on an SSD and /home on a conventional spinning hard disk.

I also use a non-journaling filesystem (usually ext2) and disable atime to reduce the disk writes to the SSD.

This gives the ideal combination of performance and longevity.

OS binary files are often big, thus the SSD's speed significantly reduces the loading times, therefore increasing performance.

Config files, in /home/$mydir, are generally pretty small, so they read quickly anyway.

There are _relatively_ few writes to the OS partition, where I don't want to wear out the SSD, but lots of writes to the stuff in the home directory -- where the HD's longer lifespan benefits me.

I find little to no perceptible performance difference between a machine with OS on SSD and /home on HD, and a machine with everything on SSD.

-- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Carlos E. R.

17:41

On 05/11/2018 18.28, Liam Proven wrote:

...

On 04/11/2018 17:37, Michael Fischer wrote:

...
In the meantime, I decided (perhaps wisely, given what I just found ^^^ here), to purchase a replacement (EVO 860 500G) and just install that as a new /home.

Question.

That replacement model number looks like an SSD.

Is the old drive an SSD?

No, based on the smartctl output:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1SB102
Serial Number:
LU WWN Device Id: 5 000c50 0a28684ad
Firmware Version: CC43
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches

...

This is just IMHO and doubtless others will disagree, but on my own machines, if I have space for both, I put the OS on an SSD and /home on a conventional spinning hard disk.

I also use a non-journaling filesystem (usually ext2) and disable atime to reduce the disk writes to the SSD.

I use "lazytime" mount option. It may become a default option.

...

This gives the ideal combination of performance and longevity.

OS binary files are often big, thus the SSD's speed significantly reduces the loading times, therefore increasing performance.

Config files, in /home/$mydir, are generally pretty small, so they read quickly anyway.

But they seek faster on SSD :-)

...

There are _relatively_ few writes to the OS partition, where I don't want to wear out the SSD, but lots of writes to the stuff in the home directory -- where the HD's longer lifespan benefits me.

I find little to no perceptible performance difference between a machine with OS on SSD and /home on HD, and a machine with everything on SSD.

I have a machine that needs more RAM, but the board can't take more. Placing the swap on SSD gave the machine a huge speed difference. Like this laptop: has 4 Gigs, and is currently using 1.6 of swap. I'm only running XFCE, Thunderbird and Firefox. Ah, I forgot clamd, takes half a gig, and I'm only using my trick on the desktop, is not automated yet. I must go back to it. -- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))

Liam Proven

18:10

On 05/11/2018 18:41, Carlos E. R. wrote:

...

No, based on the smartctl output:

=== START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.14 (AF)

Ah, OK. I forgot that was there to look back at -- sorry.

...

I use "lazytime" mount option. It may become a default option.

I will check that out.

...

But they seek faster on SSD :-)

True, but I can't tell any difference.

...

I have a machine that needs more RAM, but the board can't take more. Placing the swap on SSD gave the machine a huge speed difference.

Urgh. I try to avoid swap on SSD. My home laptop has 2 SSDs -- a big one with Win10 and a shared data partition, and a small one with Linux. I use the "zram" tool for dynamic compressed swapfiles in RAM for that, to avoid swapping to disk and wearing it out. It rarely hits swap at all.

...

Like this laptop: has 4 Gigs, and is currently using 1.6 of swap. I'm only running XFCE, Thunderbird and Firefox. Ah, I forgot clamd, takes half a gig, and I'm only using my trick on the desktop, is not automated yet. I must go back to it.

I do still use a 4GB laptop, but mainly for experimenting with PC DOS 7.1, Haiku and Oberon. None of them need all the RAM.

I also have Devuan, though, with XFCE, but that's pretty light and the OS does not do much hard work.

I see your predicament there -- I have an occasionally-used machine that is maxed out at 8GB and it really wants more -- but I'm not sure I'd burn the life of an SSD for it. You might find ZRAM useful.

And if it were me, I would lose clamd in that config. You are not going to catch anything on Linux. I don't know how to rephrase "caveat emptor" to mean "Windows users beware" but you get what I mean, I'm sure.

https://forums.opensuse.org/showthread.php/501160-Using-zram-amp-zswap

-- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Carlos E. R.

18:38

On 05/11/2018 19.10, Liam Proven wrote:

...

On 05/11/2018 18:41, Carlos E. R. wrote:

...
No, based on the smartctl output:

=== START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.14 (AF)

Ah, OK. I forgot that was there to look back at -- sorry.

...
I use "lazytime" mount option. It may become a default option.

I will check that out.

...
But they seek faster on SSD :-)

True, but I can't tell any difference.

Depends on the application.

...

...
I have a machine that needs more RAM, but the board can't take more. Placing the swap on SSD gave the machine a huge speed difference.

Urgh. I try to avoid swap on SSD. My home laptop has 2 SSDs -- a big one with Win10 and a shared data partition, and a small one with Linux. I use the "zram" tool for dynamic compressed swapfiles in RAM for that, to avoid swapping to disk and wearing it out. It rarely hits swap at all.

Yes, it will wear out faster, but the alternatives are slow and get on my nerves... this way I delay purchasing a new computer.

Mind: since some kernel version swap became very slow. I think it happened when going from 13.1 to Leap 42.2. I suspect that swap is "fragmented" (I/O was just a few megabytes, while the disk is capable of 150 MB/s). Thus switching it to SSD made an instant difference: the read speed is faster (although I have SATA 3, not 6), but the seek speed goes wonderfully up.

...

...
Like this laptop: has 4 Gigs, and is currently using 1.6 of swap. I'm only running XFCE, Thunderbird and Firefox. Ah, I forgot clamd, takes half a gig, and I'm only using my trick on the desktop, is not automated yet. I must go back to it.

I do still use a 4GB laptop, but mainly for experimenting with PC DOS 7.1, Haiku and Oberon. None of them need all the RAM.

I also have Devuan, though, with XFCE, but that's pretty light and the OS does not do much hard work.

I see your predicament there -- I have an occasionally-used machine that is maxed out at 8GB and it really wants more -- but I'm not sure I'd burn the life of an SSD for it. You might find ZRAM useful.

I considered it.

...

And if it were me, I would lose clamd in that config. You are not going to catch anything on Linux. I don't know how to rephrase "caveat emptor" to mean "Windows users beware" but you get what I mean, I'm sure.

https://forums.opensuse.org/showthread.php/501160-Using-zram-amp-zswap

At home, I may move amavis and clamav to another machine. I use clamav because I like finding out if I am sent some virus garbage, mostly intended for Windows, of course.

Unfortunately, if clamav is installed for manually scanning some file, amavis insists on using it automatically. Thus the clamav daemon has to be running, and it is a bad design, IMO. It keeps the data in memory even if not used for days.

The trick I was talking about is using cgroups to increase the swappiness of a single process to 100. This does work; the next step is to do it automatically via a systemd init file. But I'm busy with upgrading machines to 15.0.

-- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))
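The message does not show how the cgroups trick is set up, so what follows is only a guess at one way it could look, sketched in Python. It assumes the cgroup v1 memory controller mounted at `/sys/fs/cgroup/memory`, which exposes a per-group `memory.swappiness` file and a `cgroup.procs` file for moving a process into the group. The function name `set_process_swappiness` and the group name `swappy` are made up for this example.

```python
import os


def set_process_swappiness(pid, swappiness, group="swappy",
                           memory_root="/sys/fs/cgroup/memory"):
    """Put one process in its own memory cgroup and set that group's swappiness.

    Assumption: cgroup v1 layout. On a real system this needs root;
    memory_root is a parameter so the logic can be tried on a scratch directory.
    """
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be between 0 and 100")

    group_dir = os.path.join(memory_root, group)
    # Creating a directory under the controller creates the cgroup.
    os.makedirs(group_dir, exist_ok=True)

    # Per-group swappiness, overriding vm.swappiness for members of the group.
    with open(os.path.join(group_dir, "memory.swappiness"), "w") as handle:
        handle.write(f"{swappiness}\n")

    # Writing a PID to cgroup.procs moves the whole process into the group.
    with open(os.path.join(group_dir, "cgroup.procs"), "w") as handle:
        handle.write(f"{pid}\n")

    return group_dir
```

On a live system the call would be something like `set_process_swappiness(clamd_pid, 100)` run as root, where `clamd_pid` is looked up separately. Automating it from a systemd unit, as mentioned above, is a separate step not covered here.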

Michael Fischer

20:37

On Mon, Nov 05, Liam Proven wrote:

...

On 04/11/2018 17:37, Michael Fischer wrote:

...
In the meantime, I decided (perhaps wisely, given what I just found ^^^ here), to purchase a replacement (EVO 860 500G) and just install that as a new /home.

Question.

That replacement model number looks like an SSD.

Is the old drive an SSD?

This is just IMHO and doubtless others will disagree, but on my own machines, if I have space for both, I put the OS on an SSD and /home on a conventional spinning hard disk.

That's the current setup, for exactly the reasons you gave. A friend who is more current with hardware convinced me to try an EVO for /home. / will continue to live on /dev/nvme0n1p2, again for the same logic you gave.

Got my new breaker box today, and once again, still have the actual login weirdness. (back to original topic?)

One thing to note, with the first (non-working) login prompt, the username is NOT echoed back to the screen. I just keep hitting ENTER until I get a `login: ` which echoes back - then it works.

I wonder if some systemd thread isn't really finished at the point where it starts throwing these "give root password for maintenance" things. One workplace acquaintance remarked that it was as if the system was confused into thinking it was in runlevel 1 (to use the old terminology).

I find the following in journalctl for today's reboot: note the timestamps.

Nov 05 15:05:47 blinkenlights login[925]: FAILED LOGIN 2 FROM tty1 FOR (unknown), User not known to the underlying
Nov 05 15:05:49 blinkenlights login[925]: gkr-pam: error looking up user information
...
...
Nov 05 15:06:01 blinkenlights systemd[1]: getty@tty1.service: Service has no hold-off time, scheduling restart.
Nov 05 15:06:01 blinkenlights systemd[1]: Stopped Getty on tty1.
Nov 05 15:06:01 blinkenlights systemd[1]: Started Getty on tty1.
...
...
Nov 05 15:06:08 blinkenlights login[8706]: pam_unix(login:session): session opened for user xxxxx by LOGIN(uid=0
Nov 05 15:06:08 blinkenlights systemd[1]: Created slice User Slice of xxxx.
Nov 05 15:06:08 blinkenlights systemd[1]: Starting User Manager for UID 1000...
Nov 05 15:06:08 blinkenlights systemd-logind[899]: New session 1 of user xxxx.
Nov 05 15:06:08 blinkenlights systemd[1]: Started Session 1 of user xxxx.
...
...
Nov 05 15:06:08 blinkenlights systemd[9280]: Reached target Default.
Nov 05 15:06:08 blinkenlights systemd[9280]: Startup finished in 35ms.

Seems like it is trying to prompt me to log in well before the system is properly up?

TIA
Michael
-- Michael Fischer michael@visv.net
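To put numbers on "note the timestamps", here is a small Python sketch that computes the gaps between three of the journal events quoted above. The event labels are paraphrased, the function name `seconds_between` is invented for the example, and the year is assumed from the dates of this thread, since the journal's short timestamp format omits it.

```python
from datetime import datetime

# Timestamps taken from the journal excerpt above; labels are paraphrased.
EVENTS = [
    ("Nov 05 15:05:47", "FAILED LOGIN 2 FROM tty1"),
    ("Nov 05 15:06:01", "getty@tty1.service restarted"),
    ("Nov 05 15:06:08", "session opened, user manager reached target Default"),
]


def seconds_between(events, year=2018):
    """Return the gap in seconds between each pair of consecutive events."""
    stamps = [
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp, _label in events
    ]
    return [
        int((later - earlier).total_seconds())
        for earlier, later in zip(stamps, stamps[1:])
    ]


if __name__ == "__main__":
    gaps = seconds_between(EVENTS)
    for (_, first), (_, second), gap in zip(EVENTS, EVENTS[1:], gaps):
        print(f"{gap:3d} s  from '{first}' to '{second}'")
```

In this excerpt the getty restart comes 14 seconds after the failed login, and the working session 7 seconds after that, so the whole episode spans roughly 21 seconds.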

Carlos E. R.

20:48

On 05/11/2018 21.37, Michael Fischer wrote:

...

On Mon, Nov 05, Liam Proven wrote:

...

Seems like it is trying to prompt me to log in well before the system is properly up?

That is possible. Indeed, I suspect one of my machines has that issue as well. The symptoms are different, though: the XFCE session does not recover all items that were up on the previous session.

So what I do is log out without saving the session. Or crash the session (Ctrl-Alt-Backspace twice), then log in again. Alternatively, log in first as a different user in GNOME, exit, then as my user in XFCE. Then I get all the items back.

-- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))

Michael Fischer

21:13

On Mon, Nov 05, Carlos E. R. wrote:

...

On 05/11/2018 21.37, Michael Fischer wrote:

...
On Mon, Nov 05, Liam Proven wrote:

...

...
Seems like it is trying to prompt me to log in well before the system is properly up?

That is possible. Indeed, I suspect one of my machines has that issue as well. The symptoms are different, though: the XFCE session does not recover all items that were up on the previous session.

So what I do is log out without saving the session. Or crash the session (Ctrl-Alt-Backspace twice), then log in again. Alternatively, log in first as a different user in GNOME, exit, then as my user in XFCE. Then I get all the items back.

Hmm. As you might have guessed from my other thread, I'm using "runlevel 3" and `startx`. (Here's to forced debugging using those..)

Michael
-- Michael Fischer michael@visv.net

Carlos E. R.

6 Nov

00:29

On 05/11/2018 22.13, Michael Fischer wrote:

...

On Mon, Nov 05, Carlos E. R. wrote:

...
On 05/11/2018 21.37, Michael Fischer wrote:

...
On Mon, Nov 05, Liam Proven wrote:

...

...
Seems like it is trying to prompt me to log in well before the system is properly up?

That is possible. Indeed, I suspect one of my machines has that issue as well. The symptoms are different, though: the XFCE session does not recover all items that were up on the previous session.

So what I do is log out without saving the session. Or crash the session (Ctrl-Alt-Backspace twice), then log in again. Alternatively, log in first as a different user in GNOME, exit, then as my user in XFCE. Then I get all the items back.

Hmm. As you might have guessed from my other thread, I'm using "runlevel 3" and `startx`. (Here's to forced debugging using those..)

Ah, yes, startx. I forgot. :-)

And you have problems logging in there for a while, then it works? Very strange.

And you have already replaced the /home disk? The old disk is disconnected? I mean, the data cable removed?

Play with "systemd-analyze". Tab tab for options. Start with "blame". Then "critical-chain". You may see what service, if any, is taking long to start.

In critical-chain the '+' indicates how long a service takes to complete its start; when this value appears, it means the service takes long. Those lines may show in red.

The multi-user target is started early, but many other services are started later and take long to complete.

-- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))
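If the "blame" list is long, a few lines of Python can sort it and pick out the slow units. This is only a sketch: the helper name `parse_blame`, the threshold, and the sample lines (including the unit names) are invented for illustration and are not output from the machine discussed here. It assumes each line has the usual "duration, then unit name" shape, with durations such as `350ms`, `12.345s` or `1min 2.000s`.

```python
import re

# Seconds per unit suffix used in systemd time spans (assumed set).
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, "us": 0.000001}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)")

# Hypothetical sample in the shape of `systemd-analyze blame` output.
SAMPLE = """\
     12.345s example-slow.service
      1.200s example-other.service
       350ms example-quick.service
 1min 2.000s example-very-slow.service
"""


def parse_blame(text):
    """Return (unit, seconds) pairs, slowest first; unparsable lines are skipped."""
    results = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        *duration_tokens, unit_name = parts
        seconds = 0.0
        for token in duration_tokens:
            match = _TOKEN.fullmatch(token)
            if match is None:
                break  # not a duration token, so ignore this line
            seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        else:
            results.append((unit_name, seconds))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for unit, seconds in parse_blame(SAMPLE):
        flag = "  <-- slow" if seconds >= 10 else ""
        print(f"{seconds:8.3f}s  {unit}{flag}")
```

To use it on a real system, save the output of `systemd-analyze blame` to a file and pass its contents to `parse_blame`. Units near the top that start before the first getty would be the first suspects.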

Knurpht-openSUSE

01:04

On Tuesday 6 November 2018 01:29:02 CET, Carlos E. R. wrote:

...

On 05/11/2018 22.13, Michael Fischer wrote:

...
On Mon, Nov 05, Carlos E. R. wrote:

...
On 05/11/2018 21.37, Michael Fischer wrote:

...
On Mon, Nov 05, Liam Proven wrote: ...

...
Seems like it is trying to prompt me to log in well before the system is properly up?

That is possible. Indeed, I suspect one of my machines has that issue as well. The symptoms are different, though: the XFCE session does not recover all items that were up on the previous session.

So what I do is log out without saving the session. Or crash the session (Ctrl-Alt-Backspace twice), then log in again. Alternatively, log in first as a different user in GNOME, exit, then as my user in XFCE. Then I get all the items back.

Hmm. As you might have guessed from my other thread, I'm using "runlevel 3" and `startx`. (Here's to forced debugging using those..)

Ah, yes, startx. I forgot. :-)

And you have problems logging in there for a while, then it works? Very strange.

And you have already replaced the /home disk? The old disk is disconnected? I mean, the data cable removed?

Play with "systemd-analyze". Tab tab for options. Start with "blame". Then "critical-chain". You may see what service, if any, is taking long to start.

In critical-chain the '+' indicates how long a service takes to complete its start; when this value appears, it means the service takes long. Those lines may show in red.

The multi-user target is started early, but many other services are started later and take long to complete.

Have a look at this (which is the same for Leap 15 and many other distros regarding the setuid of the X server):

knurpht@Knurpht-HP:~> startx
hostname: Name or service not known
xauth: file /home/knurpht/.serverauth.27924 does not exist

X.Org X Server 1.20.2
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux Knurpht-HP 4.18.15-1-default #1 SMP PREEMPT Thu Oct 18 08:56:17 UTC 2018 (5a53676) x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.18.15-1-default root=UUID=11929e10-a254-4faa-b564-fd6b9511e847 splash=silent resume=/dev/disk/by-uuid/3aa43b47-436c-4c56-a6e0-c07216d8ad6f quiet
Build Date: 15 October 2018 12:00:00PM

Current version of pixman: 0.34.0
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/knurpht/.local/share/xorg/Xorg.1.log", Time: Tue Nov 6 02:00:35 2018
(==) Using config directory: "/etc/X11/xorg.conf.d"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE)
Fatal server error:
(EE) parse_vt_settings: Cannot open /dev/tty0 (Permission denied)
(EE)
(EE)
Please consult the The X.Org Foundation support at http://wiki.x.org for help.
(EE) Please also check the log file at "/home/knurpht/.local/share/xorg/Xorg.1.log" for additional information.
(EE) VGA Arbitration: Cannot restore default device.
(EE) Server terminated with error (1). Closing log file.
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error
-------------------------------------------------------------------------------------------
xinit failed. /usr/bin/Xorg is not setuid, maybe that's the reason?
If so either use a display manager (strongly recommended) or adjust /etc/permissions.local and run "chkstat --system --set" afterwards
Couldn't get a file descriptor referring to the console
knurpht@Knurpht-HP:~>

Especially the last bits. Also read the recent articles about the vulnerabilities involved in changing this setuid setting.

-- Gertjan Lettink a.k.a. Knurpht openSUSE Board Member openSUSE Forums Team

Carlos E. R.

01:14

On 06/11/2018 02.04, Knurpht-openSUSE wrote:

...

On Tuesday, 6 November 2018 01:29:02 CET, Carlos E. R. wrote:

...

xinit failed. /usr/bin/Xorg is not setuid, maybe that's the reason?
If so either use a display manager (strongly recommended) or adjust /etc/permissions.local and run "chkstat --system --set" afterwards
Couldn't get a file descriptor referring to the console
knurpht@Knurpht-HP:~>

Specially the last bits. Also read the recent articles of vulnerabilities in changing this setuid setting.

I believe he already did that ages ago :-) This is a different problem. See the first post. -- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))

Liam Proven

10:51

On 05/11/2018 19:38, Carlos E. R. wrote:

...

...
True, but I can't tell any difference.

Depends on the application.

True, I'm sure.

...

Yes, it will wear up faster, but the alternatives are slow and get on my nerves... this way I delay purchasing a new computer.

:-) That sort of makes sense. Although my policy for avoiding buying new computers is to only buy old computers. It saves me a lot of money... ;-)

...

Mind: since some kernel version swap became very slow. I think it happened when going from 13.1 to leap 42.2. I suspect that swap is "fragmented" (i/o was just few megabytes, while the disk is capable of 150MB/S).

I don't think swap _can_ get fragmented... but it's fairly easy to remove it and re-create it, even on a running system.

...

I considered it.

I know it sounds crazy, but I think the proof that the idea works is that it was introduced as standard in Mac OS X as of version 10.9 "Mavericks".

https://www.lifewire.com/understanding-compressed-memory-os-x-2260327

The OS X implementation is slightly superior, inasmuch as it both compresses into a swapfile in RAM, then writes the compressed image into disk swap. In Linux terms it's a combination of ZRAM + ZSwap + ZCache.

The tech is also in Win10:

https://www.makeuseof.com/tag/ram-compression-improves-memory-responsiveness...

And in ChromeOS, Android and IBM AIX.

So it's pretty mainstream stuff now. Multicore processors help a lot -- a lot of code is still single-threaded, and with ZRAM, idle cores can be used to do the compression, while other cores are busy.

...

I use clamav because I like finding out if I am sent some virus garbage, mostly intended for Windows, of course.

I appreciate that but it's a high price to pay on a memory-constrained system, IMHO. -- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Michael Fischer

12:54

On Tue, Nov 06, Carlos E. R. wrote:

...

On 06/11/2018 02.04, Knurpht-openSUSE wrote:

...
On Tuesday, 6 November 2018 01:29:02 CET, Carlos E. R. wrote:

...

...
xinit failed. /usr/bin/Xorg is not setuid, maybe that's the reason?
If so either use a display manager (strongly recommended) or adjust /etc/permissions.local and run "chkstat --system --set" afterwards
Couldn't get a file descriptor referring to the console
knurpht@Knurpht-HP:~>

Specially the last bits. Also read the recent articles of vulnerabilities in changing this setuid setting.

I believe he already did that ages ago :-)

This is a different problem. See the first post.

Correct. I'm using multi-user.target and `startx`. So the problem is with the original login prompt after boot. Michael -- Michael Fischer michael@visv.net

Michael Fischer

13:12

On Tue, Nov 06, Carlos E. R. wrote:

...

On 05/11/2018 22.13, Michael Fischer wrote:

...
On Mon, Nov 05, Carlos E. R. wrote:

...

And you have problems logging in there, for a while, then it works? Very strange.

And you have already replaced the /home disk? The old disk is disconnected? I mean, the data cable removed?

Nope. At the point where I've got all the equipment I've ordered and install that new SSD for /home, I'm going to install Leap 15. Currently all the same stuff, still connected.

...

Play with "systemd-analyze". Tab tab for options. Start with "blame". Then "critical-chain". You may see what service, if any, is taking long to start.

In critical-chain, the '+' indicates how long a service takes to complete its start; when this value appears, the unit took long to start. Those lines may show in red.

The multiuser target is started early, but many other services are started later and take long to complete.

I append `blame` - though I don't see anything striking. I'm used to wicked taking a while (relatively speaking), but login seems to take very little time.

The fsck's are on the two partitions on the rust disk. Is there some dependency between login(1) and /home being "available"?

5.323s wicked.service
1.096s spamd.service
321ms vboxdrv.service
245ms systemd-journal-flush.service
243ms dev-nvme0n1p2.device
242ms systemd-fsck@dev-disk-by\x2duuid-dcb3864a\x2dc1b0\x2d47bf\x2dbf2a\x2dc0ce4e163a03.service
185ms home.mount
165ms systemd-fsck@dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.service
137ms postfix.service
109ms systemd-tmpfiles-setup.service
65ms systemd-udevd.service
60ms systemd-udev-trigger.service
52ms boot-efi.mount
48ms upower.service
44ms user@1000.service
42ms systemd-journald.service
40ms plymouth-start.service
39ms systemd-remount-fs.service
35ms nscd.service
32ms tmp.mount
30ms sys-kernel-debug.mount
30ms dev-mqueue.mount
28ms logrotate.service
27ms polkit.service
25ms systemd-udev-root-symlink.service
24ms plymouth-quit-wait.service
23ms plymouth-quit.service
18ms wickedd-dhcp4.service
18ms dev-hugepages.mount
17ms ntpd.service
16ms avahi-daemon.service
15ms jexec.service
15ms alsa-restore.service
14ms sshd.service
13ms plymouth-read-write.service
13ms systemd-vconsole-setup.service
12ms systemd-modules-load.service
12ms kmod-static-nodes.service
12ms systemd-tmpfiles-setup-dev.service
11ms systemd-sysctl.service
10ms bluetooth.service
10ms systemd-tmpfiles-clean.service
10ms auditd.service
10ms systemd-fsck-root.service
8ms wickedd-auto4.service
8ms sys-fs-fuse-connections.mount
7ms wickedd-dhcp6.service
7ms systemd-random-seed.service
6ms iscsi.service
5ms rtkit-daemon.service
5ms systemd-backlight@backlight:acpi_video0.service
5ms dev-disk-by\x2duuid-2e527c93\x2d53d9\x2d42eb\x2db8b6\x2d983404b1204e.swap
5ms systemd-logind.service
4ms proc-sys-fs-binfmt_misc.mount
4ms wickedd.service
4ms systemd-update-utmp.service
3ms after-local.service
3ms rc-local.service
3ms mcelog.service
2ms vboxautostart-service.service
2ms vboxweb-service.service
2ms vboxballoonctrl-service.service
2ms dracut-shutdown.service
2ms wickedd-nanny.service
1ms systemd-user-sessions.service
1ms systemd-update-utmp-runlevel.service
1ms systemd-rfkill.service

Michael -- Michael Fischer michael@visv.net

Carlos E. R.

18:03

On 06/11/2018 14.12, Michael Fischer wrote:

...

On Tue, Nov 06, Carlos E. R. wrote:

...
On 05/11/2018 22.13, Michael Fischer wrote:

...
On Mon, Nov 05, Carlos E. R. wrote:

...
And you have problems logging in there, for a while, then it works? Very strange.

And you have already replaced the /home disk? The old disk is disconnected? I mean, the data cable removed?

Nope. At the point where I've got all the equipment I've ordered and install that new SSD for /home, I'm going to install Leap 15. Currently all the same stuff, still connected.

...
Play with "systemd-analyze". Tab tab for options. Start with "blame". Then "critical-chain". You may see what service, if any, is taking long to start.

In critical-chain, the '+' indicates how long a service takes to complete its start; when this value appears, the unit took long to start. Those lines may show in red.

The multiuser target is started early, but many other services are started later and take long to complete.

I append `blame` - though I don't see anything striking. I'm used to wicked taking a while (relatively speaking), but login seems to take very little time.

Try with critical chain when you can, please. blame only shows if a service is too slow, but not if it has impact.

cer@Telcontar:~> systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

cer@Telcontar:~> systemd-analyze blame
3min 41.381s purge-kernels.service
37.341s systemd-cryptsetup@cr_cripta.service
16.353s clamd.service
7.834s spamd.service
6.180s rpmconfigcheck.service
5.796s fstrim.service
5.034s wicked.service
4.374s SuSEfirewall2.service
1.988s home_aux.mount
1.985s home1.mount
1.609s data-vmware.mount
1.357s ntpd.service

multi-user.target @1min 4.525s
└─NetworkManager-dispatcher.service @1min 19.687s +18ms
  └─basic.target @40.576s
    └─sockets.target @40.565s
      └─avahi-daemon.socket @40.554s
        └─sysinit.target @40.457s
          └─sys-fs-fuse-connections.mount @1min 35.814s +13ms
            └─systemd-modules-load.service @382ms +1.221s
              └─systemd-journald.socket
                └─-.mount
                  └─system.slice
                    └─-.slice

(and the system is on ssd...)

...

The fsck's are on the two partitions on the rust disk. Is there some dependency between login(1) and /home being "available"?

Maybe... Dunno. login of a user should fail, but for a reason that says that the disk is not ready.

...

5.323s wicked.service
1.096s spamd.service
321ms vboxdrv.service
245ms systemd-journal-flush.service

-- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Michael Fischer

19:14

On Tue, Nov 06, Carlos E. R. wrote:

...

On 06/11/2018 14.12, Michael Fischer wrote:

...

Try with critical chain when you can, please.

Below.

The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @6.446s
└─cron.service @6.446s
  └─postfix.service @6.308s +137ms
    └─time-sync.target @6.307s
      └─ntpd.service @6.290s +17ms
        └─network.target @6.287s
          └─wicked.service @963ms +5.323s
            └─wickedd-nanny.service @960ms +2ms
              └─wickedd.service @955ms +4ms
                └─wickedd-dhcp4.service @934ms +18ms
                  └─dbus.service @907ms
                    └─basic.target @902ms
                      └─sockets.target @902ms
                        └─dbus.socket @902ms
                          └─sysinit.target @902ms
                            └─systemd-update-utmp.service @897ms +4ms
                              └─systemd-tmpfiles-setup.service @786ms +109ms
                                └─local-fs.target @784ms
                                  └─home.mount @598ms +185ms
                                    └─systemd-fsck@dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.service @432ms +165ms
                                      └─dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.device @431ms

Michael -- Michael Fischer michael@visv.net

Carlos E. R.

7 Nov

09:43

On 06/11/2018 20.14, Michael Fischer wrote:

...

On Tue, Nov 06, Carlos E. R. wrote:

...
On 06/11/2018 14.12, Michael Fischer wrote:

...
Try with critical chain when you can, please.

Below.

The time after the unit is active or started is printed after the "@" character. The time the unit takes to start is printed after the "+" character.

multi-user.target @6.446s
└─cron.service @6.446s
  └─postfix.service @6.308s +137ms
    └─time-sync.target @6.307s
      └─ntpd.service @6.290s +17ms
        └─network.target @6.287s
          └─wicked.service @963ms +5.323s
            └─wickedd-nanny.service @960ms +2ms
              └─wickedd.service @955ms +4ms
                └─wickedd-dhcp4.service @934ms +18ms
                  └─dbus.service @907ms
                    └─basic.target @902ms
                      └─sockets.target @902ms
                        └─dbus.socket @902ms
                          └─sysinit.target @902ms
                            └─systemd-update-utmp.service @897ms +4ms
                              └─systemd-tmpfiles-setup.service @786ms +109ms
                                └─local-fs.target @784ms
                                  └─home.mount @598ms +185ms
                                    └─systemd-fsck@dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.service @432ms +165ms
                                      └─dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.device @431ms

Well, I do not see the problem.

I see something I do not understand, though:

multi-user.target starts at 6 seconds (@6.446s). A bit later, wicked.service starts at 0.9 seconds (@963ms) and takes 5 seconds to complete (+5.323s).

How can it start at 1 second after boot, thus before "multi-user.target", and be later in the path?

I must be reading it incorrectly.

-- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Michael Fischer

13:58

On Wed, Nov 07, Carlos E. R. wrote:

...

On 06/11/2018 20.14, Michael Fischer wrote:

...
On Tue, Nov 06, Carlos E. R. wrote:

...
On 06/11/2018 14.12, Michael Fischer wrote:

...
Try with critical chain when you can, please.

Below.

The time after the unit is active or started is printed after the "@" character. The time the unit takes to start is printed after the "+" character.

multi-user.target @6.446s
└─cron.service @6.446s
  └─postfix.service @6.308s +137ms
    └─time-sync.target @6.307s
      └─ntpd.service @6.290s +17ms
        └─network.target @6.287s
          └─wicked.service @963ms +5.323s
            └─wickedd-nanny.service @960ms +2ms
              └─wickedd.service @955ms +4ms
                └─wickedd-dhcp4.service @934ms +18ms
                  └─dbus.service @907ms
                    └─basic.target @902ms
                      └─sockets.target @902ms
                        └─dbus.socket @902ms
                          └─sysinit.target @902ms
                            └─systemd-update-utmp.service @897ms +4ms
                              └─systemd-tmpfiles-setup.service @786ms +109ms
                                └─local-fs.target @784ms
                                  └─home.mount @598ms +185ms
                                    └─systemd-fsck@dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.service @432ms +165ms
                                      └─dev-disk-by\x2duuid-7d15344a\x2dca83\x2d4d3d\x2d8e72\x2d54dc4d2bc2cc.device @431ms

Well, I do not see the problem.

I see something I do not understand, though:

multi-user.target starts at 6 seconds (@6.446s). A bit later, wicked.service starts at 0.9 seconds (@963ms) and takes 5 seconds to complete (+5.323s).

How can it start at 1 second after boot, thus before "multi-user.target", and be later in the path?

I must be reading it incorrectly.

I think one must read it from bottom UP. Thus, "network" is considered complete AFTER wicked is done. Michael -- Michael Fischer michael@visv.net
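For anyone who wants to check the bottom-up reading with numbers, here is a rough Python sketch. It assumes that, for a unit with a "+" value, "@" is when the unit began starting, so "@" plus "+" is roughly when it became ready; the figures in this thread are consistent with that, but it is an interpretation, not something systemd-analyze states in those words. The names `to_seconds`, `parse_chain` and `SAMPLE` are made up for the illustration, and only the "min", "s" and "ms" suffixes seen in this thread are handled.

```python
import re

# Seconds per time suffix; only the suffixes that appear in this thread.
UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}

# One chain entry: optional tree prefix, unit name, "@" time, optional "+" time.
LINE_RE = re.compile(
    r"^[\s└─]*(?P<unit>\S+)\s+@(?P<at>[^+]+?)(?:\s+\+(?P<took>.+))?$"
)


def to_seconds(text):
    """Convert strings such as '963ms', '6.446s' or '1min 4.525s' to seconds."""
    total = 0.0
    for value, suffix in re.findall(r"([\d.]+)(min|ms|s)", text):
        total += float(value) * UNIT_SECONDS[suffix]
    return total


def parse_chain(text):
    """Return (unit, at_seconds, took_seconds) for every chain line, top first."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header text or entries without an "@" time
        took = match.group("took")
        entries.append(
            (
                match.group("unit"),
                to_seconds(match.group("at")),
                to_seconds(took) if took else 0.0,
            )
        )
    return entries


# First lines of the chain posted above.
SAMPLE = """\
multi-user.target @6.446s
└─cron.service @6.446s
  └─postfix.service @6.308s +137ms
    └─time-sync.target @6.307s
      └─ntpd.service @6.290s +17ms
        └─network.target @6.287s
          └─wicked.service @963ms +5.323s
            └─wickedd-nanny.service @960ms +2ms
"""

if __name__ == "__main__":
    # Walk the chain bottom-up, the way it is meant to be read.
    for unit, at, took in reversed(parse_chain(SAMPLE)):
        print(f"{unit}: @{at:.3f}s, ready at about {at + took:.3f}s")
```

Under that assumption wicked.service begins at 0.963 s and needs 5.323 s, so it is ready at about 6.286 s, and network.target follows at 6.287 s. That matches reading the chain from the bottom up.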

Carlos E. R.

9 Nov

10:20

On 06/11/2018 11.51, Liam Proven wrote:

...

On 05/11/2018 19:38, Carlos E. R. wrote:

...
...
True, but I can't tell any difference.

Depends on the application.

True, I'm sure.

...
Yes, it will wear up faster, but the alternatives are slow and get on my nerves... this way I delay purchasing a new computer.

:-) That sort of makes sense.

Although my policy for avoiding buying new computers is to only buy old computers. It saves me a lot of money... ;-)

Absolutely :-) I buy new but not the most recent design. This motherboard I bought with a Quad Core 2 processor, what was a large amount of ram at the time (the max the board allowed), 8 GB of DDR3. The idea was not top speed, but data movement. MSI P45 Diamond, MS-7516. Several hard disk sata ports. December 2009 and still working fine... All my computers I had, had to be replaced because not enough memory.

...

...
Mind: since some kernel version swap became very slow. I think it happened when going from 13.1 to leap 42.2. I suspect that swap is "fragmented" (i/o was just few megabytes, while the disk is capable of 150MB/S).

I don't think swap _can_ get fragmented... but it's fairly easy to remove it and re-create it, even on a running system.

Not that way. It is the memory itself which is fragmented.

If a process is requesting a number of chunks to store things, all the chunks it gets might not be contiguous. Consider firefox. Even if it stores each tab in a single chunk of memory, the next tab will not be contiguous, it may be some other process. Now, surely each tab needs many chunks, and I use many tabs. It may be chunk 1 from tab 15, then chunk 20 of tab 16, etc.

When one tab awakes, it has to restore chunks of memory that are not contiguous, and also not stored contiguously on swap, which will use some type of memory map.

Not fragmented in the same sense as a filesystem, but in the sense that to recover for example a tab from swap it will have to recover a large number of chunks that in all probability will not be contiguous.

I could hear the disk head moving like mad when switching apps or tabs, the app waiting for I/O (I can see that in an XFCE applet), and that partition I/O going around 1 MB/s, thus the disk was not performing optimally...

My hypothesis I call swap fragmentation, or virtual memory fragmentation. Maybe it has another official name.

It is not an issue if swap is on SSD, but on rotating disk it is a lot. Same as dumping the systemd journal to text is a very slow operation, measured in many minutes on rotating rust. We "blamed" the developers having good modern hardware, i.e., SSD, thus not testing on iron ;-)

...

...
I considered it.

I know it sounds crazy, but I think the proof that the idea works is that it was introduced as standard in Mac OS X as of version 10.9 "Mavericks".

Oh.

...

https://www.lifewire.com/understanding-compressed-memory-os-x-2260327

The OS X implementation is slightly superior, inasmuch as it both compresses into a swapfile in RAM, then writes the compressed image into disk swap. In Linux terms it's a combination of ZRAM + ZSwap + ZCache.

The tech is also in Win10:

https://www.makeuseof.com/tag/ram-compression-improves-memory-responsiveness...

And in ChromeOS, Android and IBM AIX.

So it's pretty mainstream stuff now. Multicore processors help a lot -- a lot of code is still single-threaded, and with ZRAM, idle cores can be used to do the compression, while other cores are busy.

I will think about it again :-)

...

...
I use clamav because I like finding out if I am sent some virus garbage, mostly intended for Windows, of course.

I appreciate that but it's a high price to pay on a memory-constrained system, IMHO.

Yes... I want my cake and eat it O:-) The trick to send it to swap kind of works :-)

...

top - 11:18:55 up 20 days, 1:31, 2 users, load average: 0,25, 0,29, 0,40
Tasks: 547 total, 1 running, 545 sleeping, 0 stopped, 1 zombie
%Cpu(s): 1,4 us, 1,0 sy, 0,0 ni, 97,5 id, 0,1 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem: 8174460 total, 7695600 used, 478860 free, 1126180 buffers
KiB Swap: 25165820 total, 6592928 used, 18572892 free. 2606064 cached Mem

  PID USER  PR NI   VIRT   RES  SHR   SWAP S  %CPU  %MEM    TIME+ COMMAND
 2437 vscan 20  0  44472   456   32    964 S 0,000 0,006  1:26.22 freshclam
 4502 vscan 20  0 864260 86628 1316 476032 S 0,000 1,060 10:29.83 clamd
 4513 vscan 20  0 168220  3028 1804  57344 S 0,000 0,037  0:09.78 /usr/sbin/amavi
30348 vscan 20  0 169792 27068 4920  36624 S 0,000 0,331  0:00.41 /usr/sbin/amavi
31185 vscan 20  0 169880 19976   16  38888 S 0,000 0,244  0:00.72 /usr/sbin/amavi
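For reference, per-process swap figures like the SWAP column above can also be collected without top. A minimal Python sketch follows, assuming a Linux /proc where each /proc/<pid>/status carries a "VmSwap:" line in kB (kernel threads have none). The helper names are invented for this example and nothing here is specific to openSUSE.

```python
import os


def parse_vmswap_kib(status_text):
    """Return the VmSwap value (KiB) from /proc/<pid>/status text, 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def parse_name(status_text):
    """Return the process name from the 'Name:' line, or '?' if missing."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else "?"
    return "?"


def swap_by_process(proc_root="/proc"):
    """List (swap_kib, pid, name) for processes with swapped pages, largest first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to report
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited or is not readable
        swapped = parse_vmswap_kib(text)
        if swapped:
            usage.append((swapped, int(entry), parse_name(text)))
    return sorted(usage, reverse=True)


if __name__ == "__main__":
    for swapped, pid, name in swap_by_process()[:10]:
        print(f"{swapped:>10} KiB  {pid:>6}  {name}")
```

Run as root to see every process; as a normal user some entries may be skipped.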

Ideal would be to start clamd when needed and kill it automatically with a timeout. Maybe systemd can do that, but I failed to do it. I may just move amavis+clamd to another computer. -- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Liam Proven

12:42

On 09/11/2018 11:20, Carlos E. R. wrote:

...

I buy new but not the most recent design. This motherboard I bought with a Quad Core 2 processor, what was a large amount of ram at the time (the max the board allowed), 8 GB of DDR3. The idea was not top speed, but data movement. MSI P45 Diamond, MS-7516. Several hard disk sata ports. December 2009 and still working fine...

I still have such a box, given to me from my local Freegle group about 6y ago. Core 2 Quad Extreme. I ran it as a Hackintosh for several years, dual-booting Win8 and Linux too. It is very noisy in use, though, and I rarely use it now.

...

All my computers I had, had to be replaced because not enough memory.

I haven't really got that far, but the problem with some is that although I _could_ fit more RAM, large amounts of the old slow types of RAM that they take would cost too much. E.g. I had an old Core 2 Duo laptop, a perfectly nice machine, but upgrading it past 4GB would cost more than it was worth. So I sold it off.

...

Not that way. It is the memory itself which is fragmented.

Ah, yes, that is an issue, now that modern OSes stay up long enough for it to become a problem. Of course, a reboot solves it!

...

...
I know it sounds crazy, but I think the proof that the idea works is that it was introduced as standard in Mac OS X as of version 10.9 "Mavericks".

Oh.

Yeah. :-) It's quite standard now. I have yet to get ZRAM working on openSUSE but it's fairly easy on Ubuntu.

...

Yes... I want my cake and eat it O:-)

The trick to send it to swap kind of works :-)

OK. I prefer an easier life, myself... ;-) -- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Carlos E. R.

12:55

On 09/11/2018 13.42, Liam Proven wrote:

...

On 09/11/2018 11:20, Carlos E. R. wrote:

...
I buy new but not the most recent design. This motherboard I bought with a Quad Core 2 processor, what was a large amount of ram at the time (the max the board allowed), 8 GB of DDR3. The idea was not top speed, but data movement. MSI P45 Diamond, MS-7516. Several hard disk sata ports. December 2009 and still working fine...

I still have such a box, given to me from my local Freegle group about 6y ago. Core 2 Quad Extreme. I ran it as a Hackintosh for several years, dual-booting Win8 and Linux too.

It is very noisy in use, though, and I rarely use it now.

Ah, but that is the fans. Good silent fans cost more ;-)

...

...
All my computers I had, had to be replaced because not enough memory.

I haven't really got that far, but the problem with some is that although I _could_ fit more RAM, large amounts of the old slow types of RAM that they take would cost too much.

E.g. I had an old Core 2 Duo laptop, a perfectly nice machine, but upgrading it past 4GB would cost more than it was worth. So I sold it off.

Yes, exactly. More memory for my Pentium V with 32MiB of RAM was too costly, so purchase instead new computer. Thus it died of RAM insufficiency. ;-)

...

...
Not that way. It is the memory itself which is fragmented.

Ah, yes, that is an issue, now that modern OSes stay up long enough for it to become a problem.

Of course, a reboot solves it!

Or a restart of firefox.

...

...
...
I know it sounds crazy, but I think the proof that the idea works is that it was introduced as standard in Mac OS X as of version 10.9 "Mavericks".

Oh.

Yeah. :-) It's quite standard now.

I have yet to get ZRAM working on openSUSE but it's fairly easy on Ubuntu.

...
Yes... I want my cake and eat it O:-)

The trick to send it to swap kind of works :-)

OK. I prefer an easier life, myself... ;-)

Well, yes. Did you see that the machine has 8 GiB of RAM, and there were 6 GiB of swap in use? Yes, I need more ram, which means new board, and money. That's the best solution, then easy life... But I do not want to expend that money now. -- Cheers / Saludos, Carlos E. R. (from openSUSE 15.0 (Legolas))

Liam Proven

13:07

On 09/11/2018 13:55, Carlos E. R. wrote:

...

Ah, but that is the fans. Good silent fans cost more ;-)

It came with a fancy Zalman CPU cooler, and it's so old, I don't really want to spend money on replacing it. I've tried replacing all the other fans, including a silent passively-cooled GPU, but nothing made any noticeable difference. :-(

...

Yes, exactly. More memory for my Pentium V with 32MiB of RAM was too costly, so purchase instead new computer.

Pentium *V*? I thought they never shipped. But the P4 series were space-heaters anyway. I would not keep such a machine around.

...

Thus it died of RAM insufficiency. ;-)

Well, that is one way of looking at it, I suppose.

...

Or a restart of firefox.

If that is the offending app, yes. But if other things stay in RAM, the RAM will remain fragmented.

...

Well, yes. Did you see that the machine has 8 GiB of RAM, and there were 6 GiB of swap in use? Yes, I need more ram, which means new board, and money. That's the best solution, then easy life... But I do not want to expend that money now.

:-o I did not. I would not use such a setup. I'd set up a 2nd machine as an email server or something first... -- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084

Carlos E. R.

10 Nov

11:10

On 09/11/2018 14.07, Liam Proven wrote:

...

On 09/11/2018 13:55, Carlos E. R. wrote:

...
Ah, but that is the fans. Good silent fans cost more ;-)

It came with a fancy Zalman CPU cooler, and it's so old, I don't really want to spend money on replacing it.

I've tried replacing all the other fans, including a silent passively-cooled GPU, but nothing made any noticeable difference. :-(

...
Yes, exactly. More memory for my Pentium V with 32MiB of RAM was too costly, so purchase instead new computer.

Pentium *V*? I thought they never shipped.

But the P4 series were space-heaters anyway. I would not keep such a machine around.

I have not powered it up in years, so I might be wrong.

...

...
Thus it died of RAM insufficiency. ;-)

Well, that is one way of looking at it, I suppose.

LOL :-) I mean, it could have lasted a year or two more if I had more ram for it at the time :-)

...

...
Or a restart of firefox.

If that is the offending app, yes. But if other things stay in RAM, the RAM will remain fragmented.

...
Well, yes. Did you see that the machine has 8 GiB of RAM, and there were 6 GiB of swap in use? Yes, I need more ram, which means new board, and money. That's the best solution, then easy life... But I do not want to expend that money now.

:-o

I did not.

I would not use such a setup. I'd set up a 2nd machine as an email server or something first...

Yes, I'm considering that. But that would at most save between 0.5 and 1 GB, out of 6 in swap... Wow. I just looked, and

...

  PID USER PR NI    VIRT    RES   SHR   SWAP S  %CPU  %MEM     TIME+ COMMAND
17065 cer  20  0 4452780 1,746g 19440 403432 S 0,000 22,40 259:18.36 Web Content

Probably one firefox tab is using 1.7 Gigs. I'll kill it and see who complains.

https://status.opensuse.org/
A google search on Kashogui
https://en.opensuse.org/Firewalld#Documentation
A page from my city tax centre.
A page from my region government land maps (not displaying a map at the moment).
A google search on Lazarus locale
A mozilla bugzilla page.

Nothing that should be that huge.

Then Thunderbird is using 1.2 G. Firefox main 0.8

I'm now down to about 4G in swap.

I have the impression that the same workload that I have used for years has been growing in RAM demands, perhaps doubling, over the last two or three years.

-- Cheers / Saludos, Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

Liam Proven

12 Nov

10:56

On 10/11/2018 12:10, Carlos E. R. wrote:

...

I have the impression that the same workload that I have used for years is growing in RAM demands, perhaps double, since two or three years.

Browsers have growing RAM demands, yes.

Chrome pioneered a new model. Each tab is a separate _process_ (not a thread). The browser "frame" (window title & scroll bars, etc.) is another process, which spawns lots of children.

This means that the OS's own memory management can manage the RAM and processor usage of tabs. Long-unused tabs will be paged out to disk, for instance.

It uses the power of modern PCs. Tabs can run on different CPUs, the browser as a whole can use lots of RAM *in separate chunks* rather than one monolithic block, etc.

But it does mean it's very RAM-hungry. Processes use more resources than threads.

Firefox has not copied this. Its "Electrolysis" project spun off 1 background process for tab contents rendering, as an experiment. Then it went truly multiprocess, with up to 3 background & 1 foreground process. But that's all.

The idea is that it will help people with up to quad-core PCs, and few people have more than 4 cores anyway. If you do, you're probably a high-powered user, doing CAD or modelling or something, not just browsing the web, so your browser should not eat all your resources.

So, yes, Firefox post version 50 or so takes more RAM, and Firefox Quantum _much_ more, but it should take less than Chrome.

FWIW, I experimented with ZRAM and swapspace a few years back, and wrote some blog posts about it:

https://liam-on-linux.livejournal.com/34124.html
https://liam-on-linux.livejournal.com/34500.html

What I found was that, on Ubuntu, ZRAM will grow until it has filled (50% of your RAM) with compressed data, then it will spill over into disk-based swap. In other words, yes, you can use both together, profitably.

If I was trying to do real work on a 4GB machine, I would eliminate any non-essential tasks (e.g. scanning for Windows malware on a Linux box) or offload them onto other machines.

Then I'd use the combination of ZRAM & real swap, and probably use Chrome more, paradoxical as it might sound, so that background tabs could get paged out.

-- Liam Proven - Technical Writer, SUSE Linux s.r.o. Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia Email: lproven@suse.com - Office telephone: +420 284 241 084
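A side note on the "fills ZRAM first, then spills to disk" behaviour: the kernel uses swap areas in priority order, highest first, so the order can be checked from /proc/swaps. The Python sketch below parses a sample of that file. The device names, sizes and priorities in `SAMPLE` are invented for illustration and are not taken from any machine in this thread, and `SwapArea` and `parse_proc_swaps` are hypothetical helper names.

```python
from typing import NamedTuple


class SwapArea(NamedTuple):
    name: str
    kind: str
    size_kib: int
    used_kib: int
    priority: int


def parse_proc_swaps(text):
    """Parse /proc/swaps text and return areas in the order the kernel fills them."""
    areas = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 5:
            continue
        name, kind, size, used, priority = fields
        areas.append(SwapArea(name, kind, int(size), int(used), int(priority)))
    # Higher priority is used first.
    return sorted(areas, key=lambda area: area.priority, reverse=True)


# Hypothetical layout: a ZRAM device ahead of a disk partition.
SAMPLE = """\
Filename                Type        Size      Used     Priority
/dev/sda2               partition   25165820  0        -2
/dev/zram0              partition   4087228   1048576  100
"""

if __name__ == "__main__":
    for area in parse_proc_swaps(SAMPLE):
        percent = 100.0 * area.used_kib / area.size_kib
        print(f"{area.name}: priority {area.priority}, {percent:.1f}% used")
```

On a real system, pass the contents of /proc/swaps instead of `SAMPLE`; if the ZRAM device does not have the higher priority, the disk swap would be used first.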

58 comments

10 participants

Andrei Borzenkov
Anton Aylward
Carlos E. R.
David Haller
Felix Miata
James Knott
Knurpht-openSUSE
Liam Proven
Michael Fischer
Patrick Shanahan