[opensuse-factory] amd ryzen low load hard freeze kernel
Hi,

for nearly 6 months now I have had trouble with my unstable Ryzen systems: random crashes after a couple of hours (<8) at (nearly) no load. Most of the time only a hard reset gets the system running again, and most of the time there is nothing special in the system log (as far as I can tell; maybe somebody who knows more could spot something suspicious).

After searching the net: because I am not really familiar with kernel boot parameter settings, and I have read (on the lists of other distributions) that it is necessary to build your own kernel, I would like to ask how to fix (or work around) the "low load freeze of Ryzen", or whether a fix is perhaps already included in the Tumbleweed kernel.

Here: Ryzen (1700) 8-core processor systems with Tumbleweed.

I read several links found by Google:

https://bugzilla.kernel.org/show_bug.cgi?id=196683
https://bugs.launchpad.net/linux/+bug/1690085/comments/69
https://forums.fedoraforum.org/showthread.php?315887-Random-Crashing

They suggest adding rcu_nocbs=0-15. Is /etc/default/grub the correct file? Should I change

GRUB_CMDLINE_LINUX_DEFAULT="video=1920x1200 splash=silent quiet showopts"

to

GRUB_CMDLINE_LINUX_DEFAULT="video=1920x1200 splash=silent quiet showopts rcu_nocbs=0-15"

and then run grub2-mkconfig? Is this correct, and will it work? Or is there a better/other solution for Tumbleweed?

(At the moment this is not the newest version of Tumbleweed. How do I check the Tumbleweed version?)

uname -a
Linux becherer1 4.14.2-1-default #1 SMP PREEMPT Fri Nov 24 08:20:07 UTC 2017 (b0610fc) x86_64 x86_64 x86_64 GNU/Linux

============

By the way, as suggested on this mailing list, I have replaced the AMD Ryzen processors because of the segfault problem (I found this problem on my systems after the hint on this list). Tumbleweed is now running fine under HIGH load (kill-ryzen.sh script) for more than 48 hours.

For others who read on the internet that an AMD Ryzen production date "after week 25" (sometimes week 30 is written) is fine: THIS is NOT TRUE. At least this type has the bug: Ryzen 1700 UA 1733PGS (2017, week 33). (After a first replacement I had 4 of these processors, and all of them had this bug.) See also: https://forum.level1techs.com/t/ryzen-pre-week-25-fabrication-rma-issue/1186...

==============================

I would be happy to get information on how to fix the low-load problem and get stable Tumbleweed Ryzen systems.

Thanks in advance,
simoN

-- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
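The edit asked about above (adding one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT and regenerating the GRUB configuration) could be sketched roughly as follows. This is only a sketch under assumptions: the helper name add_kernel_param is made up for illustration, it relies on GNU sed, and it expects the variable to sit on a single double-quoted line as in the example above. Whether rcu_nocbs=0-15 actually has any effect depends on how the kernel was built, which is the open question in this thread.

```shell
# add_kernel_param FILE PARAM
# Hypothetical helper: appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..."
# line of FILE unless the parameter is already present on that line.
# Assumes GNU sed (for -i) and a PARAM without the characters | & \ or ".
add_kernel_param() {
    local file="$1" param="$2" line word
    # Fail if the file has no such line at all.
    line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file") || return 1
    # Strip the variable name and the closing quote, keeping only the value.
    line=${line#GRUB_CMDLINE_LINUX_DEFAULT=\"}
    line=${line%\"*}
    # Compare word by word so that a parameter only matches itself exactly.
    for word in $line; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    # Insert the parameter just before the closing quote of that line.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Example, to be run as root (try it on a copy of the file first):
#   add_kernel_param /etc/default/grub rcu_nocbs=0-15
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Running the helper twice leaves the file unchanged the second time, so it is safe to repeat. The new parameter only takes effect after grub2-mkconfig has been run and the machine has been rebooted.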
On Tue, 30 Jan 2018 17:09:44 +0100 Simon Becherer <simon@becherer.de> wrote:
for nearly 6 months now I have had trouble with my unstable Ryzen systems.
random crashes after a couple of hours (<8) at (nearly) no load. Most of the time only a hard reset gets the system running again, and most of the time there is nothing special in the system log (as far as I can tell; maybe somebody who knows more could spot something suspicious).

A few questions.

* Have you checked if your system firmware is current? If not, update it.
* Do you have any other OSes on the machine?
* Are you able to verify if this is a SUSE issue? E.g. by trying some other distro and seeing if it exhibits the same behaviour? Since it does it under _low_ load, merely running a live CD might even be enough, or running memtest86+ overnight.
* Do you have a Windows partition? Does Windows do the same thing?

It is possible to download an ISO of Windows 10 from Microsoft free of charge, and it will run unactivated. There is also a 90-day fully-functional evaluation version. Either would do for testing. Either might also assist with updating motherboard firmware.

--
Liam Proven - Technical Writer, SUSE Linux s.r.o.
Corso II, Křižíkova 148/34, 186-00 Praha 8 - Karlín, Czechia
Email: lproven@suse.com - Office telephone: +420 284 241 084
Hi Liam,

* Have you checked if your system firmware is current? If not, update it.
Oops, on 23.01.2018 they released an update. Sorry, I will install it and check whether anything is solved.

* Do you have any other OSes on the machine?
No.

* Are you able to verify if this is a SUSE issue?
I can't, but Google says ;-)) that a lot of distributions are affected (see the links I provided); people have this issue on non-SUSE Linux systems.

-> My question was whether the SUSE Tumbleweed kernel is compiled to use a solution provided there.

* Do you have a Windows partition? Does Windows do the same thing?
No, no Windows partition. Google says Windows will never have a load as low as Linux, and therefore it will not be (is not) affected.

First I will test the BIOS update. Sorry, my fault, I last checked at the beginning of January. I will write here whether this fixes the issue.

simoN
On Tue, 30 Jan 2018 20:00:11 +0100 Simon Becherer <simon@becherer.de> wrote:
Hi Liam,
* Have you checked if your system firmware is current? If not, update it. Oops, on 23.01.2018 they released an update. Sorry, I will install it and check whether anything is solved.
I hope it helps!
* Are you able to verify if this is a SUSE issue? I can't, but Google says ;-)) that a lot of distributions are affected (see the links I provided); people have this issue on non-SUSE Linux systems.
I think I follow you. In other words, it's a Linux-only issue, but cross-distro?
* Do you have a Windows partition? Does Windows do the same thing? No, no Windows partition. Google says Windows will never have a load as low as Linux, and therefore it will not be (is not) affected.
*LOL* Well, I can believe that. ;-)
First I will test the BIOS update. Sorry, my fault, I last checked at the beginning of January.
I will write here whether this fixes the issue.
That's OK. Good luck.

-- Liam Proven
Hi again,

I will write here whether this fixes the issue.
That's OK. Good luck.

To inform you, and still asking:

AMD Ryzen 1700 (RMA replacement), ASRock AB350M: the BIOS update to version 4.40 makes it "MORE" stable at low load, but there are still random freezes at low load.

AND: after the BIOS update there is a new problem under high load (while running a modified "kill-ryzen" script which writes the compiler output of 4 cores to the NVMe drive):

kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: id=0000
kernel: pcieport 0000:00:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0019(Transmitter ID)

Adding "pcie_aspm=off" to GRUB_CMDLINE solves this. (Reporting this to ASRock got me no reply.)

===============

So my question is still open: how is the openSUSE Tumbleweed kernel compiled?

According to https://bugs.launchpad.net/linux/+bug/1690085/comments/69 it has to be compiled with CONFIG_RCU_NOCB_CPU. If it is, it will accept (and use) the kernel command-line parameter rcu_nocbs=0-15.

Could anybody tell me whether our kernel is compiled like this and will accept this parameter?

Thanks,
simoN
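To judge whether a change such as pcie_aspm=off really makes the corrected PCIe errors go away, it can help to count them per device before and after. A minimal sketch, assuming the log lines look like the two quoted in the message above; the function name aer_corrected_counts is invented for this example:

```shell
# aer_corrected_counts
# Hypothetical helper: reads kernel log text on stdin and prints one line
# "<count> <pci-address>" per PCIe port that reported corrected bus errors.
# The pattern is modelled only on the log lines quoted in this thread.
aer_corrected_counts() {
    grep -E 'pcieport [0-9a-f:.]+: PCIe Bus Error: severity=Corrected' \
        | sed -E 's/.*pcieport ([0-9a-f:.]+): PCIe Bus Error.*/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}

# Example: kernel messages of the current boot
#   journalctl -k -b | aer_corrected_counts
```

An empty result after a reboot with the new parameter, under the same load, would suggest that the messages are gone. It does not say anything about the low-load freezes, which are a separate problem.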
Hi,

On Tue, Feb 27, 2018 at 6:22 PM, Simon Becherer <simon@becherer.de> wrote:
how is the openSUSE Tumbleweed kernel compiled? According to https://bugs.launchpad.net/linux/+bug/1690085/comments/69
it has to be compiled with CONFIG_RCU_NOCB_CPU

$ zcat /proc/config.gz | grep CONFIG_RCU_NOCB_CPU

So apparently this option is not set.

Robert
On 27/02/2018 17:24, Robert Munteanu wrote:
Hi,
On Tue, Feb 27, 2018 at 6:22 PM, Simon Becherer <simon@becherer.de> wrote:
how is the openSUSE Tumbleweed kernel compiled? According to https://bugs.launchpad.net/linux/+bug/1690085/comments/69
it has to be compiled with CONFIG_RCU_NOCB_CPU
$ zcat /proc/config.gz | grep CONFIG_RCU_NOCB_CPU
So apparently this option is not set.
Robert

Uhmm..

zcat /proc/config.gz | grep -i CONFIG_RCU
# CONFIG_RCU_EXPERT is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_PERF_TEST=m
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set

It does not exist at all. Maybe another option needs to be set to make CONFIG_RCU_NOCB_CPU available. I don't know.

Daniele.
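The two checks above differ in one detail that is easy to miss: an option that is merely switched off shows up as "# CONFIG_... is not set", while an option whose Kconfig dependencies are not met does not appear in the configuration at all. A small sketch that tells the three cases apart; the function name kconfig_status is an assumption made for this example:

```shell
# kconfig_status SYMBOL
# Hypothetical helper: reads a kernel configuration on stdin and prints
#   the value (y, m, 60, ...)  if SYMBOL=value is present,
#   "not set"                  if "# SYMBOL is not set" is present,
#   "absent"                   if SYMBOL is not mentioned at all.
kconfig_status() {
    local symbol="$1" line
    # Anchor the match so that SYMBOL does not also match SYMBOL_SOMETHING.
    line=$(grep -E -m1 "^(# )?${symbol}(=| is not set)") || { echo "absent"; return 0; }
    case $line in
        "#"*) echo "not set" ;;
        *)    echo "${line#*=}" ;;
    esac
}

# Example on a running system that provides /proc/config.gz:
#   zcat /proc/config.gz | kconfig_status CONFIG_RCU_NOCB_CPU
```

With the output posted above, CONFIG_RCU_EXPERT would be reported as "not set" and CONFIG_RCU_NOCB_CPU as "absent", which matches the guess that some other option has to be enabled first.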
Hi,

thanks for showing me how to check this. So I will continue to try other possibilities (there is a beta BIOS update, maybe switching off the C6 state with a script, and there is a new BIOS option which, according to Google, "maybe" addresses this problem too...).

I will report if I find a solution.

simoN
Hi again,

for people who also have stability problems: I think my systems now run stable.

a) As widely reported (found via Google), Ryzen Linux systems have some problems:
   1) high load bug -> the only solution is to change the processor
   2) low load bug -> mainboard + BIOS and software related

b) Low load bug, see: https://bugzilla.kernel.org/show_bug.cgi?id=196683

In short (I have also written this at the link above), for AMD Ryzen 7 1700, ASRock AB350M, BIOS 4.4 and 4.5, openSUSE Tumbleweed (at the moment = 20180320-0):

1) Check for a processor without the high load bug. (To run fine with openSUSE Tumbleweed, the "kill ryzen script" has to be modified to use these sources:
   gcc7-7.1.1+r250819-1.1.src.rpm
   gmp-6.1.2-2.6.src.rpm
   isl-0.18-1.6.src.rpm
   mpc-1.0.3-1.114.src.rpm
   mpfr-3.1.5-4.2.src.rpm)

2) Use BIOS 4.4 or 4.5 (no higher version is available at the moment). In the BIOS, under /advanced/amdcbs/zen-common-options/, change "power supply idle control" from "auto" to "typical".

3) In /etc/default/grub, add at the end of GRUB_CMDLINE_LINUX_DEFAULT=
   pcie_aspm=off
   then run: grub2-mkconfig -o /boot/grub2/grub.cfg

For more details read the link.

simoN
www.becherer.de
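After step 3 and a reboot, it is worth confirming that the kernel was really booted with the new parameter, since a forgotten grub2-mkconfig run leaves the old command line in place. A minimal sketch, with has_kernel_param being a name made up for this example:

```shell
# has_kernel_param PARAM [CMDLINE]
# Hypothetical helper: succeeds if PARAM appears as a whole word in CMDLINE.
# Without a second argument the command line of the running kernel is used.
has_kernel_param() {
    local param="$1" cmdline="${2:-$(cat /proc/cmdline)}" word
    for word in $cmdline; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}

# Example after rebooting:
#   has_kernel_param pcie_aspm=off && echo "pcie_aspm=off is active"
```

The check only shows that the parameter was passed to the kernel. Whether the kernel acts on it is a separate matter, as the rcu_nocbs discussion earlier in this thread shows.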
participants (4)
- Daniele
- Liam Proven
- Robert Munteanu
- Simon Becherer