[opensuse-arm] weird userspace crashes with 42.2 aarch64
Hi everybody,
I am approaching you with a collection of unusual crashes that occur on my test machine: It is an early version of the 8040 Community Board by SolidRun running a patched 4.9.0 kernel with the 42.2 rootfs from http://download.opensuse.org/ports/aarch64/distribution/leap/42.2/appliances...
Now what I see are the following situations:
1) unhandled level XY translation fault (11) at 0xYYYYYYYY, esr 0xYYYYYYYY
I have seen this with level 0, 1, 2 and 3 so far. So far I have seen it with zypper, where it freezes seemingly at random, and ctrl+x produces this kind of error in dmesg.
2) undefined instruction: pc=...
This one causes a program to crash and exit without any delay. I actively observed this twice with zypper, but the kernel log suggests it can happen to other applications too.
Attached to this mail you can find a full system log with both kinds of crashes. You will find that I played with qemu when this log was produced. Sadly I did not save the initial log with the zypper crashes, except for one extract.
I am quite unsure what to do about this. Has anybody seen such behaviour with other boards?
br
Josua Mayer
On 07/01/2017 19:50, Josua Mayer wrote:
Hi everybody,
I am approaching you with a collection of unusual crashes that occur on my test machine: It is an early version of the 8040 Community Board by SolidRun running a patched 4.9.0 kernel with the 42.2 rootfs from http://download.opensuse.org/ports/aarch64/distribution/leap/42.2/appliances...
Now what I see are the following situations:
1) unhandled level XY translation fault (11) at 0xYYYYYYYY, esr 0xYYYYYYYY I have seen this with level 0, 1, 2 and 3 so far. So far I have seen it with zypper, where it freezes seemingly at random, and ctrl+x produces this kind of error in dmesg.
That's really just a kernel log entry for a user space segmentation fault. The addresses that faulted were:
[ 2312.480811] zypper[5524]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x82000006
-->
ESR 0x82000006 means "Instruction Abort from a lower Exception level" Fault type: Translation fault. (in EL2) The faulting address is 0x00000000 (PC)
[ 2321.136185] zypper[9319]: unhandled level 2 translation fault (11) at 0xffffe5720449, esr 0x92000006
-->
ESR 0x92000006 means "Data Abort from a lower Exception level" Fault type: Translation fault. (in EL2) The faulting address is 0xffffe5720449 (x0)
I'm curious. You seem to be running in EL2. Do you have the following option enabled in your kernel?
CONFIG_ARM64_VHE
If so, please try to disable it and see whether that makes things work.
2) undefined instruction: pc=... This one causes a program to crash and exit without any delay. I actively observed this twice with zypper, but the kernel log suggests it can happen to other applications too.
Attached to this mail you can find a full system log with both kinds of crashes. You will find that I played with qemu when this log was produced. Sadly I did not save the initial log with the zypper crashes, except for one extract.
I am quite unsure what to do about this. Has anybody seen such behaviour with other boards?
I have seen it on a different CPU type with VHE enabled, yes.
Alex
--
To unsubscribe, e-mail: opensuse-arm+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-arm+owner@opensuse.org
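For readers who want to repeat the ESR decoding from the reply above, here is a minimal Python sketch. It assumes the ESR_ELx layout from the ARMv8-A architecture: exception class (EC) in bits 31:26, instruction length (IL) in bit 25, and the fault status code in the low six bits of the ISS. The name `decode_esr` and the lookup tables are illustrative only and not part of any kernel tool, and only the abort classes relevant to this thread are covered.

```python
# Hypothetical helper (not from the thread): split an AArch64 ESR value,
# as printed in the kernel log, into its main fields.

# Only the abort-related exception classes are listed here.
EXCEPTION_CLASSES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# Upper four bits of the 6-bit fault status code; the lower two bits
# carry the translation table level for these fault kinds.
FAULT_KINDS = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Return a dict with EC, IL, ISS and (for aborts) fault kind and level."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    result = {
        "ec": ec,
        "il": il,
        "iss": iss,
        "class": EXCEPTION_CLASSES.get(ec, "not covered by this sketch"),
        "fault": None,
        "level": None,
    }
    if ec in EXCEPTION_CLASSES:
        fsc = iss & 0x3F         # fault status code, ISS bits 5:0
        kind = FAULT_KINDS.get(fsc >> 2)
        if kind is not None:
            result["fault"] = kind
            result["level"] = fsc & 0x3
    return result


if __name__ == "__main__":
    # The two values quoted from the zypper crashes.
    for value in (0x82000006, 0x92000006):
        info = decode_esr(value)
        print(hex(value), info["class"], info["fault"], "level", info["level"])
```

Both values from the log decode to a level 2 translation fault, which matches the "level 2" wording in the kernel messages; they differ only in the exception class (instruction abort versus data abort).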
Hi Alex,
On 08.01.2017 at 22:46, Alexander Graf wrote:
On 07/01/2017 19:50, Josua Mayer wrote:
Hi everybody,
I am approaching you with a collection of unusual crashes that occur on my test machine: It is an early version of the 8040 Community Board by SolidRun running a patched 4.9.0 kernel with the 42.2 rootfs from http://download.opensuse.org/ports/aarch64/distribution/leap/42.2/appliances...
Now what I see are the following situations:
1) unhandled level XY translation fault (11) at 0xYYYYYYYY, esr 0xYYYYYYYY I have seen this with level 0, 1, 2 and 3 so far. So far I have seen it with zypper, where it freezes seemingly at random, and ctrl+x produces this kind of error in dmesg.
That's really just a kernel log entry for a user space segmentation fault. The addresses that faulted were:
[ 2312.480811] zypper[5524]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x82000006
-->
ESR 0x82000006 means "Instruction Abort from a lower Exception level" Fault type: Translation fault. (in EL2) The faulting address is 0x00000000 (PC)
I see.
[ 2321.136185] zypper[9319]: unhandled level 2 translation fault (11) at 0xffffe5720449, esr 0x92000006
-->
ESR 0x92000006 means "Data Abort from a lower Exception level" Fault type: Translation fault. (in EL2) The faulting address is 0xffffe5720449 (x0)
I'm curious. You seem to be running in EL2. Do you have the following option enabled in your kernel?
CONFIG_ARM64_VHE
If so, please try to disable it and see whether that makes things work.
It is indeed enabled; I will comment below.
2) undefined instruction: pc=... This one causes a program to crash and exit without any delay. I actively observed this twice with zypper, but the kernel log suggests it can happen to other applications too.
Attached to this mail you can find a full system log with both kinds of crashes. You will find that I played with qemu when this log was produced. Sadly I did not save the initial log with the zypper crashes, except for one extract.
I am quite unsure what to do about this. Has anybody seen such behaviour with other boards?
I have seen it on a different CPU type with VHE enabled, yes.
I see
So after some additional tests I can conclude that it is some kernel option triggering the defect. I have one .config that works just fine, and one that is broken. Unfortunately the diff is not small. Both have VHE enabled, so I am not sure if that is the root cause. However I will try disabling VHE and let you know how it goes.
Alex
On 10/01/17 15:08, Josua Mayer wrote:
So after some additional tests I can conclude that it is some kernel option triggering the defect. I have one .config that works just fine, and one that is broken. Unfortunately the diff is not small. Both have VHE enabled, so I am not sure if that is the root cause. However I will try disabling VHE and let you know how it goes.
From my experience I would start by changing the page size in your broken config.
Regards,
Matthias
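Comparing a working and a broken .config, as discussed above, can be narrowed down mechanically. The following is a minimal Python sketch, assuming the usual .config syntax (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`). The function names and the two sample fragments are made up for illustration and are not the configs from this thread. The kernel tree also ships scripts/diffconfig for the same purpose.

```python
# Hypothetical helpers (not from the thread): list the options that differ
# between two kernel .config files.

def parse_config(text):
    """Map option name to value; '# CONFIG_X is not set' is recorded as 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(good, bad):
    """Return {option: (good_value, bad_value)} for every option that differs.

    An option that is absent from one file shows up as None on that side.
    """
    names = sorted(set(good) | set(bad))
    return {
        name: (good.get(name), bad.get(name))
        for name in names
        if good.get(name) != bad.get(name)
    }


if __name__ == "__main__":
    # Made-up sample fragments, NOT the configs discussed in this thread.
    good_text = (
        "CONFIG_ARM64_VHE=y\n"
        "CONFIG_ARM64_4K_PAGES=y\n"
        "# CONFIG_ARM64_64K_PAGES is not set\n"
    )
    bad_text = (
        "CONFIG_ARM64_VHE=y\n"
        "# CONFIG_ARM64_4K_PAGES is not set\n"
        "CONFIG_ARM64_64K_PAGES=y\n"
    )
    changes = diff_configs(parse_config(good_text), parse_config(bad_text))
    for name, (good_value, bad_value) in changes.items():
        print(f"{name}: {good_value} -> {bad_value}")
```

Options that are identical in both files, such as CONFIG_ARM64_VHE in the sample, drop out of the result, which leaves a shorter list of candidates to flip one at a time.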
Hi everyone,
sorry for the delay on this. We dropped this kernel tree and moved on to 4.10-rc6. So far I have not had any issues there, with one exception:
[ 8305.931816] cc1[31947]: unhandled level 0 translation fault (11) at 0x10004df5cfc19, esr 0x92000004
This time, however, it happened inside a VM running Debian arm64. The host seems untroubled by it.
So for now, this story is closed. I will ignore the errors inside the guest for now; if necessary, Debian seems to be the place to ask about them.
On 10.01.2017 at 16:12, Matthias Brugger wrote:
From my experience I would start by changing the page size in your broken config.
Regards, Matthias
On 01/07/2017 07:50 PM, Josua Mayer wrote:
Hi everybody,
I am approaching you with a collection of unusual crashes that occur on my test machine: It is an early version of the 8040 Community Board by SolidRun running a patched 4.9.0 kernel with the 42.2 rootfs
Hi,
What are your experiences with this board? Does it run with an upstream kernel, or does it still need many patches? Is it SBSA compliant?
I'm asking because I can't decide between two ARMv8 systems:
- SoftIron's Overdrive 1000 has rock solid Linux support and comes completely assembled. Both are a huge bonus, as I'm too old and too busy to fetch parts and assemble it myself. On the other hand, it's quite limited on ports.
- SolidRun's Community Board needs a lot of work to get it running. On the other hand, it has more SATA and Ethernet ports and is expandable via PCIe.
Bye,
CzP
Hi Peter,
On 25.01.2017 at 12:36, Peter Czanik wrote:
On 01/07/2017 07:50 PM, Josua Mayer wrote:
It is an early version of the 8040 Community Board by SolidRun running a patched 4.9.0 kernel with the 42.2 rootfs
Hi,
What are your experiences with this board? Does it run with an upstream kernel or still needs many patches?
I have not checked with upstream which components are still missing. However, we are fortunate enough to have Russell King spending time on working it out. He has a "mcbin" branch in his git tree that might give you an idea of what the status is. From my point of view, 4.10-rc6 works reasonably well; however, I have already found a bug that stops the Marvell XOR v2 driver from working.
Is it SBSA compliant?
I have no idea. I had a quick look, and UEFI appears to be a key component of this specification. The board runs U-Boot from either SPI flash, eMMC, SD card or SATA. So it should be doable, but the current U-Boot binary I got does not come with any of the new EFI features of U-Boot.
I'm asking, because I can't decide between two ARMv8 systems: - SoftIron's Overdrive 1000 has rock solid Linux support and comes completely assembled. Both are a huge bonus, as I'm too old and too busy to fetch parts and assemble it myself. On the other hand it's quite limited on ports. - SolidRuns Community board needs a lot of work to get it running, on the other hand it has more SATA and Ethernet ports, expandable via PCIe.
Bye, CzP
participants (4)
- Alexander Graf
- Josua Mayer
- Matthias Brugger
- Peter Czanik