Hi Alex,

On 08.01.2017 at 22:46, Alexander Graf wrote:
On 07/01/2017 19:50, Josua Mayer wrote:
Hi everybody,
I am approaching you with a collection of unusual crashes that occur on my test machine: It is an early version of the 8040 Community Board by SolidRun running a patched 4.9.0 kernel with the 42.2 rootfs from http://download.opensuse.org/ports/aarch64/distribution/leap/42.2/appliances...
Now what I see are the following situations:
1) unhandled level XY translation fault (11) at 0xYYYYYYYY, esr 0xYYYYYYYY. I have seen this with levels 0, 1, 2 and 3. So far it has happened with zypper, which freezes seemingly at random; ctrl+x then produces this kind of error in dmesg.
That's really just a kernel log entry for a user space segmentation fault. The addresses that faulted were:
[ 2312.480811] zypper[5524]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x82000006
-->
ESR 0x82000006 means "Instruction Abort from a lower Exception level".
Fault type: Translation fault. (in EL2)
The faulting address is 0x00000000 (PC).
I see.
[ 2321.136185] zypper[9319]: unhandled level 2 translation fault (11) at 0xffffe5720449, esr 0x92000006
-->
ESR 0x92000006 means "Data Abort from a lower Exception level".
Fault type: Translation fault. (in EL2)
The faulting address is 0xffffe5720449 (x0).
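For reference, the two esr values from the log lines can be split into their fields by hand. The following is only a minimal sketch, assuming the usual ESR_ELx layout (EC in bits 31:26, IL in bit 25, and for aborts the fault status code in bits 5:0, where 0b000100 to 0b000111 denote a translation fault at translation table level 0 to 3). The function name `decode_esr` and the small EC table are made up for this illustration and cover only the abort classes seen here.

```python
# Exception classes relevant to the log lines above (subset only).
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}


def decode_esr(esr):
    """Split an ESR value into EC, IL and fault status code (aborts only)."""
    ec = (esr >> 26) & 0x3F   # exception class, bits 31:26
    il = (esr >> 25) & 0x1    # instruction length bit, bit 25
    fsc = esr & 0x3F          # fault status code, bits 5:0 (valid for aborts)

    if 0b000100 <= fsc <= 0b000111:
        # The low two bits give the translation table level.
        fault = f"translation fault, level {fsc & 0x3}"
    else:
        fault = f"other fault status code {fsc:#04x}"

    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not covered by this sketch"),
        "il": il,
        "fault": fault,
    }


if __name__ == "__main__":
    for value in (0x82000006, 0x92000006):
        print(f"{value:#010x}: {decode_esr(value)}")
```

With the values from the log, 0x82000006 decodes to EC 0x20 and 0x92000006 to EC 0x24, both with a level 2 translation fault, which matches the "unhandled level 2 translation fault" text the kernel printed.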
I'm curious. You seem to be running in EL2. Do you have the following option enabled in your kernel?
CONFIG_ARM64_VHE
If so, please try to disable it and see whether that makes things work.
CONFIG_ARM64_VHE is indeed enabled; I will comment below.
2) undefined instruction: pc=... This one causes a program to crash and exit without any delay. I actively observed this twice with zypper, but the kernel log suggests it can happen to other applications too.
Attached to this mail you can find a full system log with both kinds of crashes. You will find that I played with qemu when this log was produced. Sadly I did not save the initial log with the zypper crashes, except for one extract.
I am quite unsure what to do about this. Has anybody seen such behaviour with other boards?
I have seen it on a different CPU type with VHE enabled, yes.
I see
So after some additional tests I can conclude that it is some kernel option triggering the defect. I have one .config that works just fine, and one that is broken. Unfortunately the diff is not small. Both have VHE enabled, so I am not sure if that is the root cause. However I will try disabling VHE and let you know how it goes.
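To narrow down a large diff between a working and a broken .config, it can help to compare them option by option rather than line by line. Below is a rough sketch of that idea, not a finished tool: the helper names `parse_kconfig` and `diff_kconfig` are hypothetical, the CONFIG_EXAMPLE_* options in the sample are invented, and treating an option that is missing from one file as "n" is a simplifying assumption.

```python
def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_FOO is not set' is stored as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(good_text, bad_text):
    """Return {option: (value_in_good, value_in_bad)} for options that differ."""
    good = parse_kconfig(good_text)
    bad = parse_kconfig(bad_text)
    changed = {}
    for name in sorted(set(good) | set(bad)):
        # Assumption: an option absent from a file counts as "n".
        pair = (good.get(name, "n"), bad.get(name, "n"))
        if pair[0] != pair[1]:
            changed[name] = pair
    return changed


if __name__ == "__main__":
    # Invented sample contents; real files would be read from disk instead.
    good = "CONFIG_ARM64_VHE=y\nCONFIG_EXAMPLE_A=y\n# CONFIG_EXAMPLE_B is not set\n"
    bad = "CONFIG_ARM64_VHE=y\n# CONFIG_EXAMPLE_A is not set\nCONFIG_EXAMPLE_B=m\n"
    for name, (old, new) in diff_kconfig(good, bad).items():
        print(f"{name}: {old} -> {new}")
```

Options that are identical in both files, such as CONFIG_ARM64_VHE in the sample, drop out of the result, so the remaining list can be bisected by flipping half of the differing options at a time.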
Alex