On Wed, Nov 27, 2024 at 5:20 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
On 27.11.24 at 20:15, Andrii Nakryiko wrote:
1) On-the-fly unwinding based on .eh_frame DWARF data. Given it's pretty big and is normally not used by the application, chances are high that the contents of this section won't be physically mapped into memory. So to access it one would need to cause a page fault and wait for the kernel to service it. It's slower, but the real issue is that stack traces with BPF and perf are captured in non-sleepable context (NMI due to perf event, or tracepoints, or kprobes, none of which allow page faults). So you'll, at a minimum, run into the problem of .eh_frame effectively not being available when necessary.
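To make the "not physically mapped" point concrete, here is a minimal sketch, assuming Linux and a page-aligned start address, of how a process could ask the kernel which pages of one of its own mappings are currently resident. The helper name resident_pages is made up for illustration; mincore(2) is the only non-trivial call used. It runs in ordinary user context, not in the NMI/tracepoint/kprobe context described above.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of [addr, addr + len) are
 * resident in memory according to mincore(2).
 * addr must be page-aligned. Returns -1 on error.
 */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec;
    long count = 0;

    if (npages == 0)
        return 0;

    vec = malloc(npages);
    if (!vec)
        return -1;

    /* mincore() fills one byte per page; bit 0 is set if resident. */
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;

    free(vec);
    return count;
}
```

For a large, rarely touched mapping, a count well below the total number of pages would be consistent with the situation described above.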
So we would just need to find a way to fault all .eh_frame pages of affected processes into memory before we start sampling? My very limited understanding is that the perf_event_open API (or is it just perf?) reports the start of processes and mapping of DSOs (PERF_RECORD_MMAP?), so couldn't one use that opportunity to fault the range in? Or am I overlooking something?
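As a rough sketch of what "faulting the range in" could mean mechanically, assuming Linux and code running inside the process that owns the mapping: hint the kernel with madvise(MADV_WILLNEED), then read one byte per page. The helper name prefault_range is hypothetical, and this only illustrates the in-process mechanics; it does not show how a profiler would do this for another process.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch every page in [addr, addr + len) so that
 * each one is faulted in. addr is assumed to be page-aligned and the
 * whole range readable. Returns the number of pages touched.
 */
static size_t prefault_range(const void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const volatile unsigned char *p = addr;
    size_t touched = 0;

    if (len == 0)
        return 0;

    /* Best-effort read-ahead hint; failure is not fatal for this sketch. */
    (void)madvise((void *)addr, len, MADV_WILLNEED);

    /* A volatile read per page forces the access to actually happen. */
    for (size_t off = 0; off < len; off += page) {
        (void)p[off];
        touched++;
    }
    return touched;
}
```

Note that nothing here keeps the pages resident afterwards: unless they are locked (for example with mlock), the kernel is free to reclaim file-backed pages again under memory pressure.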
You are overlooking *a lot* here. There is nothing (even conceptually) simple here, unfortunately. And you are not the first one who'd like to solve these problems. There are tons of technical and conceptual problems here, none of which is easily solvable. That's not because no one has tried, but because the issues are fundamental, technical, and sometimes even political.