Hello,

On Thu, Nov 28, 2024 at 09:21:22AM +0100, Richard Biener wrote:
On Wed, 27 Nov 2024, Andrii Nakryiko wrote:
On Wed, Nov 27, 2024 at 6:42 AM Richard Biener <rguenther@suse.de> wrote:
On Tue, 26 Nov 2024, Andrii Nakryiko wrote:
3) Stack contents capture. That's what perf supports. Capture some relatively large portion of the current thread's stack, hoping to capture enough. Post-process afterwards in user space, unwinding with the captured snapshot of the stack and .eh_frame. Main issues: whatever amount of stack you captured might not be enough, and that depends entirely on the specific functions that were active during the snapshot. If some function has lots of local variables stored on the stack, it might use up the entire captured stack, and so you won't even find the other frames. And it's impossible to predict and mitigate. And, of course, it's expensive to copy so much memory between kernel and user space, and .eh_frame-based processing is still relatively slow.
There is just no good and reliable solution for DWARF-based stack unwinding.
The issue is that FP-based unwinding cannot _ever_ be reliable, as you can't know whether you have a valid FP, and all FP-based unwinders have to apply heuristics here. You'd have to consult .eh_frame to see whether you have a frame pointer.
Yeah, so FP-based unwinding is heuristically "fast", when it works. But you don't get to know whether it does or not ;)
It sounds like this is a case of the perfect getting in the way of the good. Profiling is all statistics, and never 100% reliable. What level of unreliability is there with frame pointers? It has been pointed out that, with the current state of the tools and the overall ecosystem, frame pointers help a lot of people a lot of the time for a number of different use cases that eventually result in improvements for all users. Sounds like a useful feature, even if not perfect.

Thanks

Michal