Aaron Puchert wrote:
> On 16.11.24 at 19:24, Neal Gompa wrote:
>> I would like to change openSUSE's default build flags in rpm and rpm-config-SUSE to incorporate frame pointers (and equivalents on other architectures) to support real-time profiling and observability on SUSE distributions.
> Profiling doesn't necessarily need call stacks. Often you can learn a great deal from just seeing the hot functions.
"Doesn't necessarily need call stacks" doesn't mean that one doesn't need call stacks. I'd say call stacks (stack traces) are useful more often than not, and they provide tons of contextual information that a pure hot-function view just can't. And the more complicated the application, the more important they are.
> I'd say that I work with plain profiles ("perf record" without -g) most of the time and add call stacks only when I'm unsure why a function is so frequently called. Stacks can sometimes even obscure hotspots, depending on how you're reading them: if a hot function distributes its cost among many different call stacks, it will not stick out in a top-down flame graph. (You'll have to do a bottom-up.)
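To make the top-down vs. bottom-up point concrete, here is a toy Python sketch. The sample stacks and function names are made up for illustration; each sample is one captured call stack, outermost frame first.

```python
from collections import Counter

# Hypothetical samples (outermost frame first); names are invented for this sketch.
SAMPLES = [
    ("main", "parse", "alloc"),
    ("main", "render", "alloc"),
    ("main", "render", "draw"),
    ("main", "net", "alloc"),
    ("main", "io", "alloc"),
    ("main", "render", "draw"),
]

def top_down(samples, depth=2):
    """Aggregate by call-path prefix, like the second level of a top-down flame graph."""
    return Counter(stack[:depth] for stack in samples)

def bottom_up(samples):
    """Aggregate by leaf (self-time) function, like a bottom-up view."""
    return Counter(stack[-1] for stack in samples)
```

Top-down, `render` looks hottest (3 of 6 samples) while `alloc` is split across four callers; bottom-up, `alloc` clearly dominates with 4 of 6. Note that both views are computed from the same collected stacks.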
Flame graphs are not the only way to look at a collection of stack traces, so I'm not sure the objections above hold. For instance, here at Meta, we use both flame graphs and an alternative so-called "graph profiler" view, which gives an extremely detailed and customizable view of profiles and lets you slice them: aggregating across many calls, filtering out some of the call paths reaching a function while keeping others, etc. But first and foremost comes getting the stack trace data. That's what the frame pointer proposal is trying to enable globally, without users having to jump through unreasonable hoops.
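As a rough illustration of "filtering out some of the call paths reaching some function while keeping others", here is a minimal Python sketch over collected stacks. It is not Meta's graph profiler; the function `callers_of` and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical stacks (outermost frame first); names are invented for this sketch.
SAMPLES = [
    ("main", "parse", "alloc"),
    ("main", "render", "alloc"),
    ("main", "render", "draw"),
    ("main", "net", "alloc"),
    ("main", "io", "alloc"),
]

def callers_of(samples, target, exclude=()):
    """Count samples with `target` on the stack, keyed by its immediate caller,
    dropping samples whose path to `target` passes through a frame in `exclude`."""
    excluded = set(exclude)
    counts = Counter()
    for stack in samples:
        if target not in stack:
            continue
        i = stack.index(target)  # outermost occurrence of target
        if excluded.intersection(stack[:i]):
            continue  # this call path is filtered out
        caller = stack[i - 1] if i > 0 else None
        counts[caller] += 1
    return counts
```

For example, `callers_of(SAMPLES, "alloc", exclude={"net", "io"})` keeps only the `parse` and `render` paths into `alloc`. None of this is possible from a flat hot-function profile.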
> There is another obstacle, one that applies even to profiling without call stacks: there are no symbols. We strip .symtab along with debug info. Without that, the stacks that you get are meaningless.
Yes, stripped-out ELF symbols are pretty annoying. But that's a separate issue. It's also possible to avoid needing them by capturing the build ID and doing symbolization offline, using the build ID to look up DWARF information for the profiled executable/library. The important part, again, is *capturing the stack trace*. You seem to be more worried about symbolization, which is an entirely different problem.
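A minimal Python sketch of the offline step, assuming the profiler recorded (build ID, offset) pairs and that a symbol table for each build ID was later recovered from separate debug info. The build ID, offsets, and symbol names below are invented, and real tools also have to translate runtime addresses into file offsets first.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    start: int  # offset of the symbol within the binary
    size: int
    name: str

# Hypothetical symbol tables keyed by build ID (sorted by start), as they might be
# recovered offline from debug info. All values here are made up.
SYMTABS = {
    "4f2a9c": [Sym(0x1000, 0x80, "main"), Sym(0x1080, 0x40, "parse"), Sym(0x1100, 0x200, "alloc")],
}

def symbolize(build_id, offset, symtabs=SYMTABS):
    """Map a (build ID, offset) pair captured at profiling time to 'func+0xoff'."""
    syms = symtabs.get(build_id)
    if syms is None:
        return f"[unknown build ID {build_id}]+{offset:#x}"
    starts = [s.start for s in syms]
    i = bisect.bisect_right(starts, offset) - 1  # last symbol starting at or before offset
    if i >= 0 and offset < syms[i].start + syms[i].size:
        return f"{syms[i].name}+{offset - syms[i].start:#x}"
    return f"[unknown]+{offset:#x}"
```

The stack trace itself only has to carry build IDs and offsets. Symbols can be resolved whenever and wherever the debug info is available.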
> With my compiler hat on, frame pointers often don't make sense. If there are no VLAs and allocas, the compiler knows how large a stack frame is and can simply add (for stacks that grow down) that fixed size in the epilogue. The frame pointer on the stack contains no information that the compiler doesn't already have.
Sure, from the compiler's point of view. But the concern here is profilers and other stack-trace-based tools used for observability, profiling, and debugging.
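A toy Python model of why the frame pointer matters to a profiler even though the compiler doesn't need it. With frame pointers on x86-64, each frame stores the caller's saved RBP at [RBP] and the return address at [RBP+8]. An unwinder can follow that linked list with no per-function metadata. Without frame pointers, it instead needs per-function unwind info (e.g. DWARF CFI from .eh_frame) to learn each frame's size. The `memory` dict stands in for reading the sampled thread's stack, and all addresses are invented.

```python
WORD = 8  # x86-64 word size in bytes

def fp_unwind(memory, rip, rbp, max_depth=64):
    """Walk the frame-pointer chain; return instruction pointers, innermost first.
    `memory` maps addresses to 8-byte words (a stand-in for reading the target's stack)."""
    pcs = [rip]
    while rbp and len(pcs) < max_depth:
        saved_rbp = memory.get(rbp)         # caller's frame pointer at [RBP]
        ret_addr = memory.get(rbp + WORD)   # return address at [RBP+8]
        if saved_rbp is None or ret_addr is None:
            break  # unreadable frame: stop rather than guess
        pcs.append(ret_addr)
        if saved_rbp != 0 and saved_rbp <= rbp:
            break  # stack grows down, so caller frames must be at higher addresses
        rbp = saved_rbp  # a saved RBP of 0 marks the outermost frame and ends the loop
    return pcs

# Invented three-frame stack: leaf -> caller -> outermost (saved RBP of 0).
MEMORY = {
    0x7FF0: 0x8010, 0x7FF8: 0x401105,
    0x8010: 0x8040, 0x8018: 0x401210,
    0x8040: 0x0,    0x8048: 0x401300,
}
```

Each step is two memory reads. That is why frame-pointer unwinding is cheap enough to do from a sampling interrupt or a BPF program.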
>> * The performance hit for having it vs not is insignificant[6].
> This is a tricky argument. There are lots of things one could do that individually have little impact (maybe 1–10%), but those little things add up. One guy wants to add frame pointers for profiling, another wants stack protectors, the next guy wants automatic initialization of local variables (likely C++26) or mandatory boundary checks. They all claim that it adds just a little (on average).
> But what if the benefit is also small? If we take the average cost (which is typically small for most things you can add) we should also take the average benefit. That might not be much larger since lots of people, even Linux users, will never run "perf record". Even among developers, profiling might be restricted to self-built binaries.
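The "little things add up" arithmetic, as a small Python sketch. It assumes each feature's overhead is independent and multiplies the runtime by (1 + o). The percentages are illustrative, not measurements of frame pointers or any other feature.

```python
import math

def combined_overhead(overheads):
    """Total slowdown from stacking independent multiplicative overheads (0.02 == 2%)."""
    return math.prod(1 + o for o in overheads) - 1
```

For instance, three features at 2% each give 1.02³ − 1 ≈ 6.12%, slightly more than the 6% you'd get by just adding them up. This only covers the cost side; it says nothing about the benefit side of the argument.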
This has been discussed ad nauseam in https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer and elsewhere. Profiling workloads is much more common than you might think, and as it becomes more accessible (because frame pointers are there and tools just work out of the box), it will be used even more frequently. Even if users don't do their own performance investigations, the original application author can send them simple perf-based (or whatnot) commands to run and report the data back for further optimization. And cumulatively, the gains we get from more widely available profiling data far outweigh any potential one-time regression due to frame pointers. There is zero doubt that whatever we (Meta) lost by enabling frame pointers (i.e., disabling frame pointer omission) has been recovered many times over through high-quality and widely available profiling data. Even within Fedora's ecosystem there were almost immediate reports of how having frame pointers in libc enabled significant performance improvements (I'm too lazy to look for the links, please don't ask).
>> I want openSUSE to be a great place for people to develop and optimize workloads on, especially desktop ones, where most of the tooling we have for tracing and profiling is broken without frame pointers (see Sysprof and Hotspot from GNOME and KDE respectively, which both rely on frame pointers to have cheap real-time tracing for performance analysis).
> I don't know about any of those, but I'd assume they're just GUIs around "perf"? If you have an Intel CPU from Haswell onward (Zen 4 or so should also have LBR, but I haven't tried it yet), "perf record --call-graph lbr" works even without frame pointers, and in my experience it's pretty reliable.
There are many profilers and various tools (especially BPF-based ones) that have nothing to do with perf, so no, I don't think one can just generalize to "perf UI". As for LBR, we tried using it to augment stack traces, to get through function calls inside some libraries that we don't control and couldn't enable frame pointers for. Unfortunately, it doesn't work all that great in practice. LBR is limited to just 16 or 32 entries, and that's often not enough. Also, LBR doesn't record a stack trace; it records a trace of function returns (and other stuff, if you set it up correctly), and that's not 100% compatible with stack traces. LBRs have some exciting uses (I have a tool, retsnoop, that benefits a lot from LBR, but for an entirely different use case), but LBR is certainly not a replacement for frame pointers and stack traces.
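A simplified Python model of the depth limit. It treats LBR call-stack mode as a fixed-size buffer where calls push an entry and returns pop one. Real LBR entries are from/to address pairs, and the hardware details vary by CPU, so this only illustrates the truncation, not the exact semantics.

```python
from collections import deque

def lbr_callstack(events, capacity=32):
    """Replay ("call", frame) / ("ret", None) events through a fixed-size buffer.
    Returns the recoverable stack, innermost frame first."""
    buf = deque(maxlen=capacity)  # when full, appending silently drops the oldest entry
    for kind, frame in events:
        if kind == "call":
            buf.append(frame)
        elif kind == "ret" and buf:
            buf.pop()
    return list(reversed(buf))
```

With a 40-deep call chain and 32 entries, the outermost 8 frames are simply gone. After 32 returns the buffer is empty even though 8 frames are still live on the real stack. A frame-pointer walk of the same stack would return all 40 frames.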