On Tue, Nov 26, 2024 at 7:54 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
On 26.11.24 at 00:22, Andrii Nakryiko wrote:
Yes, stripped ELF symbols are pretty annoying. But that's a separate issue. And it's also possible to avoid needing them by capturing the build ID and doing symbolization offline, using the build ID to look up DWARF information for the profiled executable/library.
The important part, again, is *capturing the stack trace*. You seem to be more worried about symbolization, which is an entirely different problem.
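To make "capturing the stack trace" concrete: with frame pointers, grabbing a call chain is a handful of loads per frame. A minimal sketch, assuming an x86-64-style frame layout and code built with -fno-omit-frame-pointer (real unwinders add bounds and validity checks):

#include <stdio.h>

/* With frame pointers, each frame begins with the caller's saved
 * frame pointer, and the return address sits right next to it. */
struct frame {
    struct frame *next;   /* saved caller frame pointer */
    void         *ret;    /* return address into the caller */
};

static int capture_stack(void **buf, int max) {
    struct frame *fp = __builtin_frame_address(0);
    int n = 0;
    while (fp && n < max) {
        buf[n++] = fp->ret;   /* one PC per frame, no DWARF needed */
        if (fp->next <= fp)   /* chain must move up the stack */
            break;
        fp = fp->next;
    }
    return n;
}

int main(void) {
    void *pcs[64];
    int n = capture_stack(pcs, 64);
    for (int i = 0; i < n; i++)
        printf("#%d %p\n", i, pcs[i]);  /* symbolize later, possibly offline */
    return 0;
}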
Isn't this entire proposal about the usability of profiling?
The very first paragraph of this proposal asks to enable frame pointers "to support real-time profiling and observability". Profiling *and observability*. Not the same thing. Both are important. But whatever the wording, easily and cheaply available stack traces are the point here. What applications and tools do with them is orthogonal. No one can or should prescribe how those stack traces are to be used. Stack traces answer the question of "where in the code" (and "how did we get to that point"); everything else is up to applications and tools.
I'm not just worried about symbolization, but about the bigger picture. Especially if we're profiling the entire system, we're easily talking about gigabytes of debug info to download.
What does debug info have to do with this proposal? I already explained that stack unwinding and stack symbolization are two different problems. You don't need DWARF to make sense of stack traces (though DWARF is extremely valuable for even better stack traces, of course). Unstripped ELF symbols would be needed, of course, but that's nothing compared to gigabytes of DWARF. And then again, symbolization doesn't even have to happen immediately, or even on the same host; the build ID is there to enable exactly such scenarios.
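For illustration, here's what reading the build ID can look like at runtime, with no debug info involved at all. A minimal sketch, assuming Linux/glibc (dl_iterate_phdr) and the standard .note.gnu.build-id note emitted by modern toolchains:

#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Walk the PT_NOTE segments of every loaded ELF object and print its
 * GNU build ID; a profiler records this next to raw stack addresses
 * so that symbolization can happen later, against matching DWARF. */
static int print_build_id(struct dl_phdr_info *info, size_t size, void *data) {
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_NOTE)
            continue;
        const char *p = (const char *)(info->dlpi_addr + ph->p_vaddr);
        const char *end = p + ph->p_memsz;
        while (p + sizeof(ElfW(Nhdr)) <= end) {
            const ElfW(Nhdr) *nh = (const ElfW(Nhdr) *)p;
            const char *name = p + sizeof(*nh);
            const unsigned char *desc =
                (const unsigned char *)name + ((nh->n_namesz + 3) & ~3u);
            if (nh->n_type == NT_GNU_BUILD_ID &&
                nh->n_namesz == 4 && memcmp(name, "GNU", 4) == 0) {
                printf("%s: ", info->dlpi_name[0] ? info->dlpi_name : "[exe]");
                for (unsigned j = 0; j < nh->n_descsz; j++)
                    printf("%02x", desc[j]);
                printf("\n");
            }
            p = (const char *)desc + ((nh->n_descsz + 3) & ~3u);
        }
    }
    return 0;  /* 0 = keep iterating over loaded objects */
}

int main(void) {
    dl_iterate_phdr(print_build_id, NULL);
    return 0;
}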
With my compiler hat on, frame pointers often don't make sense. If there are no VLAs or allocas, the compiler knows how large a stack frame is and can simply add that fixed size back in the epilogue (for stacks that grow down). The frame pointer on the stack contains no information that the compiler doesn't already have.
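For concreteness, roughly what that means in generated code; a hedged illustration, since the exact instructions depend on compiler, flags and target (typical x86-64 output shown in the comments):

/* With frame pointers (e.g. gcc -O2 -fno-omit-frame-pointer), a
 * function with fixed-size locals gets a prologue/epilogue like:
 *
 *     push %rbp          # save caller's frame pointer
 *     mov  %rsp, %rbp    # establish this frame's pointer
 *     sub  $64, %rsp     # allocate the fixed 64-byte frame
 *     ...
 *     leave              # mov %rbp,%rsp; pop %rbp
 *     ret
 *
 * Without them (-fomit-frame-pointer, the usual -O2 default), %rbp
 * stays free as a general-purpose register and the epilogue is just
 * "add $64, %rsp; ret": the frame size is a compile-time constant,
 * so the saved frame pointer tells the compiler nothing new. */
int sum16(const int *a) {
    int buf[16];                      /* fixed-size locals, no VLA/alloca */
    for (int i = 0; i < 16; i++)
        buf[i] = a[i] * 2;
    int s = 0;
    for (int i = 0; i < 16; i++)
        s += buf[i];
    return s;
}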
Sure, from the compiler's point of view. But the concern here is profilers and other stack trace-based tools used for observability, profiling, and debugging.
My concern is not just profilers, since we're talking about a distribution default. The point that others have made, and that I've just repeated here, is that we're spending a register on a relatively register-starved architecture for something that's not relevant to the program itself, but is instrumentation for outside observers. And shipping with instrumentation enabled by default just doesn't feel right. That is not a "real" argument, I know.
It's a tradeoff, like everything in life. But supporters of frame pointers argue that the overhead is negligible, and that even where it's measurable, having stack traces (for profiling or other uses) easily accessible to tools is to everyone's benefit, and ultimately leads to more performance wins thanks to improved debuggability and observability of applications.
Profiling the workload is much more common than you might think, and as it becomes more accessible (because frame pointers are there and tools just work out of the box), it will only be used more frequently. Even if a user doesn't do their own performance investigation, the original application author can send simple perf-based (or similar) commands to run and report back data for further optimization.
That's a bit hypothetical. In my experience, profiling is not even terribly common among C++ developers, which is to say that most teams have dedicated performance experts and everyone else rarely touches a profiler. The developers who do profile are mostly profiling their own applications, which they have just built. That seems natural: after all, you'll want to improve performance, and for that you'll need to touch the source and recompile.
Application authors or package maintainers asking their users to profile sounds reasonable, but I haven't seen it yet. Realistically, if users complain about a performance problem, it's probably big enough that the overhead of DWARF and a reduced sample frequency are not an obstacle. And even truncated stacks could be enough for the developer to figure out the root cause.
For what it's worth, just today I profiled an awfully slow clang-tidy job, and I could equally have attached a debugger to see what was wrong.
The more barriers there are to doing something, the less likely it is that anyone will attempt it. This very much applies to stack traces without frame pointers. Here are a few links I collected some time after the Fedora proposal went through. From [0]: "Here is a little gem that I would have been unlikely to find without system-wide frame-pointers". From [1]: "My final summary here is that for most purposes you would be better off using frame pointers, and it’s a good thing that Fedora 38 now compiles everything with frame pointers. It should result in easier performance analysis, and even makes continuous performance analysis more plausible." The author of [1] also points out shortcomings of DWARF-based unwinding, btw: truncated ("detached") stack traces, overhead, etc.

I never tried to collect every single blog post about the usefulness of frame pointer-based stack traces available system-wide, tbh. I have a few more links to internal posts which I can't, unfortunately, share. The theme is often the same: because it was trivial to get started, someone actually started, found something they wouldn't otherwise have found, fixed it, and moved on.

Note also that Apple has made the decision to always have frame pointers ([2]), probably for a good reason. And yeah, every Mac user "pays" for that, which doesn't seem to be a real problem in practice.

[0] https://blogs.gnome.org/chergert/2023/10/03/what-have-frame-pointers-given-u...
[1] https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/
[2] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple...
There is zero doubt that whatever we (Meta) lost due to enabling frame pointers has been recovered many times over through high-quality and widely available profiling data.
Meta has a large paid workforce, though, while this is at least in part a community project. If SUSE wants to add FPs because they need them to improve the distro, I don't think anybody is going to stand in their way. But at least right now, I don't think they have dedicated people for performance, and I'm not aware of anyone regularly doing this kind of work in the community.
If such people exist, please step up. All we have is a proposal that was copied from Fedora, and not a single Tumbleweed user who has said "I want this, and this is how I would use it."
I'll leave it up to the openSUSE community to decide how reasonable it is to expect random distro users to find this thread and present "things I'd do with stack traces" write-ups to the community. In my experience, things just don't work like that. You have to provide the means and, ideally, popularize them and educate people; only then will you reap the benefits.
Meta is also a data center operator, while SUSE, to my knowledge, mostly ships software. So they don't have a big server farm where they could, or would want to, do whole-system profiling. Individual maintainers might do performance work on their packages, but for that they don't need FPs as a distro default.
I'm not a package maintainer, but if I had to rebuild the whole world just to profile my application that uses glibc, I'd never even start. I've actually been in similar situations where others asked me to help, and yeah, I didn't even start, because that's a bit too much to ask of me. But as I mentioned before and in another reply, it's not *just* about profiling. There is a whole cohort of tools like the BCC tools [3] and various bpftrace-based scripts, many of which rely on stack traces to help users understand the source of problems originating in applications (not necessarily applications written or maintained by those users).

[3] https://github.com/iovisor/bcc/tree/master/tools
SUSE customers might want this for their deployments, but that is again speculation. If it is actually the case, I think someone from SUSE should tell us. My employer (a large SUSE customer, I believe) would probably not be that interested, because we mainly run our own binaries and use just the OS base in deployment.
Even within Fedora's ecosystem, there were almost immediate reports of how having frame pointers in libc enabled significant performance improvements.
Surely libc doesn't have deep call stacks, so I assume this is more of an awareness issue than a deficit in DWARF unwinding? I've also played around with profiling a bit more than usual over the last few days.
Depth of the call stack is just one possible problem. Stack usage is a different and orthogonal one. You can have a call stack with just two functions, but if they use multi-KB stack variables, you'll blow through your stack capture allowance very quickly. But do take a look at [1], which I mentioned above.
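To illustrate the stack-usage point with a sketch: perf's DWARF mode copies a fixed-size snapshot of the user stack into each sample (8 KB by default, tunable via --call-graph dwarf,<size>), so two fat frames are enough to truncate the unwind, while a frame-pointer walk just follows two saved pointers:

#include <stdio.h>
#include <string.h>

/* Only two frames deep, but ~32 KB of locals between them. A DWARF
 * unwinder working from an 8 KB stack snapshot never reaches main()'s
 * frame; a frame-pointer walk dereferences two saved frame pointers
 * regardless of how much stack the locals consume. */
static double stage(const double *in, int n) {
    double scratch[2048];            /* 16 KB of stack in this frame */
    memcpy(scratch, in, n * sizeof(double));
    double s = 0;
    for (int i = 0; i < n; i++)
        s += scratch[i];
    return s;
}

int main(void) {
    double input[2048] = {0};        /* another 16 KB in the caller */
    input[7] = 42.0;
    printf("%f\n", stage(input, 2048));
    return 0;
}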