Re: Proposal: Update default build flags to enable frame pointers

28 Nov 2024

On Wed, Nov 27, 2024 at 5:57 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
...
Am 27.11.24 um 05:31 schrieb Andrii Nakryiko:
...
...
Am 26.11.24 um 00:22 schrieb Andrii Nakryiko:
...
Yes, stripped-out ELF symbols are pretty annoying, but that's a
separate issue. It's also possible to avoid needing them by capturing
the build ID and doing symbolization offline, using the build ID to
look up DWARF information for the profiled executable/library.
The important part, again, is *capturing the stack trace*. You seem
to be more worried about symbolization, which is an entirely different
problem.
On Tue, Nov 26, 2024 at 7:54 PM Aaron Puchert
<aaronpuchert@alice-dsl.net> wrote:
Isn't this entire proposal about the usability of profiling? I'm not
just worried about symbolization, but about the bigger picture.
Especially if we're profiling the entire system, we're easily talking
about gigabytes of debug info to download.
What does debug info have to do with this proposal? I already
explained that stack unwinding and stack symbolization are two
different problems. You don't need DWARF to make sense out of stack
traces (though DWARF is extremely valuable to have even better stack
traces, of course). Not having stripped ELF symbols would be needed,
of course, but that's nothing in comparison to gigabytes of DWARF.
Let me just summarize: I said "we need symbols", you said "that's a
separate issue", then I said "well but then you need debug info which
could be quite large", then you say "you don't need debug info if you
have symbols". So do you agree that we should have symbols or not?
Where is the contradiction? Yes, we need symbols. ELF symbols are
enough to have basic stack trace symbolization. But even that is not
an absolute requirement, because you can capture build ID + file
offset and offload symbolization to a different host and fetch full
ELF and DWARF data separately without affecting production workload.

But then again, these are symbolization issues, not stack unwinding.
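
To make the "build ID + file offset" idea above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular profiler implements it: the `Mapping` and `Frame` types, the function names, and the sample data are all hypothetical, and the symbol table is assumed to be already keyed by file offset (a real tool would derive that from the ELF program headers and symbol table, or from DWARF).

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    """One executable mapping of a process (hypothetical structure)."""
    start: int        # runtime start address of the mapping
    end: int          # runtime end address (exclusive)
    file_offset: int  # offset into the file where the mapping begins
    build_id: str     # build ID of the mapped ELF file


class Frame(NamedTuple):
    """What gets recorded on the profiled host: no symbols needed."""
    build_id: str
    file_offset: int


def capture(addr: int, mappings: List[Mapping]) -> Optional[Frame]:
    """Turn a raw return address into (build ID, file offset)."""
    for m in mappings:
        if m.start <= addr < m.end:
            return Frame(m.build_id, addr - m.start + m.file_offset)
    return None  # address not in any known mapping


def symbolize(frame: Frame,
              symbol_tables: Dict[str, List[Tuple[int, str]]]) -> str:
    """Offline step, possibly on another host: look up the symbol.

    symbol_tables maps build ID -> sorted list of (start offset, name).
    """
    table = symbol_tables.get(frame.build_id)
    if table:
        starts = [start for start, _ in table]
        i = bisect_right(starts, frame.file_offset) - 1
        if i >= 0:
            start, name = table[i]
            return f"{name}+{frame.file_offset - start:#x}"
    # Fall back to the raw pair if no symbol data is available.
    return f"{frame.build_id}+{frame.file_offset:#x}"


# Hypothetical data for illustration only.
mappings = [Mapping(0x7F0000001000, 0x7F0000005000, 0x1000, "abc123")]
tables = {"abc123": [(0x1000, "foo"), (0x1200, "bar")]}

frame = capture(0x7F0000001234, mappings)
print(frame)
print(symbolize(frame, tables))
```

The point of the split is that `capture` runs on the production machine and needs only the process mappings, while `symbolize` can run later and elsewhere with whatever ELF or DWARF data has been fetched for that build ID.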
...
We all understand that unwinding and symbolization are technically
different. But a bunch of hex numbers are not going to tell you
anything. So you'll need symbolization at some point, and what's the
point of making unwinding easy if symbolization remains hard?
Two separate problems. Different set of solutions. Nothing in common
between stack unwinding and symbolization besides the word "DWARF".
Absolutely misleading conversation and arguments. Not sure if
intentional or not, but it's just completely beside the topic of this
proposal.
...
...
...
...
There is zero doubt that whatever we (Meta) lost due to enabling frame
pointers has been recovered many times over through high quality and
widely available profiling data.
Meta has a large paid workforce though, while this is at least in part
a community project. If SUSE wants to add FPs because they need it to
improve the distro, I don't think anybody is going to stand in their
way. But at least right now I don't think they have dedicated people for
performance, and I'm not aware of anyone regularly doing this kind of
work in the community.
If they exist, please step up. All we have is a proposal that was copied
from Fedora and no Tumbleweed user that said "I want this and this is
how I would use it."
I'll leave it up to the openSUSE community to decide on how reasonable
it is to expect random distro users to find this thread, and present
"things I'd do with stack traces" write ups to the community. In my
experience things just don't work like that. You have to provide the
means and, ideally, popularize and educate people, and only then
will you reap the benefits.
...
Meta is also a data center operator, while SUSE mostly ships software to
my knowledge. So they don't have a big server farm where they could or
would want to do whole system profiling. Individual maintainers might do
performance work on their packages, but for that they don't need FPs as
a distro default.
I'm not a package maintainer, but if I had to rebuild the whole world
just to profile my application that uses glibc, I'd never even start.
I profile a lot and never rebuild the whole world or even just glibc.
But even if you use glibc or libstdc++ and for some reason spend a lot
of time in there that you want to understand, there is no need to
rebuild the whole world, and rebuilding glibc is easy. Just check out
the package, add the flag and build. The build might take a bit, I
don't know. But the setup is one minute.
I believe you. But profiling something on your development machine is
just one of many possible scenarios. Rebuilding libc for a production
machine with a production workload that you *need* to profile and/or
debug is a completely different one.

You have an interesting line of counter-arguing here, tbh, using your
personal use cases and approaches as a general argument against a
change with much wider implications beyond your specific patterns.
...
...
...
...
Even within Fedora's ecosystem there were almost immediate reports on
how having frame pointers in libc enabled significant performance
improvements.
Surely libc doesn't have deep call stacks, so I assume this is more of
an awareness issue than deficits in DWARF unwinding? I also just played
around with profiling a bit more in the last few days than usual.
Depth of call stack is just one possible problem. Stack usage is
absolutely different and orthogonal.
Of course I meant the size in bytes.
Then I don't understand what you were trying to argue with "surely
libc doesn't have deep call stacks". What about stack usage of an
application itself that calls into libc? Why is libc called out
separately? Anyhow, I explained three broad approaches to DWARF-based
unwinding and their main limitations. I'm not sure I can add much to
that here.
...
Unfortunately [1] seems to use blunt instruments. I would also be mad at
a 60 GB trace file, but you can of course reduce sampling frequency.
There are formulas that tell you how many samples you need for which
level of accuracy, and if you want something like 1% error, you need
10,000 samples, which puts you at 320 MB for the stack traces. And if
your workload runs long enough, reducing the frequency can also reduce
overhead to acceptable levels.
I'm not suggesting that the current DWARF unwinding is perfect, and I'd
much rather see an unwinder that works in BPF. But it's not quite as bad
as some make it look.
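
For reference, the sample-count figures quoted above can be reproduced with a short Python sketch. Two assumptions are mine and not stated in the thread: that the "formula" is the common 1/sqrt(N) rule of thumb for relative sampling error (it is consistent with "1% error, 10,000 samples"), and that each sample carries roughly 32 KB of copied stack, which is simply what 320 MB divided by 10,000 samples implies.

```python
import math


def relative_error(num_samples: int) -> float:
    """Rule-of-thumb relative error for N samples (assumed: 1/sqrt(N))."""
    return 1.0 / math.sqrt(num_samples)


def samples_needed(target_error: float) -> int:
    """Invert the rule of thumb: N = (1 / error)^2, rounded up."""
    return math.ceil((1.0 / target_error) ** 2)


def trace_size_bytes(num_samples: int, bytes_per_sample: int) -> int:
    """Total size of the captured stack data."""
    return num_samples * bytes_per_sample


# Assumed per-sample stack copy size, inferred from 320 MB / 10,000.
BYTES_PER_SAMPLE = 32_000

n = samples_needed(0.01)
size_mb = trace_size_bytes(n, BYTES_PER_SAMPLE) / 1_000_000
print(n, relative_error(n), size_mb)
```

Under these assumptions, halving the target error quadruples both the number of samples and the trace size, which is why lowering the sampling frequency is such an effective lever when the workload runs long enough.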