On Tue, Nov 26, 2024 at 7:54 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
On 26.11.24 at 00:22, Andrii Nakryiko wrote:
Yes, stripped ELF symbols are pretty annoying. But that's a separate issue. And it's also possible to avoid needing them by capturing the build ID and doing symbolization offline, using the build ID to look up DWARF information for the profiled executable/library.
The important part, again, is *capturing the stack trace*. You seem to be more worried about symbolization, which is an entirely different problem.
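To make "capturing the stack trace" concrete: with frame pointers, grabbing a call chain is a handful of loads per frame. A minimal sketch, assuming an x86-64-style frame layout and code built with -fno-omit-frame-pointer (real unwinders add bounds and validity checks):

#include <stdio.h>

/* With frame pointers, each frame begins with the caller's saved
 * frame pointer, and the return address sits right next to it. */
struct frame {
    struct frame *next;   /* saved caller frame pointer */
    void         *ret;    /* return address into the caller */
};

static int capture_stack(void **buf, int max) {
    struct frame *fp = __builtin_frame_address(0);
    int n = 0;
    while (fp && n < max) {
        buf[n++] = fp->ret;   /* one PC per frame, no DWARF needed */
        if (fp->next <= fp)   /* chain must move up the stack */
            break;
        fp = fp->next;
    }
    return n;
}

int main(void) {
    void *pcs[64];
    int n = capture_stack(pcs, 64);
    for (int i = 0; i < n; i++)
        printf("#%d %p\n", i, pcs[i]);  /* symbolize later, possibly offline */
    return 0;
}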
Isn't this entire proposal about the usability of profiling?
The very first paragraph of this proposal asks to enable frame pointers "to support real-time profiling and observability". Profiling *and observability*. Not the same thing. Both are important. But whatever the wording, easily and cheaply available stack traces are the point here. What applications and tools do with them is orthogonal. No one can or should prescribe how those stack traces are to be used. Stack traces answer the question of "where in the code" (and "how did we get to that point"); everything else is up to applications and tools.
I'm not just worried about symbolization, but about the bigger picture. Especially if we're profiling the entire system, we're easily talking about gigabytes of debug info to download.
What does debug info have to do with this proposal? I already explained that stack unwinding and stack symbolization are two different problems. You don't need DWARF to make sense of stack traces (though DWARF is extremely valuable for even better stack traces, of course). Unstripped ELF symbols would be needed, of course, but that's nothing compared to gigabytes of DWARF. And then again, symbolization doesn't even have to happen immediately, or even on the same host; the build ID is there to enable exactly such scenarios.
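For illustration, here's what reading the build ID can look like at runtime, with no debug info involved at all. A minimal sketch, assuming Linux/glibc (dl_iterate_phdr) and the standard .note.gnu.build-id note emitted by modern toolchains:

#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Walk the PT_NOTE segments of every loaded ELF object and print its
 * GNU build ID; a profiler records this next to raw stack addresses
 * so that symbolization can happen later, against matching DWARF. */
static int print_build_id(struct dl_phdr_info *info, size_t size, void *data) {
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_NOTE)
            continue;
        const char *p = (const char *)(info->dlpi_addr + ph->p_vaddr);
        const char *end = p + ph->p_memsz;
        while (p + sizeof(ElfW(Nhdr)) <= end) {
            const ElfW(Nhdr) *nh = (const ElfW(Nhdr) *)p;
            const char *name = p + sizeof(*nh);
            const unsigned char *desc =
                (const unsigned char *)name + ((nh->n_namesz + 3) & ~3u);
            if (nh->n_type == NT_GNU_BUILD_ID &&
                nh->n_namesz == 4 && memcmp(name, "GNU", 4) == 0) {
                printf("%s: ", info->dlpi_name[0] ? info->dlpi_name : "[exe]");
                for (unsigned j = 0; j < nh->n_descsz; j++)
                    printf("%02x", desc[j]);
                printf("\n");
            }
            p = (const char *)desc + ((nh->n_descsz + 3) & ~3u);
        }
    }
    return 0;  /* 0 = keep iterating over loaded objects */
}

int main(void) {
    dl_iterate_phdr(print_build_id, NULL);
    return 0;
}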
With my compiler hat on, frame pointers often don't make sense. If there are no VLAs or allocas, the compiler knows how large a stack frame is and can simply add that fixed size back in the epilogue (for stacks that grow down). The frame pointer on the stack contains no information that the compiler doesn't already have.
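For concreteness, roughly what that means in generated code; a hedged illustration, since the exact instructions depend on compiler, flags and target (typical x86-64 output shown in the comments):

/* With frame pointers (e.g. gcc -O2 -fno-omit-frame-pointer), a
 * function with fixed-size locals gets a prologue/epilogue like:
 *
 *     push %rbp          # save caller's frame pointer
 *     mov  %rsp, %rbp    # establish this frame's pointer
 *     sub  $64, %rsp     # allocate the fixed 64-byte frame
 *     ...
 *     leave              # mov %rbp,%rsp; pop %rbp
 *     ret
 *
 * Without them (-fomit-frame-pointer, the usual -O2 default), %rbp
 * stays free as a general-purpose register and the epilogue is just
 * "add $64, %rsp; ret": the frame size is a compile-time constant,
 * so the saved frame pointer tells the compiler nothing new. */
int sum16(const int *a) {
    int buf[16];                      /* fixed-size locals, no VLA/alloca */
    for (int i = 0; i < 16; i++)
        buf[i] = a[i] * 2;
    int s = 0;
    for (int i = 0; i < 16; i++)
        s += buf[i];
    return s;
}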
Sure, from the compiler's point of view. But the concern here is profilers and other stack trace-based tools used for observability, profiling, and debugging.
My concern is not just profilers, since we're talking about a distribution default. The point that others have made, and that I've just repeated here, is that we're spending a register on a relatively register-starved architecture for something that's not relevant to the program itself, but is instrumentation for outside observers. And shipping with instrumentation enabled by default just doesn't feel right. That is not a "real" argument, I know.
It's a tradeoff, like everything in life. But supporters of frame pointers argue that the overhead is negligible, and that even where it's measurable, having stack traces (for profiling or other uses) easily accessible to tools is to everyone's benefit, and ultimately leads to more performance wins thanks to improved debuggability and observability of applications.
Profiling the workload is much more common than you might think, and as it becomes more accessible (because frame pointers are there and tools just work out of the box), it will only be used more frequently. Even if a user doesn't do their own performance investigation, the original application author can send simple perf-based (or similar) commands to run and report back data for further optimization.
That's a bit hypothetical. In my experience, profiling is not even terribly common among C++ developers, which is to say that most teams have dedicated performance experts and everyone else rarely touches a profiler. The developers who do profile are mostly profiling their own applications, which they have just built. That seems natural: after all, you'll want to improve performance, and for that you'll need to touch the source and recompile.
Application authors or package maintainers asking their users to profile sounds reasonable, but I haven't seen it yet. Realistically, if users complain about a performance problem, it's probably big enough that the overhead of DWARF and a reduced sample frequency are not an obstacle. And even truncated stacks could be enough for the developer to figure out the root cause.
For what it's worth, just today I profiled an awfully slow clang-tidy job, and I could equally have attached a debugger to see what was wrong.
The more barriers there are to doing something, the less likely it is that anyone will attempt it. This very much applies to stack traces without frame pointers. Here are a few links I collected some time after the Fedora proposal went through. From [0]: "Here is a little gem that I would have been unlikely to find without system-wide frame-pointers". From [1]: "My final summary here is that for most purposes you would be better off using frame pointers, and it’s a good thing that Fedora 38 now compiles everything with frame pointers. It should result in easier performance analysis, and even makes continuous performance analysis more plausible." The author of [1] also points out shortcomings of DWARF-based unwinding, btw: truncated ("detached") stack traces, overhead, etc.

I never tried to collect every single blog post about the usefulness of frame pointer-based stack traces available system-wide, tbh. I have a few more links to internal posts which I can't, unfortunately, share. The theme is often the same: because it was trivial to get started, someone actually started, found something they wouldn't otherwise have found, fixed it, and moved on.

Note also that Apple has made the decision to always have frame pointers ([2]), probably for a good reason. And yeah, every Mac user "pays" for that, which doesn't seem to be a real problem in practice.

[0] https://blogs.gnome.org/chergert/2023/10/03/what-have-frame-pointers-given-u...
[1] https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/
[2] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple...
There is zero doubt that whatever we (Meta) lost due to enabling frame pointers has been recovered many times over through high-quality and widely available profiling data.
Meta has a large paid workforce, though, while this is at least in part a community project. If SUSE wants to add FPs because they need them to improve the distro, I don't think anybody is going to stand in their way. But at least right now, I don't think they have dedicated people for performance, and I'm not aware of anyone regularly doing this kind of work in the community.
If such people exist, please step up. All we have is a proposal that was copied from Fedora, and not a single Tumbleweed user who has said "I want this, and this is how I would use it."
I'll leave it up to the openSUSE community to decide how reasonable it is to expect random distro users to find this thread and present "things I'd do with stack traces" write-ups to the community. In my experience, things just don't work like that. You have to provide the means and, ideally, popularize them and educate people; only then will you reap the benefits.
Meta is also a data center operator, while SUSE, to my knowledge, mostly ships software. So they don't have a big server farm where they could, or would want to, do whole-system profiling. Individual maintainers might do performance work on their packages, but for that they don't need FPs as a distro default.
I'm not a package maintainer, but if I had to rebuild the whole world just to profile my application that uses glibc, I'd never even start. I've actually been in similar situations where others asked me to help, and yeah, I didn't even start, because that's a bit too much to ask of me. But as I mentioned before and in another reply, it's not *just* about profiling. There is a whole cohort of tools like the BCC tools [3] and various bpftrace-based scripts, many of which rely on stack traces to help users understand the source of problems originating in applications (not necessarily applications written or maintained by those users).

[3] https://github.com/iovisor/bcc/tree/master/tools
SUSE customers might want this for their deployments, but that is again speculation. If it is actually the case, I think someone from SUSE should tell us. My employer (a large SUSE customer, I believe) would probably not be that interested, because we mainly run our own binaries and use just the OS base in deployment.
Even within Fedora's ecosystem, there were almost immediate reports of how having frame pointers in libc enabled significant performance improvements.
Surely libc doesn't have deep call stacks, so I assume this is more of an awareness issue than a deficit in DWARF unwinding? I've also played around with profiling a bit more than usual over the last few days.
Depth of the call stack is just one possible problem. Stack usage is a different and orthogonal one. You can have a call stack with just two functions, but if they use multi-KB stack variables, you'll blow through your stack capture allowance very quickly. But do take a look at [1], which I mentioned above.
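To illustrate the stack-usage point with a sketch: perf's DWARF mode copies a fixed-size snapshot of the user stack into each sample (8 KB by default, tunable via --call-graph dwarf,<size>), so two fat frames are enough to truncate the unwind, while a frame-pointer walk just follows two saved pointers:

#include <stdio.h>
#include <string.h>

/* Only two frames deep, but ~32 KB of locals between them. A DWARF
 * unwinder working from an 8 KB stack snapshot never reaches main()'s
 * frame; a frame-pointer walk dereferences two saved frame pointers
 * regardless of how much stack the locals consume. */
static double stage(const double *in, int n) {
    double scratch[2048];            /* 16 KB of stack in this frame */
    memcpy(scratch, in, n * sizeof(double));
    double s = 0;
    for (int i = 0; i < n; i++)
        s += scratch[i];
    return s;
}

int main(void) {
    double input[2048] = {0};        /* another 16 KB in the caller */
    input[7] = 42.0;
    printf("%f\n", stage(input, 2048));
    return 0;
}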