On 26.11.24 at 00:22, Andrii Nakryiko wrote:
Yes, stripped-out ELF symbols are pretty annoying. But that's a separate issue. And it's possible to avoid needing them at all by capturing the build ID and doing symbolization offline, using the build ID to look up DWARF information for the profiled executable/library.
The important part, again, is *capturing the stack trace*. You seem to be more worried about symbolization, which is an entirely different problem.
Isn't this entire proposal about the usability of profiling? I'm not just worried about symbolization, but about the bigger picture. Especially if we're profiling the entire system, we're easily talking about gigabytes of debug info to download.
With my compiler hat on, frame pointers often don't make sense. If there are no VLAs or allocas, the compiler knows how large a stack frame is and can simply add that fixed size back to the stack pointer in the epilogue (for stacks that grow down). The frame pointer saved on the stack contains no information that the compiler doesn't already have.
Sure, from the compiler's point of view. But the concern here is profilers and other stack-trace-based tools used for observability, profiling, and debugging.
My concern is not just profilers, since we're talking about a distribution default. The point that others have made and that I've just repeated here is that we're spending a register on a relatively register-sparse architecture for something that's not relevant to the program itself, but is instrumentation for outside observers. And shipping with instrumentation by default just doesn't feel right. That is not a "real" argument, I know.
Profiling the workload is much more common than you might think, and as it becomes more accessible (because frame pointers are there and tools just work out of the box), it will be used even more frequently. Even if users don't do their own performance investigation, the original application author can send them simple perf-based (or similar) commands to run and report back data for further optimization.
That's a bit hypothetical. In my experience, profiling is not terribly common even among C++ developers, which is to say that most teams have dedicated performance experts and most other developers rarely touch a profiler. Those who do mostly profile their own applications, which they have just built themselves. That seems natural: after all, you'll want to improve performance, and for that you'll need to touch the source and recompile. Application authors or package maintainers asking their users to profile sounds reasonable, but I haven't seen it yet. Realistically, if users complain about a performance problem, it's probably big enough that the overhead of DWARF unwinding and a reduced sample frequency are not an obstacle, and even truncated stacks could be enough for the developer to figure out the root cause. For what it's worth, just today I profiled an awfully slow clang-tidy job, and I could equally have attached a debugger to see what was wrong.
There is zero doubt that whatever we (Meta) lost due to disabling frame pointers has been recovered many times over through high quality and widely available profiling data.
Meta has a large paid workforce, though, while this is at least in part a community project. If SUSE wants to add frame pointers because they need them to improve the distro, I don't think anybody is going to stand in their way. But at least right now I don't think they have dedicated people for performance work, and I'm not aware of anyone regularly doing this kind of work in the community. If such people exist, please step up. All we have is a proposal copied from Fedora, and no Tumbleweed user who has said "I want this, and this is how I would use it." Meta is also a data-center operator, while SUSE, to my knowledge, mostly ships software, so they don't have a big server farm where they could or would want to do whole-system profiling. Individual maintainers might do performance work on their packages, but for that they don't need frame pointers as a distro default. SUSE customers might want this for their deployments, but that is again speculation; if it is actually the case, I think someone from SUSE should tell us. My employer (a large SUSE customer, I believe) would probably not be very interested, because we mainly run our own binaries and use just the OS base in deployment.
Even within Fedora's ecosystem there were almost immediate reports on how having frame pointers in libc enabled significant performance improvements.
Surely libc doesn't have deep call stacks, so I assume this is more of an awareness issue than a deficiency in DWARF unwinding? I, too, have just been playing around with profiling a bit more than usual these last few days.