Proposal: Update default build flags to enable frame pointers
Hello,
I would like to change openSUSE's default build flags in rpm and rpm-config-SUSE to incorporate frame pointers (and equivalents on other architectures) to support real-time profiling and observability on SUSE distributions.
In practice, this would mean essentially adopting the same tunables that exist in Fedora[1] for openSUSE to turn them on by default and to allow packages or OBS projects to selectively opt out as needed.
The reasons to do so are threefold:
* It is a major competitive disadvantage for us to lack the ability to do cheap real-time profiling and observability. Fedora[2], Ubuntu[3], Arch Linux[4], and AlmaLinux[5] all now do this, and thus support this capability.
* The performance hit for having it vs not is insignificant[6].
* There are new tools in both the cloud-native and regular systems development worlds that leverage this, and openSUSE should be an enabler of those technologies.
I want openSUSE to be a great place for people to develop and optimize workloads on, especially desktop ones, where most of the tooling we have for tracing and profiling is broken without frame pointers (see Sysprof and Hotspot from GNOME and KDE respectively, which both rely on frame pointers to have cheap real-time tracing for performance analysis). And given that we advertise as the "makers' choice", I think it would definitely be on-brand for us to have the capability to better support makers and shakers.
For those interested in more detail about frame pointers, Brendan Gregg has a decent post about it[7]. I have also submitted a parallel request for openSUSE Leap 16 to have this feature enabled too[8].
I truly believe this would give us an even better footing with the broader community of developers and operators and make SUSE distributions very attractive for FOSS and proprietary software alike.
Best regards, Neal
[1]: https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/93063bb396395b9a20...
[2]: https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer
[3]: https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-b...
[4]: https://gitlab.archlinux.org/archlinux/rfcs/-/blob/master/rfcs/0026-fno-omit...
[5]: https://almalinux.org/blog/2024-10-22-introducing-almalinux-os-kitten/
[6]: https://www.phoronix.com/review/fedora-38-beta-benchmarks
[7]: https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointer...
[8]: https://code.opensuse.org/leap/features/issue/175
-- 真実はいつも一つ!/ Always, there's only one truth!
On Sat, 16 Nov 2024, Neal Gompa wrote:
Hello,
I would like to change openSUSE's default build flags in rpm and rpm-config-SUSE to incorporate frame pointers (and equivalents on other architectures) to support real-time profiling and observability on SUSE distributions.
In practice, this would mean essentially adopting the same tunables that exist in Fedora[1] for openSUSE to turn them on by default and to allow packages or OBS projects to selectively opt out as needed.
The reasons to do so are threefold:
* It is a major competitive disadvantage for us to lack the ability to do cheap real-time profiling and observability. Fedora[2], Ubuntu[3], Arch Linux[4], and AlmaLinux[5] all now do this, and thus support this capability.
* The performance hit for having it vs not is insignificant[6].
* There are new tools in both the cloud-native and regular systems development worlds that leverage this, and openSUSE should be an enabler of those technologies.
I want openSUSE to be a great place for people to develop and optimize workloads on, especially desktop ones, where most of the tooling we have for tracing and profiling is broken without frame pointers (see Sysprof and Hotspot from GNOME and KDE respectively, which both rely on frame pointers to have cheap real-time tracing for performance analysis). And given that we advertise as the "makers' choice", I think it would definitely be on-brand for us to have the capability to better support makers and shakers.
For those interested in more detail about frame pointers, Brendan Gregg has a decent post about it[7]. I have also submitted a parallel request for openSUSE Leap 16 to have this feature enabled too[8].
I truly believe this would give us an even better footing with the broader community of developers and operators and make SUSE distributions very attractive for FOSS and proprietary software alike.
It seems this is optimizing the system for profiling it rather than using it which is an odd thing to do. I realize not having frame pointers can make accurate profiling more difficult (but I do this every day), still taking a 1-10% hit on cpython seems bad.
I also object to enforce this for x86 32bit which is a very register starved architecture (x86-64 is only slightly better in this regard).
I'll note -mno-omit-leaf-frame-pointer is x86 specific - is the proposal only directed to x86 and x86-64?
How do you enforce frame pointers for JITed code, for code generated by compilers that are not GCC (rust, golang, etc.)? Or why do you choose to "ignore" profiling those?
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
Thanks, Richard.
Best regards, Neal
[1]: https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/93063bb396395b9a20... [2]: https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer [3]: https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-b... [4]: https://gitlab.archlinux.org/archlinux/rfcs/-/blob/master/rfcs/0026-fno-omit... [5]: https://almalinux.org/blog/2024-10-22-introducing-almalinux-os-kitten/ [6]: https://www.phoronix.com/review/fedora-38-beta-benchmarks [7]: https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointer... [8]: https://code.opensuse.org/leap/features/issue/175
-- 真実はいつも一つ!/ Always, there's only one truth!
-- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Monday 2024-11-18 09:30, Richard Biener wrote:
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
I wish people would stop with the "bandwidth/disk is cheap" nonsense. Tumbleweed is in the same size class as some Call Of Duty episode. Memes aside, it still downloads in roughly the same time as a SUSE installation did 20 years ago. And there was a technology shift where disk sizes experienced a massive crunch in favor of speed. Buy a machine today, and you get the same disk size as in 2007.
On Mon, Nov 18, 2024 at 3:30 AM Richard Biener <rguenther@suse.de> wrote:
On Sat, 16 Nov 2024, Neal Gompa wrote:
Hello,
I would like to change openSUSE's default build flags in rpm and rpm-config-SUSE to incorporate frame pointers (and equivalents on other architectures) to support real-time profiling and observability on SUSE distributions.
In practice, this would mean essentially adopting the same tunables that exist in Fedora[1] for openSUSE to turn them on by default and to allow packages or OBS projects to selectively opt out as needed.
The reasons to do so are threefold:
* It is a major competitive disadvantage for us to lack the ability to do cheap real-time profiling and observability. Fedora[2], Ubuntu[3], Arch Linux[4], and AlmaLinux[5] all now do this, and thus support this capability.
* The performance hit for having it vs not is insignificant[6].
* There are new tools in both the cloud-native and regular systems development worlds that leverage this, and openSUSE should be an enabler of those technologies.
I want openSUSE to be a great place for people to develop and optimize workloads on, especially desktop ones, where most of the tooling we have for tracing and profiling is broken without frame pointers (see Sysprof and Hotspot from GNOME and KDE respectively, which both rely on frame pointers to have cheap real-time tracing for performance analysis). And given that we advertise as the "makers' choice", I think it would definitely be on-brand for us to have the capability to better support makers and shakers.
For those interested in more detail about frame pointers, Brendan Gregg has a decent post about it[7]. I have also submitted a parallel request for openSUSE Leap 16 to have this feature enabled too[8].
I truly believe this would give us an even better footing with the broader community of developers and operators and make SUSE distributions very attractive for FOSS and proprietary software alike.
It seems this is optimizing the system for profiling it rather than using it which is an odd thing to do. I realize not having frame pointers can make accurate profiling more difficult (but I do this every day), still taking a 1-10% hit on cpython seems bad.
It's not just about making accurate profiling easier, it's also about making it cheap. Frame pointers make it so you can sample at any point on a running system. Users can do sampling as they observe problems and report to developers. Developers and operators can observe with real workloads without impacting the system configuration. This is why the Go compiler has had frame pointers on by default for 6 years. It's also why other operating systems have frame pointers on for non-x86_32. Linux is the outlier.
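To make the mechanism concrete, here is a toy Python model of frame-pointer unwinding. It is only a sketch: the memory layout (saved frame pointer at the frame-pointer address, return address 8 bytes above it) is a simplified x86-64-style assumption, and the addresses and the function names in the comments are made up.

```python
# Toy model of frame-pointer unwinding. Assumed layout (x86-64 style):
# memory[fp] holds the caller's frame pointer, memory[fp + 8] the return address.

def unwind_with_frame_pointers(memory, fp, max_depth=127):
    """Follow the chain of saved frame pointers and collect return addresses."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        saved_fp = memory[fp]                    # caller's frame pointer
        return_addresses.append(memory[fp + 8])  # return address into the caller
        fp = saved_fp
    return return_addresses

# Hypothetical stack for main -> parse -> hot_loop (all addresses invented).
memory = {
    0x7000: 0x7040, 0x7008: 0x401200,  # frame of hot_loop, returns into parse
    0x7040: 0x7080, 0x7048: 0x401100,  # frame of parse, returns into main
    0x7080: 0x0,    0x7088: 0x401000,  # frame of main, end of the chain
}

stack = unwind_with_frame_pointers(memory, fp=0x7000)
print([hex(addr) for addr in stack])
```

Each step is two memory reads and needs no unwind tables, which is the cheapness argument made above; the sketch says nothing about the cost of reserving a register for the frame pointer.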
I also object to enforce this for x86 32bit which is a very register starved architecture (x86-64 is only slightly better in this regard).
Sure, we can leave it out by default for i586/x86_32.
I'll note -mno-omit-leaf-frame-pointer is x86 specific - is the proposal only directed to x86 and x86-64?
No. As I said, the idea is to use equivalent flags on all supported architectures.
How do you enforce frame pointers for JITed code, for code generated by compilers that are not GCC (rust, golang, etc.)? Or why do you choose to "ignore" profiling those?
Go already does this and has for most of the past decade (which is why cloud-native observability tools even exist at all). I'd like to turn it on in Rust as well. In Fedora, it *was* turned on in Rust, controlled by the same rpm macro that affected the compiler flags for relevant architectures[1]. [1]: https://pagure.io/fedora-rust/rust-packaging/blob/1402e757e3200e6f06b8a9c0db...
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
Actually, DWARF based profiling isn't very good, which is why so few people do it. It's slow, it's memory intensive, thus the sampling capability is much poorer. Richard W.M. Jones did a decent comparison about this and the problems with DWARF profiling over leveraging frame pointers[2].
Also, disk space available on systems has curiously enough remained mostly the same in 20 years. There was a brief period where storage *did* go up, but the introduction of flash storage brought storage back down on most computer systems. It's also wildly expensive to have a lot of storage on portable computers, which are now what most people have. And bandwidth costs vary based on what part of the world you live in. It's cheaper in Europe than it is in the Americas and especially in Asia and Africa.
[2]: https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/
-- 真実はいつも一つ!/ Always, there's only one truth!
On 11/18/24 13:02, Neal Gompa wrote: [ .. ]>
Also, disk space available on systems has curiously enough remained mostly the same in 20 years. There was a brief period where storage *did* go up, but the introduction of flash storage brought storage back down on most computer systems. It's also wildly expensive to have a lot of storage on portable computers, which are now what most people have. And bandwidth costs vary based on what part of the world you live in. It's cheaper in Europe than it is in the Americas and especially in Asia and Africa.
Define 'a lot'. That perception has shifted over time.
And disk space did go up: 20 years ago system drives were measured in Megabytes, 10 years ago in Gigabytes, and nowadays in Terabytes. (And I have an entire cupboard full with disks to prove that.)
(IBM DNES-309170: 9 GB, manufactured 1999)
(Quantum Atlas: 28.4 GB, manufactured 2000)
(Hitachi Deskstar: 250 GB, manufactured 2006)
(WD RE4: 500GB, manufactured 2010)
(WD DC HC670: 25TB, manufactured 2023)
(Serial numbers upon request)
Cheers, Hannes
-- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
On Mon, Nov 18, 2024 at 7:58 AM Hannes Reinecke <hare@suse.de> wrote:
On 11/18/24 13:02, Neal Gompa wrote: [ .. ]>
Also, disk space available on systems has curiously enough remained mostly the same in 20 years. There was a brief period where storage *did* go up, but the introduction of flash storage brought storage back down on most computer systems. It's also wildly expensive to have a lot of storage on portable computers, which are now what most people have. And bandwidth costs vary based on what part of the world you live in. It's cheaper in Europe than it is in the Americas and especially in Asia and Africa.
Define 'a lot'. That perception has shifted over time.
And disk space did go up: 20 years ago system drives were measured in Megabytes, 10 years ago in Gigabytes, and nowadays in Terabytes. (And I have an entire cupboard full with disks to prove that.) (IBM DNES-309170: 9 GB, manufactured 1999) (Quantum Atlas: 28.4 GB, manufactured 2000) (Hitachi Deskstar: 250 GB, manufactured 2006) (WD RE4: 500GB, manufactured 2010) (WD DC HC670: 25TB, manufactured 2023) (Serial numbers upon request)
Hard drives did, yes. But computer storage switched from hard drives to SSDs. My computer from 20 years ago has the same amount of storage as my computer now, with the only change being the switch to flash storage. Yes, in the intervening time I did have systems with more disk space, but now it's cost-prohibitive to have as much space as I did on my computer in 2015. :( -- 真実はいつも一つ!/ Always, there's only one truth!
Hello, On Mon, 18 Nov 2024, Neal Gompa wrote:
It's not just about making accurate profiling easier, it's also about making it cheap. Frame pointers make it so you can sample at any point on a running system.
Just as well as without FP. At least with backtracers that aren't the most lazy imaginable implementation.
This is why the Go compiler has had frame pointers on by default for 6 years.
No, it's because the initial developers of the Go ecosystem were too lazy to implement DWARF, or equivalent, unwind info.
It's also why other operating systems have frame pointers on for non-x86_32. Linux is the outlier.
Why are you saying this? Windows x64 also doesn't require one.
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
Actually, DWARF based profiling isn't very good, which is why so few people do it. It's slow, it's memory intensive, thus the sampling capability is much poorer. Richard W.M. Jones did a decent comparison about this and the problems with DWARF profiling over leveraging frame pointers[2].
Then improve the DWARF backtracers (e.g. for pure backtraces it's usually not required to fully unwind stuff, which includes restoring all registers). Making everyone pay to cater for bad tooling for corner usecases is a bad tradeoff.
[2]: https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwarf-my-verdict/
So, let me see, because the kernel unwinder folks have some mysterious aversion to DWARF so that they (a) implement their own equivalent unwinder info, just less capable and (b) trace through frame pointers in userspace on their own, but do not do that for DWARF, everything is "bad" when not using frame pointers. (Of course it's bad with that implementation, to store the full stack up to a size into perf.data? I didn't realize how completely bollocks their approach to non-FP backtraces was.)
So, then, how is that not the fault of the kernel/perf, and why should it be worked around by enabling a frame pointer for everyone? Why is a blog post pointing out craziness in tool A a reason to change tool B to the detriment of everyone? Tool A needs changing!
I.e. I object with a passion!
Ciao, Michael.
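For a sense of scale of "store the full stack up to a size into perf.data", the following Python arithmetic compares the per-sample payload of copying a stack chunk with recording a chain of return addresses. Every figure is an assumption chosen for illustration (8192 bytes of copied stack per sample, at most 127 frames of 8 bytes each, a hypothetical 1000 samples per second); real settings may differ.

```python
# Rough per-sample payload estimate; all figures are assumptions for illustration.
STACK_DUMP_BYTES = 8192  # assumed size of the stack copy per sample (DWARF mode)
MAX_FRAMES = 127         # assumed upper bound on frames in a frame-pointer chain
ADDRESS_BYTES = 8        # one 64-bit return address per frame
SAMPLE_HZ = 1000         # hypothetical sampling frequency

def bytes_per_second(bytes_per_sample, hz=SAMPLE_HZ):
    """Payload written per second for one continuously sampled thread."""
    return bytes_per_sample * hz

dwarf_rate = bytes_per_second(STACK_DUMP_BYTES)
fp_rate = bytes_per_second(MAX_FRAMES * ADDRESS_BYTES)  # worst case: full-depth chain
print(dwarf_rate, fp_rate, round(dwarf_rate / fp_rate, 1))
```

The frame-pointer figure is a worst case, since typical stacks are much shallower than 127 frames. The sketch only describes the stack-copy approach criticized here; an unwinder that walked the unwind info at sample time would record the same short address list.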
On Mon, Nov 18, 2024 at 10:14 AM Michael Matz <matz@suse.de> wrote:
Hello,
On Mon, 18 Nov 2024, Neal Gompa wrote:
It's not just about making accurate profiling easier, it's also about making it cheap. Frame pointers make it so you can sample at any point on a running system.
Just as well as without FP. At least with backtracers that aren't the most lazy imaginable implementation.
This is why the Go compiler has had frame pointers on by default for 6 years.
No, it's because the initial developers of the Go ecosystem were too lazy to implement DWARF, or equivalent, unwind info.
I think you should be careful with the word "lazy". You're ascribing a mental context when you know very little of how they worked through their efforts.
It's also why other operating systems have frame pointers on for non-x86_32. Linux is the outlier.
Why are you saying this? Windows x64 also doesn't require one.
MSVC does not let you turn off frame pointers on x64. The flag to switch it off does nothing.
https://learn.microsoft.com/en-us/cpp/build/reference/oy-frame-pointer-omiss...
-- 真実はいつも一つ!/ Always, there's only one truth!
Hello, On Mon, 18 Nov 2024, Neal Gompa wrote:
It's also why other operating systems have frame pointers on for non-x86_32. Linux is the outlier.
Why are you saying this? Windows x64 also doesn't require one.
MSVC does not let you turn off frame pointers on x64. The flag to switch it off does nothing.
No, the flag to switch frame pointer on (/Oy-) does nothing.
https://learn.microsoft.com/en-us/cpp/build/reference/oy-frame-pointer-omiss...
On x64 it doesn't allow you to _enable_ a frame pointer when there aren't other reasons to require one (e.g. alloca). Frame pointers aren't even used without any optimization. See e.g. https://godbolt.org/z/1r8W17qo5 ('function2' requires FP, 'function' does not, and it's not using one, no matter the /O options. I've also tried random older versions and they all seem to work the same in that respect).
Ciao, Michael.
On 18.11.24 at 13:02, Neal Gompa wrote:
Actually, DWARF based profiling isn't very good, which is why so few people do it. It's slow, it's memory intensive, thus the sampling capability is much poorer.
That's an oversimplification. It's certainly slower and generates larger profiles, but it's also much more powerful. You can see inlined functions and get an association to source code lines in the annotation. Sometimes knowing the function names is enough, but especially in a larger code base or if you're not so familiar with the code that you can match assembly with source lines manually, this can be enormously helpful. For my part, I prefer DWARF profiling when I have the debug info around.
On 18.11.24 at 09:30, Richard Biener wrote:
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
That's probably too much, given that debug info can be much larger than the actual binary. But I suggested elsewhere in the thread that I'd like to keep .symtab, and maybe even some light debug info? In this case it seems we'd only need .debug_frame. (But I don't know how that compares to binary sizes and the remainder of debug info on average.)
The broader argument would be that it's valuable to know where you are (and it also helps for crash stacks), while we don't pay for full debug info which is dominated by information about types, variables, and so on, which is only needed in rare cases.
We could also add something like DW_TAG_inlined_subroutine to get more fine-grained call stacks. But this is already more than frame pointers would provide.
There is also the option to compress debug info, but I don't know how well this is supported by the tools in question, especially "perf".
On Tue, Nov 19, 2024 at 7:37 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
On 18.11.24 at 09:30, Richard Biener wrote:
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
That's probably too much, given that debug info can be much larger than the actual binary. But I suggested elsewhere in the thread that I'd like to keep .symtab, and maybe even some light debug info? In this case it seems we'd only need .debug_frame. (But I don't know how that compares to binary sizes and the remainder of debug info on average.)
The broader argument would be that it's valuable to know where you are (and it also helps for crash stacks), while we don't pay for full debug info which is dominated by information about types, variables, and so on, which is only needed in rare cases.
We could also add something like DW_TAG_inlined_subroutine to get more fine-grained call stacks. But this is already more than frame pointers would provide.
There is also the option to compress debug info, but I don't know how well this is supported by the tools in question, especially "perf".
Are we not using minidebuginfo and dwz still? Both of those have been in use in Fedora for over a decade[1][2].
These have been in place in the Red Hat world for a while, and still frame pointers were needed to provide a better experience.
[1]: https://fedoraproject.org/wiki/Features/MiniDebugInfo
[2]: https://fedoraproject.org/wiki/Features/DwarfCompressor
-- 真実はいつも一つ!/ Always, there's only one truth!
On 20.11.24 at 01:40, Neal Gompa wrote:
On Tue, Nov 19, 2024 at 7:37 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
On 18.11.24 at 09:30, Richard Biener wrote:
In any case - I propose shipping all packages with debug info included since that greatly improves the profiling experience - even more so than by enabling frame-pointers. Bandwidth and disk is cheap these days.
That's probably too much, given that debug info can be much larger than the actual binary. But I suggested elsewhere in the thread that I'd like to keep .symtab, and maybe even some light debug info? In this case it seems we'd only need .debug_frame. (But I don't know how that compares to binary sizes and the remainder of debug info on average.)
The broader argument would be that it's valuable to know where you are (and it also helps for crash stacks), while we don't pay for full debug info which is dominated by information about types, variables, and so on, which is only needed in rare cases.
We could also add something like DW_TAG_inlined_subroutine to get more fine-grained call stacks. But this is already more than frame pointers would provide.
There is also the option to compress debug info, but I don't know how well this is supported by the tools in question, especially "perf".
Are we not using minidebuginfo and dwz still?
I don't think we ship with any debug info (not even symbols), but I'm happy to be proven wrong. We use dwz, but my understanding is that it doesn't do much for .debug_frame. The main benefit is that it deduplicates things like type information that is emitted into every TU. The frame information should only have duplicates for non-inlined inline functions, which are probably not so common with optimizations on. The compression that I'm talking about is general purpose compression such as gzip or zstd, which is a relatively recent feature.
These have been in place in the Red Hat world for a while, and still frame pointers were needed to provide a better experience.
To quote that Wiki page: “Debug info for backtraces relies on two types of information, the function names in the symbol tables, and (optionally) the linenumber debug information.” There is no mention of .debug_frame, only .symtab and .debug_line. (I agree about .symtab, but I'm not sure about .debug_line.) Aaron
On Nov 20 2024, Aaron Puchert wrote:
That's probably too much, given that debug info can be much larger than the actual binary. But I suggested elsewhere in the thread that I'd like to keep .symtab, and maybe even some light debug info? In this case it seems we'd only need .debug_frame. (But I don't know how that compares to binary sizes and the remainder of debug info on average.)
.debug_frame is the same as .eh_frame which is always available.
-- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
On 20.11.24 at 09:10, Andreas Schwab wrote:
On Nov 20 2024, Aaron Puchert wrote:
That's probably too much, given that debug info can be much larger than the actual binary. But I suggested elsewhere in the thread that I'd like to keep .symtab, and maybe even some light debug info? In this case it seems we'd only need .debug_frame. (But I don't know how that compares to binary sizes and the remainder of debug info on average.)
.debug_frame is the same as .eh_frame which is always available.
I thought they were slightly different, but I was also wondering why I didn't find .debug_frame anywhere. Indeed, "perf record --call-graph dwarf" seems to work on packaged binaries. I was always wondering why stacks are cut off, but that is likely due to the fixed size being recorded. Increasing it gives me longer stacks. Great, so we don't actually need any additional debug info here?
However, missing .symtab is still an issue. Several stack frames show up as hex addresses. In fact Clang, where I tried this, is likely a lucky case because it's compiled with default visibility for non-inline functions. So we still get most functions via .dynsym. Most binaries will have far fewer symbol names in a profile.
Aaron
On 16.11.24 at 19:24, Neal Gompa wrote:
I would like to change openSUSE's default build flags in rpm and rpm-config-SUSE to incorporate frame pointers (and equivalents on other architectures) to support real-time profiling and observability on SUSE distributions.
Profiling doesn't necessarily need call stacks. Often you can learn a great deal from just seeing the hot functions. I'd say that I work with plain profiles ("perf record" without -g) most of the time and add call stacks only when I'm unsure why a function is so frequently called. Stacks can sometimes even obscure hotspots, depending on how you're reading them: if a hot function distributes its cost among many different call stacks, it will not stick out in a top-down flame graph. (You'll have to do a bottom-up.)
There is another obstacle, one that applies even to profiling without call stacks: there are no symbols. We strip .symtab along with debug info. Without that, the stacks that you get are meaningless. However, if you meant to include that: I would like symbols to be kept (basically replace --strip by --strip-debug), and I think it would be less controversial: no runtime impact and only slightly larger binaries.
With my compiler hat on, frame pointers often don't make sense. If there are no VLAs and allocas, the compiler knows how large a stack frame is and can simply add (for stacks that grow down) that fixed size in the epilogue. The frame pointer on the stack contains no information that the compiler doesn't already have.
Furthermore, the frame pointer could be overwritten by stack buffer overflows. That's not a very strong argument, because an attacker would much rather overwrite the return address, and we have stack protectors, but "don't compute at runtime what you can compute at compile-time".
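The point about hot functions hiding in a top-down view can be shown with a few made-up samples. This Python sketch is only an illustration: the stacks and function names are hypothetical, and the two helpers merely mimic how a top-down and a bottom-up view group samples.

```python
from collections import Counter

# Hypothetical samples: each tuple is one call stack, outermost caller first.
samples = [
    ("main", "load", "memcpy"),
    ("main", "parse", "memcpy"),
    ("main", "render", "memcpy"),
    ("main", "render", "draw"),
]

def top_down(samples, depth=2):
    """Count samples by their first `depth` frames, as a top-down view groups them."""
    return Counter(stack[:depth] for stack in samples)

def bottom_up(samples):
    """Count samples by the innermost (leaf) function."""
    return Counter(stack[-1] for stack in samples)

print(top_down(samples))
print(bottom_up(samples))
```

In the top-down grouping no branch holds more than two of the four samples, so nothing stands out, while the bottom-up grouping attributes three of the four samples to memcpy.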
* The performance hit for having it vs not is insignificant[6].
This is a tricky argument. There are lots of things one could do that individually have little impact (maybe 1–10%), but those little things add up. One guy wants to add frame pointers for profiling, another wants stack protectors, the next guy wants automatic initialization of local variables (likely C++26) or mandatory boundary checks. They all claim that it adds just a little (on average).
But what if the benefit is also small? If we take the average cost (which is typically small for most things you can add) we should also take the average benefit. That might not be much larger since lots of people, even Linux users, will never run "perf record". Even among developers, profiling might be restricted to self-built binaries.
This is not my own situation: I do profiling of packaged binaries quite regularly. But we should get ourselves out of the way and think about the larger user base. The people that will profile packaged binaries are likely the packagers themselves, so in this bubble we're a bit biased.
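As a worked example of how small costs add up, the following Python sketch multiplies a handful of individual slowdowns. The per-feature percentages are invented for illustration and are not measurements, and treating the costs as independent multiplicative factors is itself an assumption.

```python
# Hypothetical per-feature slowdowns; the numbers are made up for illustration.
overheads = {
    "frame pointers": 0.02,
    "stack protector": 0.02,
    "auto-initialized locals": 0.01,
    "bounds checks": 0.03,
}

def combined_slowdown(overheads):
    """Multiply the individual slowdown factors, assuming they are independent."""
    factor = 1.0
    for cost in overheads.values():
        factor *= 1.0 + cost
    return factor - 1.0

total = combined_slowdown(overheads)
print(f"{total:.1%}")
```

With these made-up inputs the combined slowdown comes out slightly above the plain sum of 8%, although no single item exceeds 3%.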
I want openSUSE to be a great place for people to develop and optimize workloads on, especially desktop ones, where most of the tooling we have for tracing and profiling is broken without frame pointers (see Sysprof and Hotspot from GNOME and KDE respectively, which both rely on frame pointers to have cheap real-time tracing for performance analysis).
I don't know about any of those, but I'd assume they're just GUIs around "perf"? If you have an Intel CPU since Haswell (Zen 4 or so should also have LBR, but I haven't tried it yet), "perf record --call-graph lbr" works even without frame pointers, and in my experience pretty reliably.
Just to make clear: I don't want this to be seen as an argument against frame pointers, but I think the case isn't terribly clear. It's a feature that mostly benefits package maintainers and other people that want to tweak the distro, when they're trying to investigate performance issues, while the costs, however small they may be, are paid by everybody all the time.
Aaron
On Wednesday 2024-11-20 01:13, Aaron Puchert wrote:
On 16.11.24 at 19:24, Neal Gompa wrote:
* The performance hit for having it vs not is insignificant[6].
There are lots of things one could do that individually have little impact (maybe 1–10%), but those little things add up. One guy wants to add frame pointers for profiling, another wants stack protectors, the next guy wants automatic initialization of local variables (likely C++26) or mandatory boundary checks. They all claim that it adds just a little (on average).
…and we just recently went from FORTIFY_SOURCE=2 to FORTIFY_SOURCE=3, so allocations for 2024 are now used up ;-)
participants (7): Aaron Puchert, Andreas Schwab, Hannes Reinecke, Jan Engelhardt, Michael Matz, Neal Gompa, Richard Biener