On Thu, Nov 28, 2024 at 4:48 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
> On 28.11.24 at 06:53, Andrii Nakryiko wrote:
>> My whole point was that *benchmark* results are just part of the story, and we can't just take 1% and use that as "this is how much slower everything will be". And yet, you go ahead and do exactly that, doing some hypothetical math about collective slowdowns, with average usage costs, etc. I'm sorry, I see this as a completely useless hypothetical exercise.
>
> It's not hypothetical at all. Frame pointers affect every function in every binary. There is no reason why it should affect something like GCC more or less than maybe harfbuzz or some GTK library. I'm not even taking outliers like that Python benchmark into account. There is going to be some deviation, but 1-2% overall definitely sounds plausible. (Given the average function length and the additional instructions needed.)
>
>> On the other hand, you completely discount and doubt any improvements enabled by more readily available profiling and observability tooling, even though it was *already* shown both in server-side land (e.g., in Meta fleet) and in Fedora distro ecosystem.
> I don't discount these improvements, but (focusing on the distro ecosystem):
> * profiling and performance optimizations for a large part happen regardless of whether we build with frame pointers and
This is a subjective statement not backed by any data. I have a completely opposite experience (and expectation as well).
> * a 1-2% improvement across everything would be massive. The only way I see that happen is by compiler improvements, and I'm not sure they're going to happen this way.
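Just to make the per-function cost we keep arguing about concrete, here is a minimal sketch (assuming x86-64 and GCC/Clang; exact code generation of course varies with compiler, flags and target, and the file/function names below are made up for the example):

    /* add.c -- compare the generated assembly with and without frame pointers:
     *   gcc -O2 -fomit-frame-pointer -S add.c
     *   gcc -O2 -fno-omit-frame-pointer -S add.c
     *
     * With frame pointers kept, each function typically gains a prologue and
     * epilogue along the lines of
     *   pushq %rbp
     *   movq  %rsp, %rbp
     *   ...
     *   popq  %rbp
     * and %rbp stops being available as a general-purpose register.
     */

    /* noinline only so the call and its prologue stay visible in the -S
     * output; a function this small would normally be inlined, at which
     * point the extra instructions disappear along with the call itself. */
    __attribute__((noinline)) long scale(long x)
    {
            return x * 3 + 1;
    }

    long run(long n)
    {
            long sum = 0;
            for (long i = 0; i < n; i++)
                    sum += scale(i);
            return sum;
    }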
You are overpivoting on this 1-2%. This is based on a specific *benchmark*, not some end-to-end real life performance. We can't just blindly take those numbers and generalize them. Not everything will be slowed down; lots of code is a) not performance critical, b) might not even benefit from that extra register, c) is not called frequently enough for those few instructions that set up the %rbp register to matter or be measurable (and think about it, if those 2-3 instructions matter, then perhaps your function doesn't even do all that much useful work and should/would be inlined by the compiler).

And about your general point that 1-2% across everything could only be coming from the compiler: this is an unnecessary and unrealistic expectation. Most of the code isn't performance critical. On the other hand, that small portion of code that is performance critical could get way more than a 1-2% speed-up if someone notices an inefficiency there. And it's much easier to spot inefficiency when you have easily attainable (without going through tons of trouble) system-wide profiling.

Anyways, I think we are in the territory of subjective opinions, and there just isn't objective hard data to prove anything conclusively and without any doubt. There has to be some value judgement by whoever is going to make this decision about frame pointers. The good thing is that Fedora showed that this change is a) useful and b) not really detrimental to overall user experience performance-wise.

Also, in this part of the discussion we are still focusing purely on *profiling* use cases, which is just one aspect that would benefit from frame pointers. All the observability tooling, ad-hoc debugging, etc., based on bpftrace or pure BPF would tremendously benefit both users and developers, if they just work out of the box. This can't be discounted. Actually, I'd suggest that anyone interested go and explore uprobes tracing and USDTs. They feel like magic, and combined with stack traces they are extremely powerful.
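To make that last point concrete, here is a minimal USDT sketch (the provider/probe names and file name are made up for the example; it needs sys/sdt.h from the systemtap-sdt development package):

    /* usdt_demo.c -- build with: gcc -O2 usdt_demo.c -o usdt_demo
     *
     * Trace it and aggregate user stack traces with e.g.:
     *   bpftrace -e 'usdt:./usdt_demo:demo:tick { @[ustack] = count(); }'
     */
    #include <sys/sdt.h>
    #include <unistd.h>

    int main(void)
    {
            for (int i = 0; ; i++) {
                    /* Fires a "tick" probe under the "demo" provider with one
                     * argument. When nothing is attached, the probe site is
                     * just a nop, so it's essentially free. */
                    DTRACE_PROBE1(demo, tick, i);
                    sleep(1);
            }
    }

Getting meaningful user stack traces out of something like that is exactly the part that just works out of the box when the binaries involved keep frame pointers.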
> The second point is much easier in the Meta fleet. I don't doubt at all that you've gained even more than that.