On Wed, Nov 27, 2024 at 9:48 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
On 27.11.24 at 10:25, alessio.biancalana@suse.com wrote:
On Wed, 27/11/2024 at 04:54 +0100, Aaron Puchert wrote:
But at least right now I don't think they have dedicated people for performance, and I'm not aware of anyone regularly doing this kind of work in the community.
If they exist, please step up.
Howdy! The very first reply to this thread was mine, and it was pretty idiotic of me not to introduce myself. I'm Alessio, I hack on things in openSUSE when I have some free time, I maintain some packages and I try to champion all-things-observability both in the community _and_ in my day-to-day work at SUSE.
Thanks for stepping up!
As Neal pointed out, right now eBPF profiling isn't really a thing because eBPF unwinding itself isn't really a thing.
Coming to GNOME, as Neal was reporting, Christian Hergert wrote a couple of posts that were really eye-opening about this kind of system-wide profiling even being possible.
https://fedoramagazine.org/performance-profiling-in-fedora-linux/
Got it, so the idea is to profile something like a GUI application with lots of dependencies. Most of the work doesn't happen in the application itself, but in all kinds of libraries (fonts, rendering, IO, etc.) that all need FPs. That's different from my profiling, where the application itself sits on the CPU all the time.
Right, your kind of profiling is the simple kind. And surprisingly uncommon. It's rare for application stacks to be "close to metal" and lack a deep middleware stack. Even server workloads typically have a few layers between the developer code and the metal these days.
Do you think that SUSE will systematically use this in the long term? I know data center operators do this (Meta, Google, Netflix), because they run their software at a large scale, and want to reduce their server footprint. They have dedicated performance engineers. SUSE isn't quite in the same situation, because most workloads running on the software that we build happen out of our sight. (You could profile OBS or openQA, but I'm not sure how useful that would be.)
Can you please stop focusing on the wrong thing here? One of the key values of this is that you don't *need* dedicated performance engineers to do performance analysis. It becomes accessible to everyone. I think it will naturally get used as it becomes available. Within three months of it existing in Fedora, GNOME folks who develop GNOME on Fedora had made tremendous strides in GLib, GTK, and VTE. It has continued to pay off substantially simply because the capability is *there*. There's a lot of potential with openSUSE having it too, particularly with communities that use openSUSE as the principal development and qualification platform (such as KDE).
So would we systematically look at usage scenarios? Would you expect that package maintainers do profiling for their packages?
https://blogs.gnome.org/chergert/2022/12/31/frame-pointers-and-other-practic...
Let's put the near term aside for now. What happens in the long term? When all distros have FP enabled, are we still going to see effort poured into out-of-band unwinding? Or is it just going to be written off?
As I've written in reply to Andrii, in the compiler world a 2% regression is quite significant. So I'd like to see a way out.
The truth is that it was already written off. Nobody does it for serious system-wide tracing. There is no way out until someone comes up with an actually useful SFrames proposal that includes all Linux architectures (which does not exist yet and may not exist for another ten years). As a hobbyist Linux desktop systems developer, I'm tired of being hampered by our own stack because people think we don't deserve working tools. :(

-- 
真実はいつも一つ!/ Always, there's only one truth!