RFC: switch default flavor to HZ=300
Hi,
I recently stumbled over the 300 HZ option in the kernel configuration. Currently we are using 250 HZ with IDLE_HZ. The HZ setting is used as far as I can see to determine when jiffies are advancing, so it influences the granularity of a couple of things like kvmclock, timers and so on. Also USER_HZ is 100 HZ which is the granularity to which some metrics are exported to user space. 300 is divisible without remainder by 100, unlike 250.
so it appears there are good reasons for switching the default, and it is "just" 20% more timer interrupts than before, so it should not be a huge issue. I also did a test build and a 300 HZ kernel is by a few bytes smaller than a 250 HZ kernel, indicating that the compiler can optimize away a few things.
I am seeing an increase in a few small functions, and I'm looking into making the code size increase go away with a source level tweak. I've benchmarked both versions in a micro benchmark that does a billion invocations of both, and while the code is larger, it runs in exactly the same runtime (+/- 3% which I consider my benchmark noise level) on a Ryzen Zen 2+.
In the Kconfig description of 300 HZ option, it appears this is more recommended for multimedia usecases because it is divisible without remainder for common rates, like 30 (fps), 60 fps, 120 fps, 44.1khZ and others that are often needed.
This is imho not only usable for desktop, but also for servers that are using multimedia related applications.
I've seen that fedora-like distributions use 1000 HZ, arch linux uses 300 Hz and debian defaults to 250 HZ. So there is no clear trend. I'd be fine with 300 or 1000 HZ.
Comments? Would like to hear your feedback before sending a change to the Tumbleweed configs.
Thanks,
Dirk
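To make the USER_HZ point concrete, here is a minimal sketch (plain Python, not kernel code) that counts how many jiffies land in each USER_HZ tick. The helper name `jiffies_per_user_tick` and the simple floor mapping from jiffies to USER_HZ ticks are assumptions for illustration; the kernel's actual conversion helpers may round differently.

```python
from collections import Counter

USER_HZ = 100  # granularity of tick-based values exported to user space


def jiffies_per_user_tick(hz, user_hz=USER_HZ, first=4):
    """Count how many jiffies fall into each of the first few USER_HZ ticks.

    Hypothetical helper: assumes a jiffy maps to the USER_HZ tick
    floor(jiffy * user_hz / hz).
    """
    # Map every jiffy within one second to the USER_HZ tick it lands in.
    counts = Counter(j * user_hz // hz for j in range(hz))
    return [counts[t] for t in range(first)]


for hz in (250, 300, 1000):
    print(hz, jiffies_per_user_tick(hz))
```

Under that assumption, 250 HZ alternates between 3 and 2 jiffies per USER_HZ tick, while 300 HZ and 1000 HZ give a constant 3 and 10 respectively.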
Hi,
On 23. 02. 22, 17:09, Dirk Müller wrote:
I recently stumbled over the 300 HZ option in the kernel configuration. Currently we are using 250 HZ with IDLE_HZ. The HZ setting is used as far as I can see to determine when jiffies are advancing, so it influences the granularity of a couple of things like kvmclock, timers and so on. Also USER_HZ is 100 HZ which is the granularity to which some metrics are exported to user space. 300 is divisible without remainder by 100, unlike 250.
so it appears there are good reasons for switching the default, and it is "just" 20% more timer interrupts than before, so it should not be a huge issue. I also did a test build and a 300 HZ kernel is by a few bytes smaller than a 250 HZ kernel, indicating that the compiler can optimize away a few things.
I am seeing an increase in a few small functions, and I'm looking into making the code size increase go away with a source level tweak. I've benchmarked both versions in a micro benchmark that does a billion invocations of both, and while the code is larger, it runs in exactly the same runtime (+/- 3% which I consider my benchmark noise level) on a Ryzen Zen 2+.
In the Kconfig description of 300 HZ option, it appears this is more recommended for multimedia usecases because it is divisible without remainder for common rates, like 30 (fps), 60 fps , 120 fps, 44.1khZ and others that are often needed.
Unlike today still common 25 fps (PAL) and 48kHz.
This is imho not only usable for desktop, but also for servers that are using multimedia related applications.
I've seen that fedora-like distributions use 1000 HZ, arch linux uses 300 Hz and debian defaults to 250 HZ. So there is no clear trend. I'd be fine with 300 or 1000 HZ.
Even 1 kHz might be fine on some archs with NO_HZ_IDLE. But does it bring anything while having hrtimers?
Historically, you wanted the lowest good value for power savings. And the highest possible for preemption (best user response feeling).
We even had 1kHz until: https://github.com/openSUSE/kernel-source/commit/50a275a4bca006566590e94f880...
The reasons to the change are not noted (again!), so we can only guess...
I actually don't care about what value is selected. But you should get in touch with timer guys and the performance team too. I doubt they read this list :). But they should definitely comment (provided we unify kernels with SLE). So maybe creating a bug?
regards,
--
js
suse labs
On Thu, 24 Feb 2022 10:30:07 +0100, Jiri Slaby wrote:
Hi,
On 23. 02. 22, 17:09, Dirk Müller wrote:
I recently stumbled over the 300 HZ option in the kernel configuration. Currently we are using 250 HZ with IDLE_HZ. The HZ setting is used as far as I can see to determine when jiffies are advancing, so it influences the granularity of a couple of things like kvmclock, timers and so on. Also USER_HZ is 100 HZ which is the granularity to which some metrics are exported to user space. 300 is divisible without remainder by 100, unlike 250.
so it appears there are good reasons for switching the default, and it is "just" 20% more timer interrupts than before, so it should not be a huge issue. I also did a test build and a 300 HZ kernel is by a few bytes smaller than a 250 HZ kernel, indicating that the compiler can optimize away a few things.
I am seeing an increase in a few small functions, and I'm looking into making the code size increase go away with a source level tweak. I've benchmarked both versions in a micro benchmark that does a billion invocations of both, and while the code is larger, it runs in exactly the same runtime (+/- 3% which I consider my benchmark noise level) on a Ryzen Zen 2+.
In the Kconfig description of 300 HZ option, it appears this is more recommended for multimedia usecases because it is divisible without remainder for common rates, like 30 (fps), 60 fps , 120 fps, 44.1khZ and others that are often needed.
Unlike today still common 25 fps (PAL) and 48kHz.
Heh, we need a kernel flavor per location? :)
This is imho not only usable for desktop, but also for servers that are using multimedia related applications.
I've seen that fedora-like distributions use 1000 HZ, arch linux uses 300 Hz and debian defaults to 250 HZ. So there is no clear trend. I'd be fine with 300 or 1000 HZ.
Even 1 kHz might be fine on some archs with NO_HZ_IDLE. But does it bring anything while having hrtimers?
That's my question, too. And we have enabled dynamic preempt, so the kernel can run in full preemption if the user wants better latency...
Historically, you wanted the lowest good value for power savings. And the highest possible for preemption (best user response feeling).
We even had 1kHz until: https://github.com/openSUSE/kernel-source/commit/50a275a4bca006566590e94f880...
The reasons to the change are not noted (again!), so we can only guess...
I actually don't care about what value is selected. But you should get in touch with timer guys and the performance team too. I doubt they read this list :). But they should definitely comment (provided we unify kernels with SLE). So maybe creating a bug?
Agreed, this is nothing but performance tuning, and our experts can give better comments and evaluations.
thanks,
Takashi
Hi Jiri,
On Thu, 24 Feb 2022 at 10:30, Jiri Slaby <jslaby@suse.cz> wrote:
Historically, you wanted the lowest good value for power savings.
I think that was before HZ_IDLE functionality, but I might be wrong.
And the highest possible for preemption (best user response feeling).
it's not only that, it is also affecting resolution / granularity of a lot of things.
https://github.com/openSUSE/kernel-source/commit/50a275a4bca006566590e94f880...
The reasons to the change are not noted (again!), so we can only guess...
Well, it says "switch to (upstream) default", which is fair: if you don't have good reasons for deviating, the upstream default is a safe bet, as it is hopefully the most widely used one. But that was before upstream introduced the 300 HZ option as a potential successor. However, I could not find any upstream discussion about changing the upstream default in the mailing list archives.
I actually don't care about what value is selected. But you should get in touch with timer guys and the performance team too. I doubt they read this list :). But they should definitely comment (provided we unify kernels with SLE). So maybe creating a bug?
Who are the "timer guys" and why are they not subscribed to the kernel list? ;)
I can create a bug report if that's the preferred way to come to a decision.
Thanks,
Dirk
On Thu 24-02-22 12:30:47, Dirk Müller wrote:
Hi Jiri,
On Thu, 24 Feb 2022 at 10:30, Jiri Slaby <jslaby@suse.cz> wrote:
Historically, you wanted the lowest good value for power savings.
I think that was before HZ_IDLE functionality, but I might be wrong.
And the highest possible for preemption (best user response feeling).
it's not only that, it is also affecting resolution / granularity of a lot of things.
Yes, but that goes both ways. E.g. some kernel tunables are in milliseconds and translated to jiffies internally. So if I previously was setting 4/8 ms in those settings, it was 1 or 2 jiffies. Now it will be again 1 or 2 jiffies but resulting in delays of 3.3 and 6.6 ms instead. So the change will have subtle impact on various things in user setups.
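As a rough sketch of the arithmetic above (plain Python, not kernel code): the function names below are made up for illustration, and which rounding a given tunable actually uses is an assumption. The truncating variant reproduces the 1 or 2 jiffies described here; as far as I understand, the in-kernel msecs_to_jiffies() helper rounds up instead, which shifts the result in the other direction.

```python
def ms_to_jiffies_trunc(ms, hz):
    """Convert milliseconds to jiffies, dropping any fractional jiffy."""
    return ms * hz // 1000


def ms_to_jiffies_ceil(ms, hz):
    """Convert milliseconds to jiffies, rounding up to a whole jiffy."""
    return (ms * hz + 999) // 1000


def jiffies_to_ms(jiffies, hz):
    """Actual delay in milliseconds represented by a jiffy count."""
    return jiffies * 1000 / hz


for hz in (250, 300):
    for ms in (4, 8):
        for name, conv in (("trunc", ms_to_jiffies_trunc),
                           ("ceil", ms_to_jiffies_ceil)):
            j = conv(ms, hz)
            print(f"HZ={hz}: {ms} ms -> {j} jiffies ({name}) "
                  f"= {jiffies_to_ms(j, hz):.2f} ms")
```

At 250 HZ both variants map 4/8 ms to exactly 1/2 jiffies. At 300 HZ truncation gives 1/2 jiffies (about 3.33 and 6.67 ms), while rounding up gives 2/3 jiffies (about 6.67 and 10 ms). Either way the effective delay no longer matches the configured value.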
I actually don't care about what value is selected. But you should get in touch with timer guys and the performance team too. I doubt they read this list :). But they should definitely comment (provided we unify kernels with SLE). So maybe creating a bug?
Who are the "timer guys" and why are they not subscribed to the kernel list? ;)
I can create a bug report if that's the preferred way to come to a decision.
Yeah, let's track this in bugzilla both for the sake of documentation of the decision as well as having some place to track various impacts we find the change has. Please add kernel-performance-bugs@suse.de to CC for performance team awareness... Thanks!
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Hi,
On 24. 02. 22, 12:30, Dirk Müller wrote:
I actually don't care about what value is selected. But you should get in touch with timer guys and the performance team too. I doubt they read this list :). But they should definitely comment (provided we unify kernels with SLE). So maybe creating a bug?
Who are the "timer guys" and why are they not subscribed to the kernel list? ;)
I don't want to speak for them, but many SUSE developers are not used to subscribing to @opensuse lists (you'd need to use the internal kernel@ one to reach them). I don't know if anyone else does timers/HZ things, but Frederic Weisbecker would be a perfect fit for this topic, IMO.
I can create a bug report if that's the preferred way to come to a decision.
As others write, definitely. It would be documented, persistent and reference-able in commit logs.
thanks,
--
js
suse labs
On 24.02.22 10:30, Jiri Slaby wrote:
Hi,
On 23. 02. 22, 17:09, Dirk Müller wrote:
In the Kconfig description of 300 HZ option, it appears this is more recommended for multimedia usecases because it is divisible without remainder for common rates, like 30 (fps), 60 fps , 120 fps, 44.1khZ and others that are often needed.
Unlike today still common 25 fps (PAL) and 48kHz.
These are also divisible without remainder at 300HZ ;-)
--
Stefan Seyfried
"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
On 24. 02. 22, 13:40, Stefan Seyfried wrote:
On 24.02.22 10:30, Jiri Slaby wrote:
On 23. 02. 22, 17:09, Dirk Müller wrote:
In the Kconfig description of 300 HZ option, it appears this is more recommended for multimedia usecases because it is divisible without remainder for common rates, like 30 (fps), 60 fps , 120 fps, 44.1khZ and others that are often needed.
Unlike today still common 25 fps (PAL) and 48kHz.
These are also divisible without remainder at 300HZ ;-)
Heh, I cannot do first grade math anymore apparently :).
--
js
suse labs
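For reference, a quick sketch to check the divisibility claims from this thread (plain Python). It assumes "divisible without remainder" means that either HZ is an integer multiple of the rate (frame rates) or the rate is an integer multiple of HZ (audio sample rates); the `RATES` table and the helper name `divides_evenly` are only illustrative.

```python
# Rates mentioned in the thread, in Hz.
RATES = {
    "25 fps (PAL)": 25,
    "30 fps": 30,
    "60 fps": 60,
    "120 fps": 120,
    "44.1 kHz": 44100,
    "48 kHz": 48000,
}


def divides_evenly(hz, rate):
    """True if one of the two rates is an integer multiple of the other."""
    return hz % rate == 0 or rate % hz == 0


for hz in (250, 300, 1000):
    ok = [name for name, rate in RATES.items() if divides_evenly(hz, rate)]
    print(hz, ok)
```

Under that reading, 300 HZ works for 25, 30 and 60 fps as well as 44.1 kHz and 48 kHz, while 250 HZ and 1000 HZ only match 25 fps and 48 kHz. Note that 120 fps does not divide evenly into 300 (300 / 120 = 2.5), nor into 250 or 1000.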
participants (5)
- Dirk Müller
- Jan Kara
- Jiri Slaby
- Stefan Seyfried
- Takashi Iwai