On Wednesday, 16 June 2021, 11:16:50 CEST, Thomas Hartwig wrote:
Hi Pete,
Thank you very much for the information and your work. I tested the preempt kernel 5.12 and it is performing very well.
Glad to hear that. I tend to delay the kernel version bumps a bit, but please set keeppackages=1 on that repo, e.g. "zypper mr -k repo".
I think I can use these in production environments as well. This is good.
Rest assured that when I know someone is using these builds in production, I take extra care with them. My usual workflow is fetching from the openSUSE kernel, rebasing, and pushing to this project. The packages in my local OBS are linked from there.
It would be nice to have builds for the distribution releases 15.2/15.3, but as long as the repository is isolated and contains only the kernels, I can certainly use the Tumbleweed version.
The Leap builds have different issues that are manageable, but if the TW builds work for you, please go for it.
Meanwhile, I want to give some more technical background on the issues we observed. Please note that I am neither a kernel developer nor an expert. I am an application developer with extended system knowledge who occasionally dives into the hardware, but mainly I am tied to application development, which is why I am limited in time and in know-how about what is going on in specific kernel versions.

The systems I work with are specialized video capturing systems using industrial high-speed GigE cameras (Basler/Pylon software driver). All of this is based on the TCP/IP stack, so IMHO this is the most critical section. The systems store video frames in memory, and these are further processed in complex multi-threaded applications. Just to give some numbers, which show impressively what a Linux system is capable of: 6 cameras are streaming at 4000 frames per second each, each with a bandwidth of 100 MB/s, handled by an X710 fiber optic network card. So roughly 600 MB/s are handled in total. The CPU is a 16-core Intel Xeon Silver 4216.

In the end, the number of cameras does not matter for the problems we see; they can happen with just 1 camera. The problems are not related to our application, since they can be reproduced with the vendor's original test software (even without storing anything). So it is a latency problem rather than a bandwidth problem. From time to time single frames are lost when the system is not configured for high speed, i.e. with a higher CONFIG_HZ. Unfortunately we cannot track it down easily and have no possibility of influencing the driver itself. But from our observation it looks like the Linux system takes some breaks from time to time to manage interrupt handlers for the network stack. The camera buffers then cannot be processed by the driver in time and are lost. I cannot explain why 1000 Hz instead of 250 Hz makes this difference.
We have tried to optimize almost everything else we could think of, like TCP buffers, the network card driver and so on. Maybe IOWAIT is an issue to be considered further, but we really do not store anything at all when we test...
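To put the figures quoted above into perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers from this mail (6 cameras, 4000 frames per second and 100 MB/s per camera) and the two CONFIG_HZ values under discussion. The function names are made up for illustration, and the idea that the timer tick is what limits frame handling is an assumption, not something measured here.

```python
# Back-of-the-envelope timing budget; all figures are taken from the mail above.
CAMERAS = 6
FPS_PER_CAMERA = 4000
MB_PER_S_PER_CAMERA = 100


def frame_interval_us(fps):
    """Time between two consecutive frames of one camera, in microseconds."""
    return 1_000_000 / fps


def tick_period_us(hz):
    """Length of one timer tick for a given CONFIG_HZ, in microseconds."""
    return 1_000_000 / hz


def frames_per_tick(fps, hz):
    """Number of frames one camera delivers during a single timer tick."""
    return fps / hz


# Aggregate bandwidth over all cameras.
total_mb_per_s = CAMERAS * MB_PER_S_PER_CAMERA

print(f"aggregate bandwidth: {total_mb_per_s} MB/s")
print(f"frame interval:      {frame_interval_us(FPS_PER_CAMERA):.0f} us")
for hz in (250, 1000):
    print(
        f"CONFIG_HZ={hz}: tick = {tick_period_us(hz):.0f} us, "
        f"frames per tick and camera = {frames_per_tick(FPS_PER_CAMERA, hz):.0f}"
    )
```

At 4000 frames per second a new frame arrives every 250 microseconds per camera. One tick at HZ=250 lasts 4 ms and therefore spans 16 frames, while one tick at HZ=1000 lasts 1 ms and spans 4 frames. If frame handling is ever deferred to the next tick, the amount of data that has to be buffered in the meantime is four times larger at HZ=250, which would be consistent with the observation, but this remains a hypothesis.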
Wow, that sounds like real fun. And a bit of a dance. This asks for playing with qdiscs and NIC offloading, of course. Probably not applicable, but wasn't SCTP invented for this area?
As I said, I am limited in time to set up a complete test environment and dig deep. All I can say is that the preempt kernel at 1000 Hz is working well.
This is valuable feedback of course, even if not everyone likes it. While preemption is on its way to becoming a dynamic boot option, HZ isn't, as far as I know. I cannot imagine giving up on the constness of HZ. This is a strong vote for a separate preempt flavor. However, if time permits, you should also test the official default (TW) kernel with the preempt option once it appears.
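When comparing kernels like this, it helps to confirm which tick rate and preemption settings a given build was actually compiled with. The following Python sketch is one possible way to do that. It assumes the build configuration is installed as /boot/config-<kernel release>, which is common but not guaranteed, and the function names are hypothetical helpers, not part of any existing tool.

```python
import os
import platform
import re


def parse_kernel_config(text):
    """Return CONFIG_HZ and all CONFIG_PREEMPT* options that are set in the config text."""
    options = {}
    for line in text.splitlines():
        # Lines such as "# CONFIG_PREEMPT_NONE is not set" start with '#' and are skipped.
        match = re.match(r"^(CONFIG_HZ|CONFIG_PREEMPT\w*)=(.+)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def load_running_kernel_config():
    """Read the config of the running kernel, or return None if it is not found.

    Assumption: the config file lives at /boot/config-<release>.
    """
    path = f"/boot/config-{platform.release()}"
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_kernel_config(handle.read())


if __name__ == "__main__":
    config = load_running_kernel_config()
    if config is None:
        print("kernel config not found under /boot")
    else:
        for name, value in sorted(config.items()):
            print(f"{name}={value}")
```

Running this on both the default and the preempt kernel makes it easy to see whether a behavioural difference goes along with a change in CONFIG_HZ, in the preemption model, or in both.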
I hope this gives some insight and can be used as feedback from the field for all who have worked on Linux systems. Thanks for this, and thanks, Pete, for making this kernel.
My pleasure and good luck, Thomas! Best regards, Pete
BR Thomas