On 26.07.22 at 22:31, Patrick Shanahan wrote:
FWIW: I have used NVIDIA cards for many years, and it is not uncommon for a particular NVIDIA card to fail with a particular kernel/driver combination, or for the driver to be released somewhat late. That said, it is infrequent, and it is simple to boot the previous kernel and not be concerned with anything else. Soon another driver and/or kernel will be released and solve the problem.
In my case, that approach didn't work. This is frustrating since I know a bit about software: https://stackoverflow.com/users/34088/aaron-digulla

Also, I feel like there are a dozen different solutions for this problem. Each of them works sometimes, but none of them works always. I, for example, have never tried to boot an old kernel, because someone said you should never do this: the user space tools on disk might break when the kernel suddenly changes, which could lead to all kinds of strange problems.
The important thing is to accurately report the conditions and equipment so the maintainers can determine the correction needed.
Emphasis on "accurately report the conditions and equipment".
It is virtually impossible to test against all possible conditions!
I don't agree at all. I think we are in this mess because:

1. The kernel doesn't have a stable binary driver API like, say, the AmigaOS already had in 1984.

2. The kernel comes with a ton of config options. This makes it very versatile but also brittle because, if you're not very careful, you get a combinatorial explosion of test cases.

3. Software should have a unit test code coverage of 40-60%. What's the coverage for kernel 5.14? And don't tell me you can't test hardware drivers. I did it in AROS and in commercial software as well. Yes, testing the peeking and poking of hardware registers is expensive, but it is not impossible. A coverage of 40% is still easy to achieve because drivers contain a lot of other code (config processing, error handling, management, ...).

4. There is no centralized automated CI system where everyone working on the kernel or a driver can push patches and drivers to see if they even compile.

5. Every Linux company spends countless hours on maintaining their own kernel patches. So even if your code works with the vanilla kernel, there is a good chance that it won't work with anything else.

6. Many developers don't care to write useful error messages, meaning ones that contain all the information necessary to fix the issue.

All of these things are this way because someone wants them to be that way. Linus opposes the idea of a stable external and driver API in the kernel (while ranting that the stupid desktop guys change their API all the time...). There are good reasons for his stance, but one of the drawbacks is that drivers break every now and then. And then even someone with 300'000+ points on Stack Overflow has a hard time fixing it. And I have to waste your time asking questions.

For these reasons: A mess like this doesn't happen by pure chance. It takes meticulous planning and careful execution by many smart people over many years. ;-)

Anyway ... thanks for your time.

Good night,

-- 
Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
http://blog.pdark.de/