On 22/03/2020 18.42, Dr. Guido Dhondt wrote:
I am not sure what you mean by runtime requirements.
The only external packages required are ARPACK and SPOOLES. In fact, these packages are only used for structural mechanics calculations. For CFD (the test examples I talk about are CFD examples) I used dgmres from the SLATEC library. This routine is included in CalculiX.
I also made sure that no data files (input/output) are common to the examples.
The problems (nan) do not occur from the beginning. They occur after maybe 100 up to 200 iterations (at different numbers for each example).
That could indicate that the math is being done with floats of different sizes. Maybe there is an operation somewhere that implicitly converts (casts) to a smaller type, which C does without warning, and precision is lost.

I think you said that the problem started when using CUDA. I know little about this, but I suppose there is one GPU, not 12. I have no idea how it switches from one job to another, but it cannot parallelize. Maybe it is better to let it finish the current job, then switch to another. Maybe there is a problem here: all threads competing to use the single GPU, and maybe it doesn't switch correctly. I'm guessing; I have never used a GPU.

At the moment I have a Ryzen processor that claims 12 cores, but I think it is really 6, doubled with what's-its-name, hyper-threading? If your test is easy to set up and run, I could try it.

-- 
Cheers / Saludos,

		Carlos E. R.
		(from 15.1 x86_64 at Telcontar)