[opensuse] Concurrently running jobs interfere (leap 15.1)
Dear Linux-Users,
in 2019 I bought a new computer (12 cores) and installed Opensuse Leap 15.0.
One of the tasks I want to perform is to run all test examples (*.inp) in a given directory with a specific program (CalculiX), using the following script:
for i in *.inp; do
/home/guido/CalculiX/CalculiX ${i%.inp} >> ${i%.inp}.lst 2>&1 &
done
This worked fine. All examples (18 in total) are sent to all cores and executed. Due to the 12 cores the execution time is much shorter (maybe 30 minutes) than running everything not concurrently (hours).
Beginning this year I decided to wipe everything from my computer and install Opensuse Leap 15.1 (the reason was that I wanted to install the cuda library, which is only available for the latest Opensuse version). If I execute now exactly the same script, the jobs start as usual but after some time they diverge (I guess because of some interference during the equation solving). If I use a script that, after starting a job, waits till it finishes before starting the next (basically without & at the end of the script line above), everything works fine.
Since the version of CalculiX now is not the same as the one I ran on Leap 15.0 (CalculiX evolved), I ran my exact current version on my old computer (4 cores, Opensuse 10.3): everything runs and the results are fine (just takes a lot of time due to the low number of cores and architecture age).
I have no idea why this happens. Any idea?
Best Greetings,
Guido
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
A wild guess, but did you recompile the software? 15.0 and 15.1 contain different versions of libraries, which could have an impact... As your app seems to use only a single core, why not run it with GNU Parallel, which can distribute individual jobs across multiple cores? Something like
find . -name "*.inp" | parallel "CalculiX '{}' >> '{.}'.lst 2>&1"
-- Vojtěch Zeisek https://trapa.cz/
Komunita openSUSE GNU/Linuxu Community of the openSUSE GNU/Linux https://www.opensuse.org/
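One detail worth noting: the original loop passes the job name without the .inp extension, while {} in the command above would pass the full file name. A minimal sketch of a variant that uses {.} for both, assuming GNU Parallel (not the moreutils tool of the same name) is installed. The SOLVER variable and the function name run_with_parallel are illustrative and do not come from the thread:

```shell
#!/bin/sh
# Path taken from the original script; override SOLVER to point elsewhere.
SOLVER="${SOLVER:-/home/guido/CalculiX/CalculiX}"

# Run every *.inp in the current directory through GNU Parallel.
# {.} is the input name with its extension removed, so the solver gets
# the bare job name and the log goes to <job>.lst, as in the for loop.
# -maxdepth 1 keeps find from descending into subdirectories.
run_with_parallel() {
    find . -maxdepth 1 -name '*.inp' |
        parallel "$SOLVER {.} >> {.}.lst 2>&1"
}
```

Calling run_with_parallel inside the test directory starts the batch. By default GNU Parallel limits itself to roughly one job per CPU, so 18 inputs on 12 cores would be queued rather than all started at once.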
I recompile the software constantly (make -j12 works extremely well, no problem and extremely fast, about 12 s to compile hundreds of routines). I will try the parallel command.
Thanks, Guido
Using "parallel" does not help, unfortunately; I still get "nan" after a couple of hundred iterations. I run the software serially; however, the software uses pthread to create threads. If OMP_NUM_THREADS=n, it uses n threads. To run CalculiX serially I set OMP_NUM_THREADS=1. So each time a system of equations has to be solved, 1 thread is created. This happens thousands of times, since most examples require 5000 or more iterations, and in each iteration the equation system is solved up to 5 times (pressure, velocity...). I do not know whether the thread creation could be a problem?
Guido
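The single-thread setup described above can be made explicit in the launch loop itself. A minimal sketch, assuming the solver honours OMP_NUM_THREADS as described. SOLVER and run_all_single_threaded are hypothetical names, and the wait bookkeeping is an addition, not part of the original script:

```shell
#!/bin/sh
# Path taken from the original script; override SOLVER to point elsewhere.
SOLVER="${SOLVER:-/home/guido/CalculiX/CalculiX}"

# Start one background job per *.inp file, each with OMP_NUM_THREADS=1
# set only in that job's environment, then wait for all of them.
# Returns 1 if any job exited with a nonzero status.
run_all_single_threaded() {
    pids=""
    for f in *.inp; do
        [ -e "$f" ] || continue          # no matches: skip the literal pattern
        OMP_NUM_THREADS=1 "$SOLVER" "${f%.inp}" >> "${f%.inp}.lst" 2>&1 &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

Setting the variable on the command line of each job rules out a stale value inherited from the login shell. Whether the solver signals a NaN through its exit status is not stated in the thread, so the return value here only reflects process exit codes.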
On Sunday, March 22, 2020 5:43:36 PM CET Vojtěch Zeisek wrote:
This sounds like a problem with CalculiX itself. I have no idea what this is supposed to do. I'd rather discuss the issue with its author.
I am the author....
Guido
On Sunday, March 22, 2020 5:55:02 PM CET Adam Mizerski wrote:
That's some important info! OMP_NUM_THREADS is used by the OpenMP library, which creates threads for parallel computation inside the application. In this case I think it makes more sense to run the program sequentially, allowing it to use all threads. You gain nothing by running the jobs in parallel when each one is using only one thread. But regardless of how they are run, it shouldn't affect the results of the computations.
My parallelization only works well if the examples are really big, e.g. 5 million equations. The test examples are smaller, maybe only 100000 equations. Therefore parallelization on 12 CPUs brings at most a 30 % speedup due to the overhead. Running the examples as serial jobs, one next to the other, brings much more speedup, I think a factor of 5 - 10 from what I remember from my runs on Leap 15.0. By the way, running these examples sequentially on another system using Suse Professional (for companies) also works without problems. Only my present system with Leap 15.1 has produced the problem so far.
Greetings, Guido
On Sunday, March 22, 2020 6:24:16 PM CET Vojtěch Zeisek wrote:
Does the software have some runtime requirements (dependencies) which might be missing on your current installation...?
I am not sure what you mean by runtime requirements.
The only external packages required are ARPACK and SPOOLES. In fact, these packages are only used for structural mechanics calculations. For CFD (the test examples I talk about are CFD examples) I used dgmres from the SLATEC library. This routine is included in CalculiX.
I also made sure that no data files (input/output) are common to the examples.
The problems (nan) do not occur from the beginning. They occur after maybe 100 up to 200 iterations (at different numbers for each example).
Greetings, Guido
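Since the NaN shows up at a different point in each example, it may help to record where it first appears in every log. A minimal sketch, assuming the .lst files contain the literal text "nan" or "NaN" once the solution breaks down. The function name first_nan is hypothetical, and the reported number is a line in the log, not an iteration count:

```shell
#!/bin/sh
# Print the first log line containing NaN for every *.lst file in the
# current directory. Files without a match are skipped silently.
first_nan() {
    for f in *.lst; do
        [ -e "$f" ] || continue
        # -i: any capitalisation, -w: whole word only,
        # -n: prefix the line number, -m 1: stop after the first match
        hit=$(grep -i -w -n -m 1 'nan' "$f") || continue
        printf '%s: first NaN at line %s\n' "$f" "${hit%%:*}"
    done
}
```

Comparing these positions between a concurrent run and a one-at-a-time run of the same inputs would show whether the onset moves with system load or stays fixed.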
That could indicate that the math is done using a different size for floats. Maybe there is an operation somewhere that uses an automatic conversion (casting) to a smaller size (C does this without telling) and precision is lost.
I think you said that the problem started when using CUDA. I know little about this, but I suppose that there is one GPU, not 12. I have no idea how it switches from one job to another, but it cannot parallelize. Maybe it is better to let it finish the current job, then switch to another. Maybe there is a problem here: all threads competing to use the single GPU, and maybe it doesn't switch right. I'm guessing; I have never used a GPU.
At the moment I have a Ryzen processor that claims 12 cores, but I think it is really 6, doubled with what's-its-name, hyper-threading? If your test is easy to set up and run, I could try.
-- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)
Hi,
the reason why I installed Leap 15.1 in the first place was cuda. This test, however, is done without cuda. The installation of cuda was done because of the PaStiX linear equation solver, which I only use for structural mechanical problems, not CFD. The test suite in question, however, is a CFD test suite. For CFD I use the dgmres routine(s) from SLATEC. So I perform this CFD test with only one cpu per example problem and without gpu. I have 12 cpu's and start all 18 examples at the same time. The "computer" is assumed to divide these jobs among its cpu's.
If you want to do testing, I can:
1. Send you
- an executable for Linux
- the test suite (it contains a small script to run the examples automatically)
or:
2. Send you
- the source code of CalculiX (with Makefile)
- the source code of ARPACK (a prerequisite for CalculiX to run; easy to compile)
- the test suite (cf. above).
Best Greetings and stay healthy, Guido
On 22.03.2020 at 18:10, Dr. Guido Dhondt wrote:
My parallelization only works well, if the examples are really big, e.g. 5 million equations. The test examples are smaller, maybe only 100000 equations. Therefore parallelization brings on 12 cpus max 30 % speedup due to the overhead. Running the examples sequentially one next to the other brings much more speedup, I think a factor of 5 - 10 from what I remember from my runs on leap 15.0.
Then feel free to use parallel.
By the way, running these examples sequentially on another system using Suse Professional (for companies) also works without problems. Only my present system with leap 15.1 produced the problem so far.
To me that sounds like your program relies on undefined behavior that has manifested itself with the newer software. I'd start by checking whether adding the "-fsanitize=undefined" compiler flag gives anything interesting. Also check out the other sanitizers and valgrind.
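For readers unfamiliar with the sanitizers, here is a small stand-alone sketch of what -fsanitize=undefined reports at run time. The C program is a made-up example, not CalculiX code, and ubsan_demo is a hypothetical helper name. It assumes gcc (or a compatible compiler) with UBSan runtime support is installed:

```shell
#!/bin/sh
# Compile a tiny C program with UBSan enabled and run it. The program
# overflows a signed int, which is undefined behavior in C; UBSan prints
# a "runtime error" diagnostic on stderr when the overflow happens.
ubsan_demo() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/demo.c" <<'EOF'
#include <limits.h>
#include <stdio.h>
int main(int argc, char **argv) {
    (void)argv;
    int x = INT_MAX;
    x += argc;              /* signed overflow: undefined behavior */
    printf("%d\n", x);
    return 0;
}
EOF
    # -g keeps file and line information in the diagnostic.
    gcc -g -fsanitize=undefined -o "$dir/demo" "$dir/demo.c" || return 1
    "$dir/demo" 2>&1
}
```

For a real build the flag would be added to the project's existing compiler flags rather than replacing them. Valgrind needs no rebuild and can be tried on a single job with something like valgrind --track-origins=yes in front of the usual command line, at a large cost in run time.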
On Sunday, March 22, 2020 3:14:43 PM CET Adam Mizerski wrote:
I have absolutely no experience with CalculiX, but I'll try:
1) What do you mean by "they diverge"?
2) Have you tried the "parallel" command from the gnu_parallel package? It'll allow you to run no more parallel processes than available cores.
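If GNU Parallel is not installed, a similar cap on concurrent jobs can be sketched with xargs -P. This assumes an xargs that supports -P (the GNU and BSD versions do). SOLVER and run_bounded are hypothetical names:

```shell
#!/bin/sh
# Path taken from the original script; override SOLVER to point elsewhere.
SOLVER="${SOLVER:-/home/guido/CalculiX/CalculiX}"

# Run all *.inp jobs in the current directory, never more at once than
# the number of CPUs the system reports. nproc counts logical CPUs, so
# on a machine with hyper-threading this is twice the physical cores.
run_bounded() {
    njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    for f in *.inp; do
        [ -e "$f" ] || continue
        printf '%s\n' "${f%.inp}"        # emit the bare job name
    done |
        # In the sh -c script, $0 is the solver and $1 is the job name.
        xargs -P "$njobs" -I{} sh -c '"$0" "$1" >> "$1.lst" 2>&1' "$SOLVER" {}
}
```

With 18 inputs and 12 CPUs, the first 12 jobs start immediately and each remaining job starts as soon as a slot frees up. Lowering njobs by hand, for example to the physical core count, is an easy way to test whether oversubscription plays a role.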
By "diverge" I mean that the solution of the equation system at some point leads to nan. This happens to all 18 jobs sooner or later. I will try the parallel option right away.
Thanks, Guido
participants (4): Adam Mizerski, Carlos E. R., Dr. Guido Dhondt, Vojtěch Zeisek