commit slurm for openSUSE:Factory
Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package slurm for openSUSE:Factory checked in at 2024-03-26 19:27:40
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/slurm (Old)
 and      /work/SRC/openSUSE:Factory/.slurm.new.1905 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "slurm"

Tue Mar 26 19:27:40 2024 rev:105 rq:1161658 version:23.11.5

Changes:
--------
--- /work/SRC/openSUSE:Factory/slurm/slurm.changes 2024-02-27 22:47:59.853765416 +0100
+++ /work/SRC/openSUSE:Factory/.slurm.new.1905/slurm.changes 2024-03-26 19:32:11.555807228 +0100
@@ -1,0 +2,180 @@
+Mon Mar 25 15:16:44 UTC 2024 - Christian Goll <cgoll@suse.com>
+
+- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
+  as incorporated upstream
+- Changes in Slurm 23.02.5
+ * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu
+   or pn_min_cpus are being automatically adjusted.
+ * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
+   a node features plugin is configured.
+ * Fix and prevent reoccurring reservations from overlapping.
+ * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
+ * Change the log message warning for rate limited users from verbose to info.
+ * With CR_Cpu_Memory, fix node selection for jobs that request gres and
+   --mem-per-cpu.
+ * Fix a regression from 22.05.7 in which some jobs were allocated too few
+   nodes, thus overcommitting cpus to some tasks.
+ * Fix a job being stuck in the completing state if the job ends while the
+   primary controller is down or unresponsive and the backup controller has
+   not yet taken over.
+ * Fix slurmctld segfault when a node registers with a configured CpuSpecList
+   while slurmctld configuration has the node without CpuSpecList.
+ * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not
+   registering by ResumeTimeout.
+ * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting
+   skipped.
+ * slurmstepd - Cleanup per task generated environment for containers in
+   spooldir.
+ * Fix scontrol segfault when 'completing' command requested repeatedly in
+   interactive mode.
+ * Properly handle a race condition between bind() and listen() calls in the
+   network stack when running with SrunPortRange set.
+ * Federation - Fix revoked jobs being returned regardless of the -a/--all
+   option for privileged users.
+ * Federation - Fix canceling pending federated jobs from non-origin clusters
+   which could leave federated jobs orphaned from the origin cluster.
+ * Fix sinfo segfault when printing multiple clusters with --noheader option.
+ * Federation - fix clusters not syncing if clusters are added to a federation
+   before they have registered with the dbd.
+ * Change pmi2 plugin to honor the SrunPortRange option. This matches the new
+   behavior of the pmix plugin in 23.02.0. Note that neither of these plugins
+   makes use of the "MpiParams=ports=" option, and previously were only limited
+   by the system's ephemeral port range.
+ * node_features/helpers - Fix node selection for jobs requesting changeable
+   features with the '|' operator, which could prevent jobs from running on
+   some valid nodes.
+ * node_features/helpers - Fix inconsistent handling of '&' and '|', where an
+   AND'd feature was sometimes AND'd to all sets of features instead of just
+   the current set. E.g. "foo|bar&baz" was interpreted as {foo,baz} or
+   {bar,baz} instead of how it is documented: "{foo} or {bar,baz}".
+ * Fix job accounting so that when a job is requeued its allocated node count
+   is cleared. After the requeue, sacct will correctly show that the job has
+   0 AllocNodes while it is pending or if it is canceled before restarting.
+ * sacct - AllocCPUS now correctly shows 0 if a job has not yet received an
+   allocation or if the job was canceled before getting one.
+ * Fix intel oneapi autodetect: detect the /dev/dri/renderD[0-9]+ gpus, and do
+   not detect /dev/dri/card[0-9]+.
+ * Format batch, extern, interactive, and pending step ids into strings that
+   are human readable.
+ * Fix node selection for jobs that request --gpus and a number of tasks fewer
+   than gpus, which resulted in incorrectly rejecting these jobs.
+ * Remove MYSQL_OPT_RECONNECT completely.
+ * Fix cloud nodes in POWERING_UP state disappearing (getting set to FUTURE)
+   when an `scontrol reconfigure` happens.
+ * openapi/dbv0.0.39 - Avoid assert / segfault on missing coordinators list.
+ * slurmrestd - Correct memory leak while parsing OpenAPI specification
+   templates with server overrides.
+ * slurmrestd - Reduce memory usage when printing out job CPU frequency.
+ * Fix overwriting user node reason with system message.
+ * Remove --uid / --gid options from salloc and srun commands.
+ * Prevent deadlock when rpc_queue is enabled.
+ * slurmrestd - Correct OpenAPI specification generation bug where fields with
+   overlapping parent paths would not get generated.
+ * Fix memory leak as a result of a partition info query.
+ * Fix memory leak as a result of a job info query.
+ * slurmrestd - For 'GET /slurm/v0.0.39/node[s]', change format of node's
+   energy field "current_watts" to a dictionary to account for unset value
+   instead of dumping 4294967294.
+ * slurmrestd - For 'GET /slurm/v0.0.39/qos', change format of QOS's
+   field "priority" to a dictionary to account for unset value instead of
+   dumping 4294967294.
+ * slurmrestd - For 'GET /slurm/v0.0.39/job[s]', the 'return code' code field
+   in v0.0.39_job_exit_code will be set to -127 instead of being left unset
+   where job does not have a relevant return code.
+ * data_parser/v0.0.39 - Add required/memory_per_cpu and
+   required/memory_per_node to `sacct --json` and `sacct --yaml` and
+   'GET /slurmdb/v0.0.39/jobs' from slurmrestd.
+ * For step allocations, fix --gres=none sometimes not ignoring gres from the
+   job.
+ * Fix --exclusive jobs incorrectly gang-scheduling where they shouldn't.
+ * Fix allocations with CR_SOCKET, gres not assigned to a specific socket, and
+   block core distribution potentially allocating more sockets than required.
+ * gpu/oneapi - Store cores correctly so CPU affinity is tracked.
+ * Revert a change in 23.02.3 where Slurm would kill a script's process group
+   as soon as the script ended instead of waiting as long as any process in
+   that process group held the stdout/stderr file descriptors open. That change
+   broke some scripts that relied on the previous behavior. Setting time limits
+   for scripts (such as PrologEpilogTimeout) is strongly encouraged to avoid
+   Slurm waiting indefinitely for scripts to finish.
+ * Allow slurmdbd -R to work if the root assoc id is not 1.
+ * Fix slurmdbd -R not returning an error under certain conditions.
+ * slurmdbd - Avoid potential NULL pointer dereference in the mysql plugin.
+ * Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's
+   environment when --ntasks-per-node was requested.
+ * Limit periodic node registrations to 50 instead of the full TreeWidth.
+   Since unresolvable cloud/dynamic nodes must disable fanout by setting
+   TreeWidth to a large number, this would cause all nodes to register at
+   once.
+ * Fix regression in 23.02.3 which broke x11 forwarding for hosts when
+   MUNGE sends a localhost address in the encode host field. This is caused
+   when the node hostname is mapped to 127.0.0.1 (or similar) in /etc/hosts.
+ * openapi/[db]v0.0.39 - fix memory leak on parsing error.
+ * data_parser/v0.0.39 - fix updating qos for associations.
+ * openapi/dbv0.0.39 - fix updating values for associations with null users.
+ * Fix minor memory leak with --tres-per-task and licenses.
+ * Fix cyclic socket cpu distribution for tasks in a step where
+   --cpus-per-task < usable threads per core.
+- Changes in Slurm 23.02.4
+ * Fix sbatch return code when --wait is requested on a job array.
+ * switch/hpe_slingshot - avoid segfault when running with old libcxi.
+ * Avoid slurmctld segfault when specifying AccountingStorageExternalHost.
+ * Fix collected GPUUtilization values for acct_gather_profile plugins.
+ * Fix slurmrestd handling of job hold/release operations.
+ * Make spank S_JOB_ARGV item value hold the requested command argv instead of
+   the srun --bcast value when --bcast requested (only in local context).
+ * Fix step running indefinitely when slurmctld takes more than MessageTimeout
+   to respond. Now, slurmctld will cancel the step when detected, preventing
+   following steps from getting stuck waiting for resources to be released.
+ * Fix regression to make job_desc.min_cpus accurate again in job_submit when
+   requesting a job with --ntasks-per-node.
+ * scontrol - Permit changes to StdErr and StdIn for pending jobs.
+ * scontrol - Reset std{err,in,out} when set to empty string.
+ * slurmrestd - mark environment as a required field for job submission
+   descriptions.
+ * slurmrestd - avoid dumping null in OpenAPI schema required fields.
+ * data_parser/v0.0.39 - avoid rejecting valid memory_per_node formatted as
+   dictionary provided with a job description.
+ * data_parser/v0.0.39 - avoid rejecting valid memory_per_cpu formatted as
+   dictionary provided with a job description.
+ * slurmrestd - Return HTTP error code 404 when job query fails.
+ * slurmrestd - Add return schema to error response to job and license query.
+ * Fix handling of ArrayTaskThrottle in backfill.
+ * Fix regression in 23.02.2 when checking gres state on slurmctld startup or
+   reconfigure. Gres changes in the configuration were not updated on slurmctld
+   startup. On startup or reconfigure, these messages were present in the log:
+   "error: Attempt to change gres/gpu Count".
+ * Fix potential double count of gres when dealing with limits.
+ * switch/hpe_slingshot - support alternate traffic class names with "TC_"
+   prefix.
+ * scrontab - Fix cutting off the final character of quoted variables.
+ * Fix slurmstepd segfault when ContainerPath is not set in oci.conf
+ * Change the log message warning for rate limited users from debug to verbose.
+ * Fixed an issue where jobs requesting licenses were incorrectly rejected.
+ * smail - Fix issues where e-mails at job completion were not being sent.
+ * scontrol/slurmctld - fix comma parsing when updating a reservation's nodes.
+ * cgroup/v2 - Avoid capturing log output for ebpf when constraining devices,
+   as this can lead to inadvertent failure if the log buffer is too small.
+ * Fix --gpu-bind=single binding tasks to wrong gpus, leading to some gpus
+   having more tasks than they should and other gpus being unused.
+ * Fix main scheduler loop not starting after failover to backup controller.
+ * Added error message when attempting to use sattach on batch or extern steps.
+ * Fix regression in 23.02 that causes slurmstepd to crash when srun requests
+   more than TreeWidth nodes in a step and uses the pmi2 or pmix plugin.
+ * Reject job ArrayTaskThrottle update requests from unprivileged users.
+ * data_parser/v0.0.39 - populate description fields of property objects in
+   generated OpenAPI specifications where defined.
+ * slurmstepd - Avoid segfault caused by ContainerPath not being terminated by
+   '/' in oci.conf.
+ * data_parser/v0.0.39 - Change v0.0.39_job_info response to tag exit_code
+   field as being complex instead of only an unsigned integer.
+ * job_container/tmpfs - Fix %h and %n substitution in BasePath where %h was
+   substituted as the NodeName instead of the hostname, and %n was substituted
+   as an empty string.
+ * Fix regression where --cpu-bind=verbose would override TaskPluginParam.
+ * scancel - Fix --clusters/-M for federations. Only filtered jobs (e.g. -A,
+   -u, -p, etc.) from the specified clusters will be canceled, rather than all
+   jobs in the federation. Specific jobids will still be routed to the origin
+   cluster for cancellation.
+
+
+-------------------------------------------------------------------
@@ -1303 +1483 @@
- work correctly.
+ work correctly (boo#1204697).

Old:
----
  Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  slurm-23.11.3.tar.bz2

New:
----
  slurm-23.11.5.tar.bz2

BETA DEBUG BEGIN:
  Old: - removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
BETA DEBUG END:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ slurm.spec ++++++
--- /var/tmp/diff_new_pack.koUFkj/_old 2024-03-26 19:32:13.479877544 +0100
+++ /var/tmp/diff_new_pack.koUFkj/_new 2024-03-26 19:32:13.479877544 +0100
@@ -19,7 +19,7 @@
 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 40
 # Make sure to update `upgrades` as well!
-%define ver 23.11.3
+%define ver 23.11.5
 %define _ver _23_11
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable
@@ -171,7 +171,7 @@
 Patch0: Remove-rpath-from-build.patch
 Patch2: pam_slurm-Initialize-arrays-and-pass-sizes.patch
 Patch10: Fix-test-21.41.patch
-Patch14: Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
+#Patch14: Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
 Patch15: Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch

 %{upgrade_dep %pname}
@@ -1112,7 +1112,8 @@
 %{_mandir}/man1/sjobexitmod.1.*
 %{_mandir}/man1/sjstat.1.*
 %{_mandir}/man8/slurmctld.*
-%{_mandir}/man8/spank*
+%{_mandir}/man8/spank.*
+%{_mandir}/man8/sackd.*

 %files openlava
 %{_bindir}/bjobs

++++++ slurm-23.11.3.tar.bz2 -> slurm-23.11.5.tar.bz2 ++++++
/work/SRC/openSUSE:Factory/slurm/slurm-23.11.3.tar.bz2 /work/SRC/openSUSE:Factory/.slurm.new.1905/slurm-23.11.5.tar.bz2 differ: char 11, line 1
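
Note on the node_features/helpers entry in the changelog above: the documented reading of "foo|bar&baz" is "{foo} or {bar,baz}", i.e. '&' binds tighter than '|'. The Python sketch below only illustrates that precedence rule. It is not Slurm's parser; the function names are hypothetical, and parentheses, counts, and any other constraint syntax are deliberately ignored.

```python
from typing import List, Set


def parse_feature_sets(expr: str) -> List[Set[str]]:
    """Split on '|' first, then on '&', so each alternative is its own AND-set.

    Hypothetical helper for illustration; not part of Slurm.
    """
    return [set(alternative.split("&")) for alternative in expr.split("|")]


def node_matches(node_features: Set[str], expr: str) -> bool:
    """A node qualifies if it carries every feature of at least one alternative."""
    return any(
        feature_set <= node_features
        for feature_set in parse_feature_sets(expr)
    )


if __name__ == "__main__":
    print(parse_feature_sets("foo|bar&baz"))
    print(node_matches({"foo"}, "foo|bar&baz"))
    print(node_matches({"bar"}, "foo|bar&baz"))
```

Under this reading a node with only "foo" qualifies, while a node with only "bar" does not, because "baz" is required together with "bar" and is not attached to "foo".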