commit slurm for openSUSE:Factory

26 Mar 2024

Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package slurm for openSUSE:Factory checked in at 2024-03-26 19:27:40
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/slurm (Old)
 and      /work/SRC/openSUSE:Factory/.slurm.new.1905 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "slurm"

Tue Mar 26 19:27:40 2024 rev:105 rq:1161658 version:23.11.5

Changes:
--------

--- /work/SRC/openSUSE:Factory/slurm/slurm.changes	2024-02-27 22:47:59.853765416 +0100
+++ /work/SRC/openSUSE:Factory/.slurm.new.1905/slurm.changes	2024-03-26 19:32:11.555807228 +0100
@@ -1,0 +2,180 @@
+Mon Mar 25 15:16:44 UTC 2024 - Christian Goll <cgoll@suse.com>
+
+- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
+  as incorporated upstream
+- Changes in Slurm 23.02.5
+ * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu
+   or pn_min_cpus are being automatically adjusted.
+ * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
+   a node features plugin is configured.
+ * Fix and prevent reoccurring reservations from overlapping.
+ * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
+ * Change the log message warning for rate limited users from verbose to info.
+ * With CR_Cpu_Memory, fix node selection for jobs that request gres and
+   --mem-per-cpu.
+ * Fix a regression from 22.05.7 in which some jobs were allocated too few
+   nodes, thus overcommitting cpus to some tasks.
+ * Fix a job being stuck in the completing state if the job ends while the
+   primary controller is down or unresponsive and the backup controller has
+   not yet taken over.
+ * Fix slurmctld segfault when a node registers with a configured CpuSpecList
+   while slurmctld configuration has the node without CpuSpecList.
+ * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not
+   registering by ResumeTimeout.
+ * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting
+   skipped.
+ * slurmstepd - Cleanup per task generated environment for containers in
+   spooldir.
+ * Fix scontrol segfault when 'completing' command requested repeatedly in
+   interactive mode.
+ * Properly handle a race condition between bind() and listen() calls in the
+   network stack when running with SrunPortRange set.
+ * Federation - Fix revoked jobs being returned regardless of the -a/--all
+   option for privileged users.
+ * Federation - Fix canceling pending federated jobs from non-origin clusters
+   which could leave federated jobs orphaned from the origin cluster.
+ * Fix sinfo segfault when printing multiple clusters with --noheader option.
+ * Federation - fix clusters not syncing if clusters are added to a federation
+   before they have registered with the dbd.
+ * Change pmi2 plugin to honor the SrunPortRange option. This matches the new
+   behavior of the pmix plugin in 23.02.0. Note that neither of these plugins
+   makes use of the "MpiParams=ports=" option, and previously were only limited
+   by the system's ephemeral port range.
+ * node_features/helpers - Fix node selection for jobs requesting changeable
+   features with the '|' operator, which could prevent jobs from running on
+   some valid nodes.
+ * node_features/helpers - Fix inconsistent handling of '&' and '|', where an
+   AND'd feature was sometimes AND'd to all sets of features instead of just
+   the current set. E.g. "foo|bar&baz" was interpreted as {foo,baz} or
+   {bar,baz} instead of how it is documented: "{foo} or {bar,baz}".
+ * Fix job accounting so that when a job is requeued its allocated node count
+   is cleared. After the requeue, sacct will correctly show that the job has
+   0 AllocNodes while it is pending or if it is canceled before restarting.
+ * sacct - AllocCPUS now correctly shows 0 if a job has not yet received an
+   allocation or if the job was canceled before getting one.
+ * Fix intel oneapi autodetect: detect the /dev/dri/renderD[0-9]+ gpus, and do
+   not detect /dev/dri/card[0-9]+.
+ * Format batch, extern, interactive, and pending step ids into strings that
+   are human readable.
+ * Fix node selection for jobs that request --gpus and a number of tasks fewer
+   than gpus, which resulted in incorrectly rejecting these jobs.
+ * Remove MYSQL_OPT_RECONNECT completely.
+ * Fix cloud nodes in POWERING_UP state disappearing (getting set to FUTURE)
+   when an `scontrol reconfigure` happens.
+ * openapi/dbv0.0.39 - Avoid assert / segfault on missing coordinators list.
+ * slurmrestd - Correct memory leak while parsing OpenAPI specification
+   templates with server overrides.
+ * slurmrestd - Reduce memory usage when printing out job CPU frequency.
+ * Fix overwriting user node reason with system message.
+ * Remove --uid / --gid options from salloc and srun commands.
+ * Prevent deadlock when rpc_queue is enabled.
+ * slurmrestd - Correct OpenAPI specification generation bug where fields with
+   overlapping parent paths would not get generated.
+ * Fix memory leak as a result of a partition info query.
+ * Fix memory leak as a result of a job info query.
+ * slurmrestd - For 'GET /slurm/v0.0.39/node[s]', change format of node's
+   energy field "current_watts" to a dictionary to account for unset value
+   instead of dumping 4294967294.
+ * slurmrestd - For 'GET /slurm/v0.0.39/qos', change format of QOS's
+   field "priority" to a dictionary to account for unset value instead of
+   dumping 4294967294.
+ * slurmrestd - For 'GET /slurm/v0.0.39/job[s]', the 'return code' code field
+   in v0.0.39_job_exit_code will be set to -127 instead of being left unset
+   where job does not have a relevant return code.
+ * data_parser/v0.0.39 - Add required/memory_per_cpu and
+   required/memory_per_node to `sacct --json` and `sacct --yaml` and
+   'GET /slurmdb/v0.0.39/jobs' from slurmrestd.
+ * For step allocations, fix --gres=none sometimes not ignoring gres from the
+   job.
+ * Fix --exclusive jobs incorrectly gang-scheduling where they shouldn't.
+ * Fix allocations with CR_SOCKET, gres not assigned to a specific socket, and
+   block core distribution potentially allocating more sockets than required.
+ * gpu/oneapi - Store cores correctly so CPU affinity is tracked.
+ * Revert a change in 23.02.3 where Slurm would kill a script's process group
+   as soon as the script ended instead of waiting as long as any process in
+   that process group held the stdout/stderr file descriptors open. That change
+   broke some scripts that relied on the previous behavior. Setting time limits
+   for scripts (such as PrologEpilogTimeout) is strongly encouraged to avoid
+   Slurm waiting indefinitely for scripts to finish.
+ * Allow slurmdbd -R to work if the root assoc id is not 1.
+ * Fix slurmdbd -R not returning an error under certain conditions.
+ * slurmdbd - Avoid potential NULL pointer dereference in the mysql plugin.
+ * Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's
+   environment when --ntasks-per-node was requested.
+ * Limit periodic node registrations to 50 instead of the full TreeWidth.
+   Since unresolvable cloud/dynamic nodes must disable fanout by setting
+   TreeWidth to a large number, this would cause all nodes to register at
+   once.
+ * Fix regression in 23.02.3 which broke x11 forwarding for hosts when
+   MUNGE sends a localhost address in the encode host field. This is caused
+   when the node hostname is mapped to 127.0.0.1 (or similar) in /etc/hosts.
+ * openapi/[db]v0.0.39 - fix memory leak on parsing error.
+ * data_parser/v0.0.39 - fix updating qos for associations.
+ * openapi/dbv0.0.39 - fix updating values for associations with null users.
+ * Fix minor memory leak with --tres-per-task and licenses.
+ * Fix cyclic socket cpu distribution for tasks in a step where
+   --cpus-per-task < usable threads per core.
+- Changes in Slurm 23.02.4
+  * Fix sbatch return code when --wait is requested on a job array.
+  * switch/hpe_slingshot - avoid segfault when running with old libcxi.
+  * Avoid slurmctld segfault when specifying AccountingStorageExternalHost.
+  * Fix collected GPUUtilization values for acct_gather_profile plugins.
+  * Fix slurmrestd handling of job hold/release operations.
+  * Make spank S_JOB_ARGV item value hold the requested command argv instead of
+    the srun --bcast value when --bcast requested (only in local context).
+  * Fix step running indefinitely when slurmctld takes more than MessageTimeout
+    to respond. Now, slurmctld will cancel the step when detected, preventing
+    following steps from getting stuck waiting for resources to be released.
+  * Fix regression to make job_desc.min_cpus accurate again in job_submit when
+    requesting a job with --ntasks-per-node.
+  * scontrol - Permit changes to StdErr and StdIn for pending jobs.
+  * scontrol - Reset std{err,in,out} when set to empty string.
+  * slurmrestd - mark environment as a required field for job submission
+    descriptions.
+  * slurmrestd - avoid dumping null in OpenAPI schema required fields.
+  * data_parser/v0.0.39 - avoid rejecting valid memory_per_node formatted as
+    dictionary provided with a job description.
+  * data_parser/v0.0.39 - avoid rejecting valid memory_per_cpu formatted as
+    dictionary provided with a job description.
+  * slurmrestd - Return HTTP error code 404 when job query fails.
+  * slurmrestd - Add return schema to error response to job and license query.
+  * Fix handling of ArrayTaskThrottle in backfill.
+  * Fix regression in 23.02.2 when checking gres state on slurmctld startup or
+    reconfigure. Gres changes in the configuration were not updated on slurmctld
+    startup. On startup or reconfigure, these messages were present in the log:
+    "error: Attempt to change gres/gpu Count".
+  * Fix potential double count of gres when dealing with limits.
+  * switch/hpe_slingshot - support alternate traffic class names with "TC_"
+    prefix.
+  * scrontab - Fix cutting off the final character of quoted variables.
+  * Fix slurmstepd segfault when ContainerPath is not set in oci.conf
+  * Change the log message warning for rate limited users from debug to verbose.
+  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
+  * smail - Fix issues where e-mails at job completion were not being sent.
+  * scontrol/slurmctld - fix comma parsing when updating a reservation's nodes.
+  * cgroup/v2 - Avoid capturing log output for ebpf when constraining devices,
+    as this can lead to inadvertent failure if the log buffer is too small.
+  * Fix --gpu-bind=single binding tasks to wrong gpus, leading to some gpus
+    having more tasks than they should and other gpus being unused.
+  * Fix main scheduler loop not starting after failover to backup controller.
+  * Added error message when attempting to use sattach on batch or extern steps.
+  * Fix regression in 23.02 that causes slurmstepd to crash when srun requests
+    more than TreeWidth nodes in a step and uses the pmi2 or pmix plugin.
+  * Reject job ArrayTaskThrottle update requests from unprivileged users.
+  * data_parser/v0.0.39 - populate description fields of property objects in
+    generated OpenAPI specifications where defined.
+  * slurmstepd - Avoid segfault caused by ContainerPath not being terminated by
+    '/' in oci.conf.
+  * data_parser/v0.0.39 - Change v0.0.39_job_info response to tag exit_code
+    field as being complex instead of only an unsigned integer.
+  * job_container/tmpfs - Fix %h and %n substitution in BasePath where %h was
+    substituted as the NodeName instead of the hostname, and %n was substituted
+    as an empty string.
+  * Fix regression where --cpu-bind=verbose would override TaskPluginParam.
+  * scancel - Fix --clusters/-M for federations. Only filtered jobs (e.g. -A,
+    -u, -p, etc.) from the specified clusters will be canceled, rather than all
+    jobs in the federation. Specific jobids will still be routed to the origin
+    cluster for cancellation.
+
+
+-------------------------------------------------------------------
@@ -1303 +1483 @@
-    work correctly.
+    work correctly (boo#1204697).
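
Note on the node_features/helpers entry in the 23.02.5 changes: the documented
reading of "foo|bar&baz" is "{foo} or {bar,baz}", i.e. '&' binds tighter than
'|'. The following is only a minimal Python sketch of that precedence rule, not
Slurm's actual parser. The function names parse_features and node_matches are
hypothetical, and parentheses, counts and other constraint syntax are ignored.

```python
def parse_features(expr: str) -> list[set[str]]:
    """Split a feature expression into alternative sets of required features.

    Illustrative only: '&' binds tighter than '|', so every '|'-separated
    alternative becomes its own AND'd set of feature names.
    """
    alternatives = []
    for alternative in expr.split("|"):
        names = {name.strip() for name in alternative.split("&") if name.strip()}
        if names:
            alternatives.append(names)
    return alternatives


def node_matches(expr: str, node_features: set[str]) -> bool:
    """Return True if the node satisfies at least one alternative set."""
    return any(required <= node_features for required in parse_features(expr))
```

Under this reading a node that only offers "foo" satisfies "foo|bar&baz", while
a node that only offers "bar" does not, because "baz" is required together with
"bar" and is not AND'd onto the "foo" alternative.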

Old:
----
  Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  slurm-23.11.3.tar.bz2

New:
----
  slurm-23.11.5.tar.bz2

BETA DEBUG BEGIN:
  Old:
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
BETA DEBUG END:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ slurm.spec ++++++
--- /var/tmp/diff_new_pack.koUFkj/_old	2024-03-26 19:32:13.479877544 +0100
+++ /var/tmp/diff_new_pack.koUFkj/_new	2024-03-26 19:32:13.479877544 +0100
@@ -19,7 +19,7 @@
 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 40
 # Make sure to update `upgrades` as well!
-%define ver 23.11.3
+%define ver 23.11.5
 %define _ver _23_11
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable
@@ -171,7 +171,7 @@
 Patch0:         Remove-rpath-from-build.patch
 Patch2:         pam_slurm-Initialize-arrays-and-pass-sizes.patch
 Patch10:        Fix-test-21.41.patch
-Patch14:        Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
+#Patch14:        Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
 Patch15:        Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch
 
 %{upgrade_dep %pname}
@@ -1112,7 +1112,8 @@
 %{_mandir}/man1/sjobexitmod.1.*
 %{_mandir}/man1/sjstat.1.*
 %{_mandir}/man8/slurmctld.*
-%{_mandir}/man8/spank*
+%{_mandir}/man8/spank.*
+%{_mandir}/man8/sackd.*
 
 %files openlava
 %{_bindir}/bjobs

++++++ slurm-23.11.3.tar.bz2 -> slurm-23.11.5.tar.bz2 ++++++
/work/SRC/openSUSE:Factory/slurm/slurm-23.11.3.tar.bz2 /work/SRC/openSUSE:Factory/.slurm.new.1905/slurm-23.11.5.tar.bz2 differ: char 11, line 1

    
