[Bug 1214469] New: Networking Issue After Transactional Update
https://bugzilla.suse.com/show_bug.cgi?id=1214469

Bug ID: 1214469
Summary: Networking Issue After Transactional Update
Classification: openSUSE
Product: openSUSE Tumbleweed
Version: Current
Hardware: Other
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: MicroOS
Assignee: kubic-bugs@opensuse.org
Reporter: samcon@protonmail.com
QA Contact: qa-bugs@suse.de
Target Milestone: ---
Found By: ---
Blocker: ---

Created attachment 868938
  --> https://bugzilla.suse.com/attachment.cgi?id=868938&action=edit
pre-update

Summary:
After performing a transactional update on an openSUSE Tumbleweed system, the networking bridges and virtual interfaces were not functioning. As a result, the affected device, which serves as a VM host for an OPNsense firewall, lost its network connectivity. Rolling back to an older snapshot and disabling the transactional-update.timer unit were required to restore networking. The impact was significant: all devices on the network lost internet connectivity, and the rollback had to be performed locally with a keyboard and monitor.

Description:
On [Date], I performed a transactional update on my openSUSE Tumbleweed system. The system was being used as a VM host for an OPNsense firewall, which manages all traffic for my network. The transactional update was carried out using the standard update procedure.

After the update was applied and the system rebooted, it was immediately apparent that the networking configuration was broken. None of the networking bridges or virtual interfaces were functioning. This resulted in a complete loss of network connectivity for all the virtual machines running on the host, as well as for the host itself.

Because of the impact on the network, I connected a keyboard and monitor to the affected system and tried to diagnose the problem locally. The issue appeared to be related to the recent transactional update, because rolling back to an older snapshot restored networking. The rollback was done with the system's snapshot manager, reverting to a state prior to the update.

To prevent the issue from recurring, I also disabled the transactional-update.timer unit. This avoids further disruption, but transactional updates are a critical part of openSUSE Tumbleweed's update process and should work without breaking networking.

Impact:
With the networking bridges and virtual interfaces down, all devices on the network, including the virtual machines hosted on the affected system, lost internet connectivity. The disruption lasted until I could intervene manually, roll back to an older snapshot, and disable the transactional-update timer. The incident caused downtime and required manual intervention, which is not acceptable for a system that is intended to provide stable and uninterrupted network services.

Steps to Reproduce:
- Install openSUSE Tumbleweed on a system configured as a VM host.
- Set up networking bridges and virtual interfaces.
- Perform a transactional update using the standard update procedure.
- Reboot the system after the update.
- Observe that networking bridges and virtual interfaces are not functioning, resulting in a loss of network connectivity for all devices.

Expected Results:
After a transactional update, the system's networking configuration, including bridges and virtual interfaces, should remain intact, and network connectivity for all devices should not be affected.

Additional Information:
- Intel Pentium N6005, 4x Intel i226-V 2.5Gb LAN
- Output of relevant commands (`ip addr show`): see attached

-- 
You are receiving this mail because:
You are the assignee for the bug.
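The recovery described in the report (roll back to a known-good snapshot, then pause automatic updates) can be sketched as a small shell helper. This is only a sketch under stated assumptions: the report does not say which snapshot was restored or which exact commands were used, so the function names `run` and `recover` and the `DRY_RUN` switch are hypothetical, and the snapshot number has to be taken from `snapper list` on the affected machine. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it while DRY_RUN=1 (the default).
# DRY_RUN is a hypothetical switch introduced for this sketch.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover SNAPSHOT: make SNAPSHOT the default root file system again,
# stop the automatic update timer, and reboot into the restored snapshot.
recover() {
    snapshot="$1"
    case "$snapshot" in
        ''|*[!0-9]*)
            echo "usage: recover SNAPSHOT_NUMBER (see 'snapper list')" >&2
            return 1
            ;;
    esac
    run transactional-update rollback "$snapshot"
    run systemctl disable --now transactional-update.timer
    run systemctl reboot
}
```

Calling `recover N` prints the three commands for review; running it as root with `DRY_RUN=0 recover N` would execute them, where `N` is a snapshot number known to have working networking.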
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c1

--- Comment #1 from Samuel Conway <samcon@protonmail.com> ---
Created attachment 868939
  --> https://bugzilla.suse.com/attachment.cgi?id=868939&action=edit
post-update
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c2

--- Comment #2 from Samuel Conway <samcon@protonmail.com> ---
Not sure this helps, but when I enter a transactional-update shell session, I get some "Operation not supported" messages:

❯ transactional-update shell
Checking for newer version.
New version found - updating...
Loading repository data...
Reading installed packages...
Retrieving: transactional-update-4.3.0-1.2.x86_64 (openSUSE-Tumbleweed-Oss) (1/1), 71.0 KiB
Retrieving: transactional-update-4.3.0-1.2.x86_64.rpm .....................................................[done]
(1/1) /tmp/transactional-update.sFwAhDUSa5/repo-oss/x86_64/transactional-update-4.3.0-1.2.x86_64.rpm ......[done]
Loading repository data...
Reading installed packages...
Retrieving: libtukit4-4.3.0-1.2.x86_64 (openSUSE-Tumbleweed-Oss) (1/2), 163.7 KiB
Retrieving: libtukit4-4.3.0-1.2.x86_64.rpm ................................................................[done]
(1/2) /tmp/transactional-update.sFwAhDUSa5/repo-oss/x86_64/libtukit4-4.3.0-1.2.x86_64.rpm .................[done]
Retrieving: tukit-4.3.0-1.2.x86_64 (openSUSE-Tumbleweed-Oss) (2/2), 67.6 KiB
Retrieving: tukit-4.3.0-1.2.x86_64.rpm ....................................................................[done]
(2/2) /tmp/transactional-update.sFwAhDUSa5/repo-oss/x86_64/tukit-4.3.0-1.2.x86_64.rpm .....................[done]
transactional-update 4.3.0 started
Options: shell
Separate /var detected.
2023-08-22 14:56:57 tukit 4.3.0 started
2023-08-22 14:56:57 Options: -c138 open
2023-08-22 14:57:00 Using snapshot 138 as base for new snapshot 141.
2023-08-22 14:57:00 Syncing /etc of previous snapshot 137 as base into new snapshot "/.snapshots/141/snapshot"
2023-08-22 14:57:00 SELinux is enabled.
Relabeled /var/lib/machines from system_u:object_r:unlabeled_t:s0 to system_u:object_r:systemd_machined_var_lib_t:s0
setxattr failed: /var/lib/machines: Operation not supported
ID: 141
2023-08-22 14:57:17 Transaction completed.
Opening chroot in snapshot 141, continue with 'exit'
2023-08-22 14:57:17 tukit 4.3.0 started
2023-08-22 14:57:17 Options: call 141 bash
Relabeled /var/lib/machines from system_u:object_r:unlabeled_t:s0 to system_u:object_r:systemd_machined_var_lib_t:s0
setxattr failed: /var/lib/machines: Operation not supported
2023-08-22 14:57:19 Executing `bash`:
root in /
❯ exit
2023-08-22 14:57:56 Application returned with exit status 0.
2023-08-22 14:57:56 Transaction completed.
2023-08-22 14:57:56 tukit 4.3.0 started
2023-08-22 14:57:56 Options: close 141
Relabeled /var/lib/machines from system_u:object_r:unlabeled_t:s0 to system_u:object_r:systemd_machined_var_lib_t:s0
setxattr failed: /var/lib/machines: Operation not supported
2023-08-22 14:58:01 New default snapshot is #141 (/.snapshots/141/snapshot).
2023-08-22 14:58:01 Transaction completed.

Please reboot your machine to activate the changes and avoid data loss.
New default snapshot is #141 (/.snapshots/141/snapshot).
transactional-update finished
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c4

--- Comment #4 from Samuel Conway <samcon@protonmail.com> ---
Hi Thorston, thanks for the response, I guess I panicked a bit...

There are two failed services in the current snapshot.

❯ systemctl list-units --failed
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online
● snapper-cleanup.service            loaded failed failed Daily Cleanup of Snapper Snapshots

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
2 loaded units listed.

I managed to start NetworkManager-wait-online.service, but snapper-cleanup.service is throwing an error.

❯ systemctl status snapper-cleanup.service
× snapper-cleanup.service - Daily Cleanup of Snapper Snapshots
     Loaded: loaded (/etc/systemd/system/snapper-cleanup.service; static)
     Active: failed (Result: exit-code) since Tue 2023-08-22 14:43:32 CEST; 30min ago
   Duration: 70ms
TriggeredBy: ● snapper-cleanup.timer
       Docs: man:snapper(8)
             man:snapper-configs(5)
   Main PID: 27309 (code=exited, status=1/FAILURE)
        CPU: 10ms

Aug 22 14:43:32 srv01 systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Aug 22 14:43:32 srv01 systemd-helper[27309]: running cleanup for 'root'.
Aug 22 14:43:32 srv01 systemd-helper[27309]: running number cleanup for 'root'.
Aug 22 14:43:32 srv01 systemd-helper[27309]: Deleting snapshot failed.
Aug 22 14:43:32 srv01 systemd-helper[27309]: number cleanup for 'root' failed.
Aug 22 14:43:32 srv01 systemd-helper[27309]: running timeline cleanup for 'root'.
Aug 22 14:43:32 srv01 systemd-helper[27309]: running empty-pre-post cleanup for 'root'.
Aug 22 14:43:32 srv01 systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 14:43:32 srv01 systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'.

I suspect this is the cause: older snapshots are not being removed, which leaves my /root partition with only 1 GB of free space. I currently have snapshots 62-142. I was able to remove 63-90 with:

snapper delete 63-90

For some reason, deleting 62 throws an error:

Deleting snapshot failed.
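The suspicion that failed snapshot cleanup is exhausting free space can be checked with a short script. The sketch below is an illustration rather than part of the report: the helper names `free_kib` and `low_on_space` and the 2 GiB threshold are assumptions, and `df` figures on Btrfs are only approximate (`btrfs filesystem usage /` gives a more precise picture). When space is low, the script just lists the commands from this thread that are worth running next.

```shell
#!/bin/sh
# free_kib PATH: available space on the file system holding PATH, in KiB,
# taken from the "Available" column of POSIX-format df output.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# low_on_space PATH MIN_KIB: succeed when fewer than MIN_KIB KiB are available.
low_on_space() {
    [ "$(free_kib "$1")" -lt "$2" ]
}

# 2097152 KiB = 2 GiB; an arbitrary example threshold, not a recommendation.
if low_on_space / 2097152; then
    echo "Root file system is low on space; commands worth checking:"
    echo "  snapper list"
    echo "  journalctl -u snapper-cleanup.service"
    echo "  btrfs filesystem usage /"
fi
```

Keeping the check separate from any deletion is deliberate: which snapshots are safe to remove depends on the current and default snapshots, so the script only reports.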
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c6

--- Comment #6 from Samuel Conway <samcon@protonmail.com> ---
(In reply to Santiago Zarate from comment #5)
Can you describe the network configuration? I have a personal host that's also a VM host, and is also a VPN (WG) gateway and haven't had issues, despite it having two VMs having bridged networks
flowchart TD
    A[Internet] --> system[host network adapter]
    system --> bridge{libvirtd-bridged network}
    bridge --> VM1
    bridge --> VM2
    bridge --> VM3
(See routed network on hetzner: https://docs.hetzner.com/robot/dedicated-server/ip/additional-ip-adresses/)
My setup was done via Cockpit-GUI. I guess I have the same:

    A[Internet] --> system[host network adapter]
    system --> bridge
    bridge --> VM1

All three Network interfaces are direct attachments, while one is not used. Perhaps there are commands I can use to help describe my setup better?
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c7

--- Comment #7 from Samuel Conway <samcon@protonmail.com> ---
(In reply to Samuel Conway from comment #6)
virsh domiflist fw01
 Interface   Type     Source   Model    MAC
-----------------------------------------------------------
 macvtap0    direct   enp3s0   virtio   52:54:00:31:e7:57
 macvtap1    direct   enp4s0   virtio   52:54:00:14:27:c7
 macvtap2    direct   enp5s0   virtio   52:54:00:44:83:53
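To answer the earlier question about describing the setup, the `Type` column of `virsh domiflist` is the useful part: in libvirt, `direct` means a macvtap attachment to a host NIC, while `bridge` and `network` mean a host bridge or a libvirt virtual network. The sketch below summarizes that column. The function name `iface_types` is hypothetical, and the parsing assumes the five-column table layout shown in this thread.

```shell
#!/bin/sh
# iface_types: read `virsh domiflist` output on stdin and print one
# "TYPE COUNT" line per attachment type, sorted by type name.
# The header row, the dashed separator, and blank lines are skipped.
iface_types() {
    awk '$1 != "Interface" && $1 !~ /^-+$/ && NF >= 5 { n[$2]++ }
         END { for (t in n) print t, n[t] }' | sort
}

# Live usage (needs libvirt; fw01 is the domain name used in this thread):
#   virsh domiflist fw01 | iface_types
```

Fed the table from this comment, the function prints `direct 3`, so the firewall VM is attached through macvtap to enp3s0, enp4s0 and enp5s0 rather than through a libvirt bridged network as in the flowchart from comment #5.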
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c8

--- Comment #8 from Samuel Conway <samcon@protonmail.com> ---
This seems to be a snapper issue, so we can close this report.
https://bugzilla.suse.com/show_bug.cgi?id=1214469#c9

Samuel Conway <samcon@protonmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|NEW                         |RESOLVED

--- Comment #9 from Samuel Conway <samcon@protonmail.com> ---
closed