Freeze/snapshot bug - [WAS: SOLVED - 2 freeze/snapshot questions]
Further testing has shown that with the 2.4.18-231 kernel and an XFS filesystem layered over LVM, the freeze/snapshot feature is unreliable.

I have read about a VFS-lock kernel patch that has apparently been around for many months, maybe longer. Does anyone know if this has been incorporated into the official kernel, or into the -aa kernels that are the base for SuSE kernels?

Regardless, how do I report the below as a bug in hopes that it eventually gets fixed? (I have also sent an e-mail to the XFS support list, so maybe it is pointless to do so with SuSE as well?) And yes, I know XFS is considered experimental by SuSE, but I assume that feedback like this is what will eventually make it more reliable.

====> My simple tests

I just tried doing some snapshots under a couple of different i/o loads, and I found them unreliable with simultaneous write activity.

I was using the latest SuSE kernel 2.4.18-231 on a uniprocessor AMD machine. The LVM Volume Group (VG) and XFS filesystem (/data) were created at install time via the standard SuSE 8.0 install GUI.

For all cases I was using a simple script to invoke the snapshot process, and I was attempting to create a 2.5 Gig snapshot volume:

    xfs_freeze -f /data
    lvcreate --snapshot -L 2500m --name Data_snap /dev/VG/Data
    xfs_freeze -u /data
    lvremove -f /dev/VG/Data_snap

I was invoking the above script by hand, so there were several seconds minimum between each iteration.

1) With zero i/o, I did 10 snapshots with no lockups.

2) With read-only i/o, I also did 10 snapshots with no lockups.
   (i.e. dd if=/data/largefile of=/dev/null bs=64k)

3) With read/write i/o, I had a lockup on my very first snapshot attempt.
   (i.e. dd if=/data/largefile of=/data/junk bs=64k)

The lockup occurred on the lvcreate step. A kill -9 of the lvcreate process had no effect. Issuing an xfs_freeze -u /data from a different ssh login cleared the lockup.
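For anyone who wants to repeat the test, the four commands above could be wrapped roughly as follows. This is only a sketch, not the script actually used: the run and snapshot_cycle helpers and the DRY_RUN switch are hypothetical names introduced here, and DRY_RUN defaults to printing the commands instead of executing them. The thaw step is issued even when lvcreate fails, but note that this does not help in the case described above, where lvcreate never returns.

```shell
#!/bin/sh
# Sketch only: freeze, snapshot, thaw, remove, using the values from the post.
MOUNT=/data
ORIGIN=/dev/VG/Data
SNAP=Data_snap
SIZE=2500m
# Hypothetical switch: 1 prints each command, anything else executes it.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_cycle() {
    run xfs_freeze -f "$MOUNT" || return 1
    run lvcreate --snapshot -L "$SIZE" --name "$SNAP" "$ORIGIN"
    rc=$?
    # Thaw regardless of the lvcreate result so the filesystem is not left frozen.
    run xfs_freeze -u "$MOUNT" || return 1
    [ "$rc" -eq 0 ] || return "$rc"
    # The snapshot lives in the same volume group as its origin.
    run lvremove -f "${ORIGIN%/*}/$SNAP"
}

snapshot_cycle
```

Running it with DRY_RUN=0 as root would execute the real commands; left at the default it only lists them in order.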
The whole purpose of the xfs_freeze utility is to allow the snapshot process to be guaranteed of taking a valid snapshot. Taking a snapshot of a journalled filesystem in flight can occasionally capture a corrupted filesystem image.

I have also read about a VFS-lock kernel feature/patch that may or may not be part of the SuSE kernel, which guarantees that the filesystem is quiescent without the use of xfs_freeze. If this feature is present in SuSE kernels, the appropriate fix may simply be to turn xfs_freeze into a dummy function that simply echoes "xfs_freeze is not required on SuSE kernels."

====
Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
Compaq ASE - Tru64 v4, v5
Compaq Master ASE - SAN Architect
The Norcross Group
www.NorcrossGroup.com
If anyone cares:
The script I was writing/executing to test this was on the xfs filesystem I was trying to snapshot.
I'm not sure why this was causing a problem, but moving it to a different filesystem that is not part of LVM has made it work reliably.
The problem was probably a direct conflict between xfs_freeze and running a script that lives on the frozen filesystem.
Greg
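One way to guard against the conflict described above is to check, before freezing, whether the script is stored on the filesystem it is about to freeze. The following is a minimal sketch under that assumption; mount_of and same_fs are hypothetical helper names, and the df parsing assumes mount points without spaces.

```shell
#!/bin/sh
# Sketch only: warn when this script lives on the filesystem to be frozen.

# Print the mount point holding the given path (last field of df -P output).
# Assumes the mount point contains no spaces.
mount_of() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }'
}

# Succeed when both paths resolve to the same, non-empty mount point.
same_fs() {
    a=$(mount_of "$1")
    b=$(mount_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

TARGET=/data                 # filesystem to be frozen, as in the post
SCRIPT_DIR=$(dirname "$0")   # directory this script was started from

if same_fs "$SCRIPT_DIR" "$TARGET"; then
    echo "warning: script is stored on $TARGET; move it before freezing" >&2
fi
```

In a real test script the warning could be turned into an early exit placed ahead of the xfs_freeze -f call.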
Any LVM or XFS experts out there,
I'm testing LVM snapshots under xfs for the first time.
I'm using LVM from the SuSE 8.0 release and an updated SuSE kernel that was released at the end of July:
k_deflt-2.4.18-231.i386.rpm
I tried a simple

    lvcreate --snapshot -L 25m --name config_snap /dev/VG1/config
    lvscan
    lvremove /dev/VG1/config_snap

and it worked fine. (I only tried it once.)