Where I work, we routinely run our databases on XFS on LVM2.
Each database has its data in a /mysql/schemaname tree. That
tree is an LVM2 Logical Volume mysqlVol in the Volume Group
vg00, formatted as an XFS filesystem.
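Setting one of these up looks roughly like this; the size and the schema name are made up for the example:
lvcreate -L 100G -n mysqlVol vg00          # size is an example
mkfs.xfs /dev/vg00/mysqlVol
mkdir -p /mysql/schemaname
mount /dev/vg00/mysqlVol /mysql/schemaname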
You can grow an existing LVM Logical Volume with
lvextend -L+50G /dev/vg00/mysqlVol or similar, and then grow the
filesystem into the new space with xfs_growfs, which XFS can do
while the filesystem is mounted.
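In full, a grow looks like this:
lvextend -L+50G /dev/vg00/mysqlVol
xfs_growfs /mysql/schemaname       # grows the mounted filesystem to the new LV size
# or both in one step: lvextend -r -L+50G /dev/vg00/mysqlVol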
You can also create a snapshot with
lvcreate -s -L50G -n SNAPSHOT /dev/vg00/mysqlVol
and if you do this right, it will even be consistent, or at least
recoverable, from a database POV. But LVM snapshots are terribly
inefficient, and you might not want to do that on a busy system.
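One common way to do this right with MySQL, sketched here under the assumption that holding FLUSH TABLES WITH READ LOCK across the snapshot creation is acceptable:
# Session 1 (mysql client, keep it open so the lock stays held):
#   FLUSH TABLES WITH READ LOCK;
# Session 2 (shell):
lvcreate -s -L50G -n SNAPSHOT /dev/vg00/mysqlVol
# Session 1 again:
#   UNLOCK TABLES;
With InnoDB you can even skip the lock and rely on crash recovery when using the snapshot, which is the "at least recoverable" case.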
The size you specified for the LVM snapshot is the amount of backing storage: When there is a logical write to the mysqlVol, LVM intercepts the write, physically reads the old target block, physically writes the old target block into the snapshot backing storage and then resumes the original write. This will do horrible things to your write latency, because the original write is stalled until the copy has been made, and I crashed a database at least once with Redo Log overflow while holding and reading a snapshot.
As the backing storage fills up, the snapshot will fail once it
runs out of free space. If you still have free space in the
Volume Group, it is possible to extend the backing store using
lvextend -L+50G /dev/vg00/SNAPSHOT while the snapshot is live.
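The fill level of the backing store can be watched with lvs, where the Data% column of the snapshot LV shows how much has been consumed. A sketch:
lvs vg00/SNAPSHOT                      # watch the Data% column
lvextend -L+50G /dev/vg00/SNAPSHOT     # extend before it reaches 100%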
Reads from the original mysqlVol can be satisfied the normal way, as the data we see there is always the most recent version of each block. Reads from the snapshot will look for the data in the snapshot first: if they find it, they return the old, snapshotted data; if they do not, they look into the mysqlVol instead. In any case, the normal filesystem will show current data, while the snapshot will show old data, and as both volumes diverge, snapshot backing storage is consumed, up to the point where both volumes have completely diverged and the snapshot is as large as the original volume.
Mounting the XFS snapshot volume is a bit tricky: XFS will
refuse to mount the same UUID filesystem twice, and since by
definition the snapshot is a clone of the (past) original
volume, it will of course have the same UUID. So we need to tell
XFS that this is okay:
mount -o ro,nouuid /dev/vg00/SNAPSHOT /mnt
to get it mounted.
Once unmounted again, you can take the Logical Volume offline
and throw it away:
lvchange -an /dev/vg00/SNAPSHOT and
lvremove /dev/vg00/SNAPSHOT to get it done.
Mirroring using dm-raid
One method to clone a machine is to convert an existing volume into a RAID1, then split the RAID and move one half of the mirror to a new machine.
I made myself a small VM with seven tiny drives to test this: The boot disk is sda, and the drives sdb to sdg are for LVM testing.
The initial setup is like so: We copy the partition table of sda to all play drives. We then create a volume group testvg, to which we add only the first partitions of the initial three drives: sdb1, sdc1 and sdd1. We then create a logical volume testlv as a simple concatenation of 2G extents from sdb1, sdc1 and sdd1.
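A sketch of that setup; the sfdisk loop and the per-PV allocation are assumptions, but match the description above:
# copy sda's partition table to the six play drives
for d in b c d e f g; do sfdisk -d /dev/sda | sfdisk /dev/sd$d; done
# volume group from the first three data partitions
vgcreate testvg /dev/sdb1 /dev/sdc1 /dev/sdd1
# 2G from each PV, concatenated into one 6G linear LV
lvcreate -L 2G -n testlv testvg /dev/sdb1
lvextend -L +2G testvg/testlv /dev/sdc1
lvextend -L +2G testvg/testlv /dev/sdd1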
We can now check what we have. We look at the lvs
output to see that we have a 6G LV. Then we check the pvs
output to see that we indeed have sdb1, sdc1 and sdd1 in testvg,
and that 2G of each drive have been used. We can then finally run
pvdisplay --maps to validate the actual layout.
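That is, something like:
lvs testvg                 # expect one 6G testlv
pvs                        # expect 2G used on each of sdb1, sdc1, sdd1
pvdisplay --maps /dev/sdb1 /dev/sdc1 /dev/sdd1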
With this we can introduce the three additional drives into the volume group, and then do the actual conversion to a mirror.
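A sketch of the two steps, with the device names from the test setup:
vgextend testvg /dev/sde1 /dev/sdf1 /dev/sdg1   # introduce the new drives
lvconvert --type raid1 -m 1 testvg/testlv       # add a mirror leg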
With lvs we can watch the progress of the sync.
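For example:
watch -n1 lvs testvg    # the Cpy%Sync column climbs to 100.00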
Let’s check the disk layout again.
There are two competing implementations of this, --type mirror
and --type raid1. The mirror implementation is very strongly
deprecated; the raid1 implementation is okay, which is why we
used it. It uses md-raid code internally, and we can show this
using
lvs -a --segments -o+devices
This shows us the visible LV testlv as well as the hidden infrastructure that is being created to build it. The left leg of the RAID 1 is testlv_rimage_0, spread over 3 physical devices.
The right leg is testlv_rimage_1, and because the data all fits onto one disk, we get it consolidated into a single 6G segment on a single device, which is not quite what we want. We also see two meta devices, which hold the metadata and a bitmap that can speed up array synchronisation.
Here we see the asymmetric layout again, at the pvs level.
Note how the allocation of the rmeta sub-LVs creates the “.99”
values in the PFree column.
Maintaining the RAID
As dm-raid uses md-raid plumbing internally, it has the same controls as md-raid. Among them are controls for the sync speed of a logical volume, which the lvchange command can set. For demonstration purposes we set these as low as possible, then force a resync of the RAID and watch it.
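A sketch of this; the rate values are picked to be absurdly low on purpose:
lvchange --minrecoveryrate 1 --maxrecoveryrate 1 testvg/testlv   # ~1 KiB/s per device
lvchange --syncaction repair testvg/testlv                       # force a resync
lvs -o+raid_sync_action,raid_mismatch_count,sync_percent testvg  # track progress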
--syncaction repair forces a RAID recovery, and the lvs command
shows the data we need to track it.
Splitting the RAID and the VG
We can now split the RAID into two unraided LVs with different names inside the same VG.
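This is lvconvert --splitmirrors territory; a sketch, naming the split-off copy splitlv as in the text below:
lvconvert --splitmirrors 1 --name splitlv testvg/testlv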
Since the RAID is now split, the rmeta sub-LVs are gone and the rimage sub-LVs are unwrapped and become the actual LVs (and those .99 numbers in the PFree column are nice and round again).
At this point we can then proceed to split the Volume Group in two, putting splitlv into a new Volume Group splitvg, then export that.
For that, we need to make testvg unavailable, then run vgsplit. Because of that, a data LV should always be in a data VG that is different from the boot VG holding the boot LVs. If this is not the case, splitting the data LV would require a boot into a rescue image: it is not possible to offline a boot LV on a running system.
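A sketch of the split, assuming vgsplit's -n option to select the PVs by LV name:
vgchange -an testvg                 # deactivate; vgsplit needs the LVs offline
vgsplit -n splitlv testvg splitvg   # move splitlv's PVs into a new VG
vgchange -ay testvg                 # reactivate what stays behind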
We can see that vgsplit automatically identified the physical drives that make up the splitlv volume, made sure nothing else is on these drives, and moved them into a new VG splitvg.
We can now
vgexport that thing, eject the drives and move them
elsewhere. Over there, we can
vgimport things and proceed.
It is now safe to pull the drives.
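The whole handover, sketched; the import half runs on the receiving machine:
vgexport splitvg          # mark the VG inactive and exportable
# ...pull the drives and move them to the other machine, then there:
vgimport splitvg
vgchange -ay splitvg      # activate and start using the LV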