Where I work, we routinely run our databases on XFS on LVM2.

The setup

Each database has their database on /mysql/schemaname, with the subdirectories /mysql/schemaname/{data,log,tmp}. The entire /mysql/schemname tree is a LVM2 Logical Volume mysqlVol on the Volume Group vg00, which is then formatted as an XFS filesystem.

# pvcreate /dev/nvme*n1
# vgcreate vg00 /dev/nvme*n1
# lvcreate -n mysqlVol -L...G vg00
# mkfs -t xfs /dev/vg00/mysqlVol
# mount -t xfs /dev/vg00/mysqlVol /mysql/schemaname

Basic Ops

You can grow an existing LVM Logical Volume with lvextend -L+50G /dev/vg00/mysqlVol or similar, and then xfs_grow /dev/vg00/myqlVol.

You can also create a snapshot with lvcreate -s -L50G -n SNAPSHOT /dev/vg00/mysqlVol and if you do this right, it will even be consistent or at least recoverable, from a database POV. But LVM snapshots are terribly inefficient, and you might not want to do that on a busy database.

The size you specified for the LVM snapshot is the amount of backing storage: When there is a logical write to the mysqlVol, LVM intercepts the write, physically reads the old target block, physically writes the old target block into the snapshot backing storage and then resumes the original write. This will do horrible things to your write latency, because the orignal write is stalled until the copy has been made, and I crashed a database at least once with Redo Log overflow while holding and reading a snapshot.

As the backing storage fills up, the snapshot will fail once it is running out of free space. If you still have free space, it is possible to extend the backing store using lvextend -L+50G /dev/vg00/SNAPSHOT with a live snapshot being held.

Reads to the original mysqlVol can be satisfied the normal way now, as the data we see is always the most recent blocks. Reads from the snapshot will look for the data in the snapshot, and if they find it, will return with the old, snapshotted data. Or, if they do not find it, will look into the mysqlVol instead. In any case, the normal filesystem will show current data, while the snapshot will show old data, and as both volumes diverge, snapshot backing storage will be consumed up to the point where both volumes are completely diverged and the snapshot is as large as the original volume.

Mounting the XFS snapshot volume is a bit tricky: XFS will refuse to mount the same UUID filesystem twice, and since by definition the snapshot is a clone of the (past) original volume, it will of course have the same UUID. So we need to tell XFS that this is okay: mount -t ro,nouuid /dev/vg00/SNAPSHOT /mnt to get it mounted.

Once unmounted again, you can turn the Logical Volume to offline and throw it away: lvchange -an /dev/vg00/SNAPSHOT and lvremove /dev/vg00/SNAPSHOT to get it done.

Mirroring using dm-raid

One method to clone a machine is to convert an existing volume into a RAID1, then split the raid and move one half of the mirror to a new machine.

I made myself a small VM with seven tiny drives to test this: The boot disk is sda, and the drives sdb to sdg are for LVM testing.

The initial setup is like so: We copy the partition table of sda to all play drives. We then create a volume group testvg, to which we add the initial 3 drives only partitions, sdb1, sdc1 and sdd1. We then create a simple concatenation of 2G extents from sdb1, sdbc1 and sdd1.

# for i in sdb sdc sdd sde sdf sdg
> do
>   sfdisk -d /dev/sda | sfdisk /dev/$i
> done
...
# pvcreate /dev/sd{b,c,d,e,f,g}
# vgcreate testvg /dev/sd{b,c,d}
# lvcreate -n testlv -L2G testvg
# lvextend -L+2G /dev/testvg/testlv /dev/sdc1
# lvextend -L+2G /dev/testvg/testlv /dev/sdd1

We can now check what we have. We are looking at the lvs output to see that we have a 6G LV. Then we check the pvs output to see that we indeed have sdb1, sdc1 and sdd1 in testvg, and that 2G of each drive have been used. We can then finally proceed to pvdisplay --map to validate the actual layout.

# lvs
  LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  testlv testvg -wi-a----- 6.00g
# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sdb1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdc1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdd1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sde1         lvm2 ---  <20.00g <20.00g
  /dev/sdf1         lvm2 ---  <20.00g <20.00g
  /dev/sdg1         lvm2 ---  <20.00g <20.00g
# pvdisplay --map
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               testvg
...
  --- Physical Segments ---
  Physical extent 0 to 511:
    Logical volume      /dev/testvg/testlv
    Logical extents     0 to 511
...
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               testvg
...
  --- Physical Segments ---
  Physical extent 0 to 511:
    Logical volume      /dev/testvg/testlv
    Logical extents     512 to 1023
...
  --- Physical volume ---
  PV Name               /dev/sdd1
  VG Name               testvg
...
  --- Physical Segments ---
  Physical extent 0 to 511:
    Logical volume      /dev/testvg/testlv
    Logical extents     1024 to 1535
...

With this we can introduce the three additional drives, and convert the setup to a mirror:

root@ubuntu:~# vgextend testvg /dev/sd{e,f,g}1
  Volume group "testvg" successfully extended
root@ubuntu:~# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sdb1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdc1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdd1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sde1  testvg lvm2 a--  <20.00g <20.00g
  /dev/sdf1  testvg lvm2 a--  <20.00g <20.00g
  /dev/sdg1  testvg lvm2 a--  <20.00g <20.00g

and the actual conversion. First, using lvs we can watch the progress of the sync:

root@ubuntu:~# lvconvert --type raid1 -m1 /dev/testvg/testlv
Are you sure you want to convert linear LV testvg/testlv to raid1 with 2 images enhancing resilience? [y/n]: y
  Logical volume testvg/testlv successfully converted.
root@ubuntu:~# lvs
  LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  testlv testvg rwi-a-r--- 6.00g                                    6.25
root@ubuntu:~# lvs
  LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  testlv testvg rwi-a-r--- 6.00g                                    15.92
root@ubuntu:~# lvs
  LV     VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  testlv testvg rwi-a-r--- 6.00g                                    100.00

Let’s check the disk layout again.

There are two competing implementations of this, --type mirror and --type raid1. The mirror implementation is very extremely strongly deprecated, the raid1 implementation is okay, which is why we used this one. It uses mdraid code internally, and we can show this using lvs -a --segments -o+devices

# lvs -a --segments -o+devices
  LV                VG     Attr       #Str Type   SSize Devices
  testlv            testvg rwi-a-r---    2 raid1  6.00g testlv_rimage_0(0),testlv_rimage_1(0)
  [testlv_rimage_0] testvg iwi-aor---    1 linear 2.00g /dev/sdb1(0)
  [testlv_rimage_0] testvg iwi-aor---    1 linear 2.00g /dev/sdc1(0)
  [testlv_rimage_0] testvg iwi-aor---    1 linear 2.00g /dev/sdd1(0)
  [testlv_rimage_1] testvg iwi-aor---    1 linear 6.00g /dev/sde1(1)
  [testlv_rmeta_0]  testvg ewi-aor---    1 linear 4.00m /dev/sdb1(512)
  [testlv_rmeta_1]  testvg ewi-aor---    1 linear 4.00m /dev/sde1(0)

This shows us the visible LV testlv as well as the hidden infrastructure that is being created to build it. The left leg of the RAID 1 is testlv_rimage_0, spread over 3 physical devices.

The right leg is testlv_rimage1, and because the data all fits onto one disk, we get this consolidated into a single 6G segment on a single device, not quite what we want. We also see two meta devices, which hold the metadata and a bitmap that can speed up array synchonisation.

Here we see the asymmetric layout again, at the pvs level. Note how the allocation of the rmeta sub-LVs creats the “.99” free sizes.

root@ubuntu:~# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sdb1  testvg lvm2 a--  <20.00g  17.99g
  /dev/sdc1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdd1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sde1  testvg lvm2 a--  <20.00g  13.99g
  /dev/sdf1  testvg lvm2 a--  <20.00g <20.00g
  /dev/sdg1  testvg lvm2 a--  <20.00g <20.00g
# pvdisplay --map
  --- Physical volume ---
  PV Name               /dev/sde1
  VG Name               testvg
...
  --- Physical Segments ---
  Physical extent 0 to 0:
    Logical volume      /dev/testvg/testlv_rmeta_1
    Logical extents     0 to 0
  Physical extent 1 to 1536:
    Logical volume      /dev/testvg/testlv_rimage_1
    Logical extents     0 to 1535
...

Maintaining the RAID

As dm-raid uses md-raid plumbing internally, it has the same controls as md-raid. Among them are also controls that control the sync speed of a logical volume. The lvchange command can set these. For demonstration purposes we are setting these as low as possible, then force a resync of the RAID and check this:

# lvchange /dev/testvg/testlv --minrecoveryrate 1k --maxrecoveryrate 100k
  Logical volume testvg/testlv changed.
# lvchange --syncaction repair /dev/testvg/testlv
# lvs -o+raid_min_recovery_rate,raid_max_recovery_rate,raid_mismatch_count,raid_sync_action
  LV                VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert MinSync MaxSync Mismatches SyncAction
  testlv            testvg rwi-a-r--- 6.00g                                    0.19                   1     100          0 repair

The --syncaction repair forces a RAID recovery, the lvs command shows the data we need to see to track it.

Splitting the RAID and the VG

We can now split the RAID into two unraided LVs with different names inside the same VG:

root@ubuntu:~# lvconvert --splitmirrors 1 -n splitlv /dev/testvg/testlv
Are you sure you want to split raid1 LV testvg/testlv losing all resilience? [y/n]: y
root@ubuntu:~# lvs
  LV      VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  splitlv testvg -wi-a----- 6.00g
  testlv  testvg -wi-a----- 6.00g
root@ubuntu:~# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sdb1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdc1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sdd1  testvg lvm2 a--  <20.00g <18.00g
  /dev/sde1  testvg lvm2 a--  <20.00g <14.00g
  /dev/sdf1  testvg lvm2 a--  <20.00g <20.00g
  /dev/sdg1  testvg lvm2 a--  <20.00g <20.00g

Since the raid is now split, the rmeta sub-LVs are gone and the rimage sub-LVs are unwrapped and become the actual LVs (and those .99 numbers in the PFree column are nice and round again).

At this point we can then proceed to split the Volume Group in two, putting splitlv into a new Volume Group splitvg, then export that.

For that, we need to change the testvg to unavailable, then run vgsplit. Because of that, a data LV should always be on a data VG that is different from the Boot VG which would hold the boot LVs. If this is not the case, splitting the data LV would require a boot into a rescue image in order to be able to split the data LV: It is not possible to offline a boot LV without this.

The outcome:

root@ubuntu:~# vgchange -an testvg
  0 logical volume(s) in volume group "testvg" now active
root@ubuntu:~# vgsplit -n splitlv testvg splitvg
  New volume group "splitvg" successfully split from "testvg"
root@ubuntu:~# lvs
  LV      VG      Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  splitlv splitvg -wi------- 6.00g
  testlv  testvg  -wi------- 6.00g
root@ubuntu:~# pvs
  PV         VG      Fmt  Attr PSize   PFree
  /dev/sdb1  testvg  lvm2 a--  <20.00g <18.00g
  /dev/sdc1  testvg  lvm2 a--  <20.00g <18.00g
  /dev/sdd1  testvg  lvm2 a--  <20.00g <18.00g
  /dev/sde1  splitvg lvm2 a--  <20.00g <14.00g
  /dev/sdf1  testvg  lvm2 a--  <20.00g <20.00g
  /dev/sdg1  testvg  lvm2 a--  <20.00g <20.00g

We can see that vgsplit automatically identified the physical drives that make up the splitlv volume, made sure nothing else is on these drives and moves them into a new VG splitvg.

We can now vgexport that thing, eject the drives and move them elsewhere. Over there, we can vgimport things and proceed.

root@ubuntu:~# vgexport splitvg
  Volume group "splitvg" successfully exported

It is now safe to pull the drive.