Adding Another Disk to the RAID 10 on My KVM Server


My NAS is a virtual machine hosted on my home KVM server. My KVM server has a RAID 10 array composed of two 4 TB drives. This gives me a measly 4 TB of usable disk space, and I can share that space with other computers in the house. Since it is a Linux MD RAID 10, it is super easy to add a third drive when I run out of space—at least, it should have been! I hit a roadblock in March, but we’ll talk about that later.

My NAS VM was spacious in 2015. I had more than 2 TB of free disk space available. The only thing that was steadily consuming more and more of my disk space was the constantly growing collection of RAW files from my Canon 6D DSLR. I was only taking about 120 GB worth of photos each year, so it seemed like I had quite a way to go.

In 2017, I started flying FPV quadcopters. It is a ton of fun, but I was saving a lot of video. My early quads weren’t powerful enough to comfortably carry a GoPro camera, so all my footage was standard definition video captured on my goggles. Even these small files were enough to really start eating into my free space.

This year, I started recording GoPro footage. A day of flying usually generates at least 15 GB of new video, and when the weather is nice, I fly three or four times a week.

In late February, I noticed that I was down to less than 400 GB of free space. It was time to add another disk. One more 4 TB drive would bring my RAID 10 from 4 TB to 6 TB of usable space. I figured an extra 2 TB would tide me over for nearly two years.

I hit a roadblock in March

I was excited when my third 4 TB disk arrived. I cracked the KVM box open, put the drive in, and got to work adding it to my RAID 10.

Except I couldn’t add it to my RAID 10. Why can’t I grow my RAID 10? I’ve done this countless times in the past! What’s going on here?

root@kvm:~# mdadm --add /dev/md1 /dev/sde1 
mdadm: added /dev/sde1
root@kvm:~# mdadm --grow /dev/md1 --raid-devices=3
mdadm: Cannot reshape RAID10 in far-mode

I created my original RAID 10 with far-copies. I chose far-copies because that layout optimizes the disks for sequential reads. Writes are slower than with the default near-copies layout, but read speeds are closer to RAID 0 speeds.
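If you aren’t sure which layout one of your own arrays is using, mdadm will tell you. The create command below is only an approximation of what I ran back in 2015, not a saved copy of it:

# Approximate example -- not copied from my 2015 terminal.
# The layout is chosen at creation time: f2 is far-copies, n2 is near-copies (the default).
mdadm --create /dev/md1 --level=10 --raid-devices=2 --layout=f2 /dev/sdc1 /dev/sdd1

# Check which layout an existing array is using:
mdadm --detail /dev/md1 | grep Layout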

Unfortunately, you cannot reshape an array that is using far-copies. I had to convert to near-copies, but you can’t do that directly. I knew what I had to do, but it was going to be a pain in the butt—mostly because I had to open the server again!

I had to put another mirrored pair of disks into a fresh MD RAID device, add that new device to the original Volume Group, and then use pvmove to migrate the data off of the original pair of disks.

Then I had to reverse the process. I tore down the original RAID 10 and recreated it with near-copies, added the new RAID to the Volume Group, and ran the pvmove in the opposite direction. Once that was done, I was able to remove the temporary disks from the Volume Group and tear down that array.

I have no good documentation of this part of the process, but the sketch below shows the rough shape of it. Once it was finished, I was back to where I should have been when I started down this path in March, and everything was super easy from that point on!
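For the curious, the temporary-mirror dance went something like this. I am reconstructing it from memory, so treat it as a sketch: the temporary device names are hypothetical, and I have left out the cryptsetup steps that recreate the raid10_crypt mapping.

# Rough reconstruction from memory -- /dev/md2, /dev/sdf1, and /dev/sdg1 are hypothetical,
# and the cryptsetup luksFormat/luksOpen steps for raid10_crypt are omitted.

# Build a temporary mirror out of the borrowed disks and add it to the Volume Group
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdg1
pvcreate /dev/md2
vgextend raid10_crypt_vg /dev/md2

# Migrate everything off the old far-copies array, then pull it out of the Volume Group
pvmove /dev/mapper/raid10_crypt /dev/md2
vgreduce raid10_crypt_vg /dev/mapper/raid10_crypt

# Tear down the old array and recreate it with near-copies
mdadm --stop /dev/md1
mdadm --create /dev/md1 --level=10 --raid-devices=2 --layout=n2 /dev/sdc1 /dev/sdd1

# Put the encrypted PV back on the new array and move everything home again
pvcreate /dev/mapper/raid10_crypt
vgextend raid10_crypt_vg /dev/mapper/raid10_crypt
pvmove /dev/md2 /dev/mapper/raid10_crypt
vgreduce raid10_crypt_vg /dev/md2
mdadm --stop /dev/md2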

Adding a disk to a Linux RAID 10 array

The title of this section is a little misleading. There isn’t just a RAID 10 block device that needs resizing. There’s a Physical Volume (PV) that needs to grow, there’s a Logical Volume (LV) that also needs to grow, and there’s a file system on that LV that needs to grow.

Then I need to add more disk space to my NAS VM!

I’ll list the steps here, and then I’ll go into more detail.

  • Partition the new disk
  • Use mdadm to add the new partition to the RAID 10
  • Use mdadm to grow the array from 2 disks to 3 disks
  • Use pvresize to grow the PV
  • Use lvextend to grow the appropriate LV
  • Grow the EXT4 file system on that LV (lvextend’s -r flag will handle this)

At this point, the RAID 10 is roughly 50% larger, and the file system where my virtual disk images live has been appropriately expanded.

I had to go through a few more steps, because I needed to expand the storage of my NAS VM.
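I won’t document the VM side here, since I didn’t save anything from it, but if your virtual disks are image files like mine, the rough shape is something like this. The image path and size are placeholders, not my real ones:

# Hypothetical example -- the image path and size are placeholders.
# Grow the virtual disk while the VM is shut down...
qemu-img resize /var/lib/libvirt/images/nas.qcow2 +2T
# ...then boot the NAS VM and grow the partition, the LVM bits, and the
# file system inside the guest, just like on the host.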

So what does this process look like?

I’ve done quite a poor job of saving my terminal output for this blog. I didn’t save the output of my work with fdisk, and I didn’t save the pvresize and lvextend output.

I can’t easily recreate accurate fdisk output, but I can at least show you the commands that I used. I’ll try to do better when I run out of storage next year!
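The partitioning step itself is simple: one partition spanning the new disk, with its type set to Linux RAID. I did mine interactively with fdisk, but something like this with parted would land you in the same place. These are stand-ins, not the exact commands I typed:

# Stand-in for my lost fdisk session -- one RAID partition covering the whole disk
parted -s /dev/sde mklabel gpt
parted -s -a optimal /dev/sde mkpart primary 0% 100%
parted -s /dev/sde set 1 raid on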

root@kvm:~# mdadm --add /dev/md1 /dev/sde1 
mdadm: added /dev/sde1
root@kvm:~# cat /proc/mdstat 
Personalities : [raid1] [raid10] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] 
md1 : active raid10 sde1[2](S) sdd1[1] sdc1[0]
      3906885632 blocks super 1.2 2 near-copies [2/2] [UU]
      bitmap: 5/30 pages [20KB], 65536KB chunk

md0 : active raid1 sda2[1] sdb2[0]
      243090240 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

At this point, I have added sde1 to my existing RAID 10 device. You can see that sde1 is followed by (S). That means sde1 is currently configured as a hot spare. If sdc1 or sdd1 fails, the MD device will use sde1 in place of the problematic device.

This isn’t our intention. We want to store live data on sde1.

root@kvm:~# mdadm --grow /dev/md1 --raid-devices=3
root@kvm:~# cat /proc/mdstat 
Personalities : [raid1] [raid10] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] 
md1 : active raid10 sde1[2] sdd1[1] sdc1[0]
      3906885632 blocks super 1.2 512K chunks 2 near-copies [3/3] [UUU]
      [>....................]  reshape =  0.0% (146816/3906885632) finish=443.4min speed=146816K/sec
      bitmap: 8/22 pages [32KB], 131072KB chunk

md0 : active raid1 sda2[1] sdb2[0]
      243090240 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
root@kvm:~# 

Now I have told mdadm that I want to grow /dev/md1, and I want to grow that array to three disks. The (S) is now gone from sde1, and the array immediately began reshaping.

You’ll notice that the number of blocks in the array is still 3906885632. When you create a fresh array, it is usable immediately. When will our RAID 10 grow?

I believe it happens as soon as every single block from the original devices has been moved to its new home. In the case of going from two disks to three disks, that should be shortly after the 66% mark.

I wasn’t present when the array officially grew, but I was here in the 90% range. By then, it had already grown to 5860328448 blocks.
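If you want to catch the moment the array grows, you don’t have to babysit the terminal. These are just ordinary monitoring commands, nothing specific to my setup:

# Keep an eye on the reshape progress and the reported array size
watch -n 60 cat /proc/mdstat
mdadm --detail /dev/md1 | grep -E 'Array Size|Reshape Status'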

We’re not done yet. The underlying block device has grown, but LVM and my EXT4 file system don’t know that yet.

root@kvm:~# pvresize /dev/mapper/raid10_crypt

I’m sorry I don’t have any output for this command. raid10_crypt is a LUKS-encrypted block device sitting on top of /dev/md1, and /dev/md1 is my RAID 10 array.

You don’t have to tell pvresize how big to make the PV. It detects the size of the underlying device, the raid10_crypt mapping sitting on top of /dev/md1 in this case, and resizes the PV to match.
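One caveat that comes with the LUKS layer: a dm-crypt mapping that is already open doesn’t automatically grow when the block device underneath it does. I don’t remember whether I needed this step or whether my mapping was simply reopened at its new size, so treat it as a hedged reminder rather than something copied from my notes:

# If the raid10_crypt mapping was opened before the array finished growing,
# resize the mapping first, then let pvresize fill out the PV
cryptsetup resize raid10_crypt
pvresize /dev/mapper/raid10_crypt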

There are three layers to LVM. Physical Volumes sit at the bottom. Volume Groups are made up of one or more Physical Volumes, and those Volume Groups can be sliced up into Logical Volumes. You can think of Logical Volumes as partitions.

You don’t have to resize Volume Groups. They know how big their Physical Volumes are. When you grow the PV, you will immediately see more free space in your VG.
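If you want to watch the free space show up, the LVM reporting tools will show you all three layers at a glance. I didn’t save my output, so here are just the commands:

# PVs, VGs, and LVs, from the bottom of the stack to the top
pvs
vgs raid10_crypt_vg
lvs raid10_crypt_vg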

I do have an LV that I need to extend, and it contains a file system that needs to grow as well.

root@kvm:~# lvextend /dev/raid10_crypt_vg/kvm -r -L +2000G

kvm is a Logical Volume in my raid10_crypt_vg Volume Group. The kvm LV contains an EXT4 file system. The -r flag tells lvextend to resize the file system. The -L +2000G flag tells lvextend to make the LV about 2 TB larger. That leaves me with 326 GB free in this Logical Volume.
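Since the -r flag already grew the EXT4 file system, the only thing left is a sanity check. The mount point below is a placeholder, not my real path:

# Sanity check -- substitute wherever your file system is actually mounted
lvs raid10_crypt_vg/kvm
df -h /var/lib/libvirt/images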

At this point, all the hard work is done!

This looks like a lot of work. How long did this take?

Correcting my far-copies mistake felt like it took forever. It took months! I had to borrow a hard drive. Then I had to help my friend Brian move to a new house. Then I had to buy Brian’s old house. Then I had to move into Brian’s old house. Then I had to deal with four weeks with a useless Internet connection.

For the purposes of this blog post, I’d like to assume that I created this RAID 10 three years ago using the default of near-copies. If we make that assumption, the upgrade was a breeze.

The only bummer is waiting for the RAID 10 reshape to progress far enough for the array to increase in size. If you ignore that wait, it took less time to run the various mdadm, pvresize, and lvextend commands than it took to install the new hard drive!

FreeNAS and ZFS do this for me! Why would I want to use LVM?

You can’t do this with ZFS. Once a RAID-Z vdev is created, you can’t add another drive to it. You can either replace all of your existing disks with larger drives, or you can buy a whole second set of disks and add them as another vdev or another zpool.

Let’s say you’re using RAID-Z2 with six 4 TB disks in your zpool. You will be dedicating two disks worth of data to parity. That’s 8 TB of parity and 16 TB of usable disk space.

If you create a second identical zpool, you’ll be dedicating an additional 8 TB to parity. This will bring you up to 16 TB of parity and 32 TB of usable disk space.

Now let’s say I do the equivalent with MD and LVM. I create a RAID 6 array with six 4 TB drives. I’ll end up with 8 TB of parity and 16 TB of usable disk space. So far, this is just like RAID-Z2.

Now I want to upgrade. I can add additional drives to my RAID 6 device. I can add four more disks, and I’ll have 8 TB of parity and 32 TB of usable disk space. If I added the full six disks, I’d be at 8 TB of parity and 40 TB of usable space.

ZFS has no trouble growing upward, but ZFS’s inability to grow outward forces you into a particular upgrade path. Every time I run out of space, I can just add one more disk to my array. ZFS either forces you to plan ahead, or forces you into a bigger investment when you suddenly run out of disk space.

I’m ignoring many of ZFS’s cool features

ZFS is fantastic. It checksums every block on every disk. It is a copy-on-write file system, and that means you get lightweight snapshots. You don’t have to decide how much space to dedicate to a volume, either. That’s handy!

I understand why growing arrays isn’t a feature in ZFS. ZFS was meant to live on expensive servers in huge data centers. I’ve worked in big shops before. It was rare that we upgraded anything. We just spent lots of money to make sure each machine was equipped to function for two or three years, then we replaced the hardware.

I tend to be more frugal at home. I’d like to have ZFS’s checksums, but I also don’t want to guess at my data storage needs for the next three years. Things changed for me, and my calculations all went out the window anyway!

I’d prefer not to buy six disks when I only need two. Saving $600 or more up front is nice. Not having a bunch of unnecessary disks spinning away 24 hours a day in my server is even nicer.

Hard disks fail. The older they get, the more likely they are to fail. Why put all those miles on four extra disks when I can put fresh disks in as I run out of space?

Conclusion

I am getting off topic. I could fill a rather long blog post with ZFS vs. MD/LVM. I will have to put that on my to-do list. It would be a good topic!

I don’t like writing how-to posts, especially on advanced topics, but I don’t enjoy resorting to too much hand waving in the middle of a post, either. My hope is that I managed to strike a reasonable balance here, especially considering that I didn’t save quite enough terminal output for you to follow along on a perfect step-by-step journey!

Just remember. If you want to be able to grow your Linux MD RAID 10 arrays, you have to create them with near-copies. Thankfully, this is the default, so most of you should be fine!

Have you made my mistake before? Are you using LVM at home on your VM host or NAS? Do you think I’m out of my mind? Do you have any questions? Leave a comment, or stop by our Discord server and have a chat!
