Can You Run A NAS In A Virtual Machine?


Of course you can. There is absolutely nothing stopping you. My home NAS is running in a virtual machine.

Should you host your NAS as a virtual machine? I don’t know anything about your needs, but I bet you can get away with running a NAS in a VM. It is definitely the correct option for my use case!

Some folks will tell you that running your NAS in a virtual machine is a terrible idea. I’m going to tell you why they’re wrong, and I’m going to help keep you from making mistakes that would make those people correct!

What is a NAS?

The acronym stands for Network-Attached Storage. We used to call them file servers, but I guess that just wasn’t fancy enough. Anything that shares files on an IP network qualifies as a NAS. In the old days, we even used IPX, but we didn’t use the term “NAS” back then.

Some of Brian's NAS builds

Your cheap Wi-Fi router with a USB flash drive plugged into the back. Your old Windows XP laptop that’s sharing all your movies. My own virtual server running Samba on Linux. My friend Brian’s beefy DIY NAS servers and his EconoNAS boxes. They’re all network-attached storage devices!

What isn’t a NAS?

A NAS doesn’t transcode video. A NAS doesn’t have to be using ZFS. A NAS doesn’t require ECC RAM. A NAS doesn’t host virtual machines. A NAS isn’t an iSCSI target—that’s a SAN!

Running extra services like a video transcoder, iSCSI targets, or virtual machines will make your server more versatile. They aren’t part of the NAS.

But what if I want to run Plex on my NAS?!

That’s fine! A NAS server is just a file server, and a file server is just a computer. You can do whatever you like with your computers.

You want to host VMs on your server? Go for it! Just make sure you have enough RAM and CPU to handle the load of those virtual machines.

A NAS doesn’t need much processor power or a ton of RAM—your Gigabit Ethernet or Infiniband connection will be your bottleneck most of the time. Transcoding video doesn’t require much RAM or disk, but it sure does need a lot of CPU.

This is exactly the sort of use case where virtualization excels. Your NAS isn’t making much use of your processing power, and your Plex transcoder isn’t fully utilizing its disk or RAM. If you put them both on the same box, you’ll make better use of your hardware.

Some of my virtual machines

Why virtualize each service? Why not put the NAS and Plex server on the same machine?

Plex communicates with services on the Internet. That’s enough reason for me to want to keep my file server separate from Plex. I want Plex to be able to see my videos. I don’t want my bank statements or drafts of my upcoming blog posts leaking out to the Internet!

Am I building a NAS that transcodes video, or a transcoding server that serves files?

This is almost a silly question. You might use a truck to haul plywood home from Lowes. You might use a minivan to drive your family to the arcade.

You can pack your family into the pickup truck, but you might only have those goofy little fold-down seats in the back. You can haul plywood in your minivan, but you’re going to have to fold down or remove the seats.

Each can do the job of the other. Your preference will be based on which task you do more often. If you’re hauling six kids around every day, you want the minivan. If you’re running to Home Depot three times a week, you’ll want a truck.

These are two options that can handle two very different jobs. Maybe you don’t have six kids, and you never bring plywood home from Lowes. You don’t need a large vehicle. Maybe you’re like me, and you only need a Miata.

Sharing files to a handful of computers at home doesn’t require much CPU or RAM. It doesn’t require a truck or a minivan. You can build a little Atom or Celeron machine that sips power and doesn’t cost much.

If you need to transcode video on demand, you’ll be looking at entirely different motherboards and processors. You’re going to be looking at trucks or minivans. Are you building a NAS that also transcodes video, or a beefy transcoding machine that happens to serve files?

I don’t know which server is the truck or the minivan. I’m sure you get the idea.

QUESTION: The streaming devices on every TV in my house can play back 4K content using all the common codecs just fine, and buying three or four of those is cheaper than the CPU it would take to transcode. What on Earth is everyone transcoding?!

Everyone says I should run ZFS on my NAS!

I agree with them. Running ZFS on a NAS is a fantastic idea. ZFS computes and stores checksums of all your data. When ZFS reads your data back from the disk, it compares the data to the checksum. If it doesn’t match, it knows your data is corrupt.

If ZFS is able to read that same data from a different disk, it can correct that error. Standard RAID levels won’t even detect that kind of error, so they can’t correct for it.
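Here is a rough sketch of that idea in Python. It is not how ZFS is implemented. SHA-256 is only a stand-in for whatever checksum the file system uses, and the two-way "mirror" is just a pair of in-memory buffers that I made up for illustration.

```python
import hashlib


def checksum(block):
    """Stand-in checksum for the illustration."""
    return hashlib.sha256(bytes(block)).hexdigest()


def read_with_self_heal(copies, expected):
    """Return a copy that matches the stored checksum and repair the rest."""
    good = next((bytes(c) for c in copies if checksum(c) == expected), None)
    if good is None:
        # Every copy is corrupt: the error is detected but cannot be fixed.
        raise IOError("all copies failed checksum verification")
    for c in copies:
        if checksum(c) != expected:
            c[:] = good  # overwrite the corrupt copy with known-good data
    return good


data = b"family photos"
stored_checksum = checksum(data)

# Two "disks" holding the same block, then silent corruption on the first.
mirror = [bytearray(data), bytearray(data)]
mirror[0][0] ^= 0xFF

recovered = read_with_self_heal(mirror, stored_checksum)
```

A plain mirror without checksums would have two copies that disagree and no way to tell which one is right. The stored checksum is what breaks the tie.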

Ryzen 1600

ZFS isn’t the only file system option with checksums, but it is one of the best and most advanced. The Linux kernel’s device mapper layer now has a module called dm-integrity, and I’m interested in trying it out. It adds a ZFS-like checksum layer to any block device.

I have a feeling that I’ll have to tear down the RAID 10 on my KVM server to set it up, and I’m not excited about putting in that work!

Redundancy is important

Earlier in this post, I said that all you need for a NAS is a USB flash drive. That may qualify as a NAS, but it would be a pretty terrible file server!

Hard drives fail, and they fail often. Backblaze’s data says that 5% to 10% of their hard drives fail every year.
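To get a feel for what those percentages mean for a small array, here is a quick back-of-the-envelope calculation. It assumes every drive fails independently at the same annual rate, which is a simplification, and the drive counts are just examples.

```python
def chance_of_any_failure(drives, annual_failure_rate):
    """Probability that at least one of `drives` disks fails within a year,
    assuming independent failures at the same annual rate."""
    return 1 - (1 - annual_failure_rate) ** drives


for rate in (0.05, 0.10):
    for drives in (1, 4, 8):
        chance = chance_of_any_failure(drives, rate)
        print(f"{drives} drives at {rate:.0%} per year: {chance:.1%}")
```

With four drives at a 5% annual failure rate, the chance of losing at least one drive in a year works out to roughly 19%.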

RAID is not a replacement for backups

RAID prevents downtime. If you don’t have hot swap bays in your server, you will have to take your server down to replace a dead hard drive, but at least you don’t have to go through the time and pain of a restore from backup.

RAID won’t save you from a lightning strike. RAID won’t save you when a bad SATA controller mangles all your disks. RAID won’t save you when your file system gets corrupt. RAID won’t save you when you accidentally delete something. RAID won’t save you from malware.

ZFS has some nice features that can mitigate some of these risks. Regular snapshots will probably protect you from malware and accidental file deletions, but not much else.

It is fairly common to have a second drive in an array fail as you are replacing a disk. Rebuilding a RAID requires reading every bit of data from each drive, and it can take quite a few hours for the process to complete. This can be stressful for a drive that is already near the end of its life.

Storage is cheap. Backups are expensive. Have you ever wondered why your IT department wants to limit your storage space and the size of your inbox? Adding a few more disks is cheap. Backing up all your data and sending it off-site every single day is expensive!

Wait a minute! If ZFS is so great, why aren’t you using it on your NAS?!

I set up my NAS virtual machine almost five years ago. My NAS is running Linux, and back then, ZFS support on Linux wasn’t so great. Ubuntu had only just started shipping native, in-kernel ZFS support. I’d already used the ZFS FUSE file system on another machine, but it didn’t perform all that well.

I am not a fan of FreeBSD, and I have no need for FreeNAS’s convoluted web interface. I work much faster at the command line. Plenty of people use FreeNAS, and I’m sure I could have run it in a VM.

ZFS combines many of the features of Linux’s separate MD and LVM layers. This makes ZFS powerful and convenient, but ZFS is currently missing an important feature.

You can’t add a single disk to an existing RAID-Z group (a vdev) in a ZFS zpool. If you want to expand your storage, you have two choices. You can replace every single disk in the group with a larger one, but this can be expensive. You can add another RAID-Z group to the zpool, but if you’re using RAID-Z2, you’ll be spending two more disks’ worth of space on redundancy.

Linux’s MD and LVM layers allow me to easily add additional disks to my RAID 10 or RAID 6 arrays. When I run out of space, I just buy another drive.
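The difference is easy to see with some rough arithmetic. This sketch ignores file system overhead and assumes equal-sized disks. The six-disk layout and 4 TB drives are numbers I picked for the example.

```python
def raid6_usable_tb(disks, disk_tb):
    """Usable space of a single RAID 6 array: two disks' worth goes to parity."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least four disks")
    return (disks - 2) * disk_tb


def raidz2_usable_tb(group_sizes, disk_tb):
    """Usable space across RAID-Z2 groups: each group gives up two disks."""
    return sum((n - 2) * disk_tb for n in group_sizes)


DISK_TB = 4

# Growing a six-disk RAID 6 by buying one more drive.
before = raid6_usable_tb(6, DISK_TB)
after_one_disk = raid6_usable_tb(7, DISK_TB)

# Growing RAID-Z2 storage by buying a second six-disk group.
one_group = raidz2_usable_tb([6], DISK_TB)
two_groups = raidz2_usable_tb([6, 6], DISK_TB)

print(f"RAID 6:    {before} TB -> {after_one_disk} TB with 1 new disk")
print(f"RAID-Z2:   {one_group} TB -> {two_groups} TB with 6 new disks")
```

The RAID 6 array gains a full disk of space for every disk purchased. The second RAID-Z2 group means buying six disks at once and getting the space of four.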

ZFS was created by Sun, a company that sold expensive, high-end servers. When it has been my job to spec out servers in a data center, I have almost always filled the chassis with disks. That’s exactly what Sun expected their customers to do.

If you’re spending your own money, there are plenty of good reasons to wait until you actually need more storage before buying more drives. Prices decrease over time, and the older a drive gets, the more likely it is to suffer a mechanical failure.

Which device should host my NAS?

I like to consolidate my services as much as possible. Also, I’ve always encouraged people to run their in-home services on hardware that already needs to be powered on all day long.

For a long time, I hosted my NAS and other virtual machines on my arcade cabinet. My NAS needs to be accessible 24 hours a day, and having my arcade cabinet ready to go at a moment’s notice was awesome. Why waste 30 to 60 watts on two different machines that are usually idle?
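If you want to put a number on that idle power draw, the math is simple. The electricity price below is a made-up placeholder, so plug in your own rate.

```python
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.12  # hypothetical rate; substitute your own


def annual_kwh(watts):
    """Energy used in a year by a device drawing `watts` around the clock."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts, price_per_kwh=PRICE_PER_KWH):
    """Yearly cost of running a device around the clock."""
    return annual_kwh(watts) * price_per_kwh


for watts in (30, 60):
    print(f"{watts} W idle: {annual_kwh(watts):.1f} kWh per year, "
          f"about {annual_cost(watts):.2f} at {PRICE_PER_KWH} per kWh")
```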

You probably don’t have an arcade cabinet.

I needed to get my NAS VM closer to my desktop, because CX4 cables for Infiniband aren’t very long!

Does my NAS virtual machine need its own RAID?

I had about four different headings here with about 1,500 words explaining the advantages and disadvantages of various RAID configurations. It was dry, boring, and probably not all that helpful. Instead, I’m just going to tell you what I do, and why I do it.

On my own setup, my virtual machine host has a RAID 10 array with LVM configured on top of it. Some of my machines use LVM block devices on that RAID 10 array as disks, but most of them use disk image files that are stored on an ext4 volume on my RAID 10. I take a performance hit by using image files, but they tend to be more convenient than block devices.

Parts for my KVM server

My virtual machines don’t need any RAID configuration. All the disk redundancy work is handled by the host. The less specific the virtual machines are to the host, the better. It makes it easier to shuffle machines around.

If a drive fails, or I expand my storage, the virtual machines will be completely unaware.

The RAID 10 array on my KVM server is encrypted. If someone pulls the plug and carries the box out of my house, they won’t be able to access any of my virtual machines. With this setup, I don’t have to enter a passphrase into each VM when they boot up. I only have to unlock one encrypted volume when the KVM server boots up.

Is there any reason to set up your RAID in the NAS VM?

Yes. If you’re running ZFS or dm-integrity on the host, and there is a checksum or read error, your NAS virtual machine will never know about it. ZFS may tell you that it corrected an error, or it may tell you that there is a file with a bad checksum, and it can’t fix it.

Which file has the bad checksum? If you are running a setup like mine, you’ll have to do a lot of work to find out. ZFS would tell me that there was a checksum error in the disk image file of my NAS VM. ZFS on the host doesn’t understand what’s inside that image file, and the NAS VM may have no idea that there was an error.

If the error can be corrected, ZFS or dm-integrity will correct it. This will be the situation most of the time. If you’re starting to lose data to irrecoverable errors, you’re already in big trouble anyway, and it may be time to test your backups.

How do I set up a RAID or redundant zpool in a virtual machine?

Very carefully. Remember when I said there are reasons people will tell you it is a bad idea to run a NAS in a VM? If you set up your RAID inside the VM, and you make a mistake, you will greatly increase your chance of data loss!

You have to make sure each virtual disk is pointed at a different physical disk in your server. You have to make sure that your virtualization software makes your virtual machine’s connection to the disk as direct as possible. It isn’t difficult, but you have to be meticulous. If you goof up, you can make things extremely fragile!
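One way to be meticulous is to write the mapping down and check it. This is only a sketch: the dictionary of virtual disks to physical disks is something you would fill in by hand from your own hypervisor configuration, and the device names here are hypothetical.

```python
from collections import defaultdict


def shared_physical_disks(mapping):
    """Return physical disks that back more than one virtual disk.

    `mapping` is a hand-maintained dict of virtual disk name -> physical disk.
    """
    users = defaultdict(list)
    for virtual_disk, physical_disk in mapping.items():
        users[physical_disk].append(virtual_disk)
    return {p: sorted(v) for p, v in users.items() if len(v) > 1}


# Hypothetical layout with a mistake: two virtual disks on one physical disk.
nas_vm_disks = {
    "vda": "/dev/sda",
    "vdb": "/dev/sdb",
    "vdc": "/dev/sdb",
    "vdd": "/dev/sdd",
}

problems = shared_physical_disks(nas_vm_disks)
for physical_disk, virtual_disks in problems.items():
    print(f"{physical_disk} backs {', '.join(virtual_disks)}")
```

If two members of the VM's array land on the same physical drive, a single drive failure takes out both of them, and the redundancy you thought you had is gone.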

ZFS needs to be as close to the disks as possible. You don’t want the host doing any caching. You don’t want the host absorbing any read or write errors. iXsystems has a post about how to run FreeNAS with ZFS in a virtual machine, and it explains the various pitfalls.

I’m still going to say that you should follow my lead and set up your old-school RAID or ZFS on the host. Unless you have a really good reason to have the VM do this job, I doubt it will be worth the hassle for you.

What does Pat store on his NAS?

I store data that doesn’t comfortably fit on the SSD in my desktop computer. For the most part, that is the RAW files produced by my DSLR and FPV flight footage.

A lot of people use their home NAS to back up the data on their computers. I prefer my backups to be off-site. I use Seafile to meet both my file syncing and backup needs. Seafile is an open-source Dropbox equivalent. It keeps all the data on my desktop and laptop in sync, and 90 days of file history are stored on the server. All the data on the server is encrypted on my end.

If I need to restore a file, it will be on the Seafile server. If the SSD in my desktop fails, I have a recent copy of that data on my laptop, so I don’t have to wait for everything to download. I’m quite pleased with the functionality, redundancy, and cost of this setup.

The only important data on my NAS that isn’t backed up is the GoPro footage of my FPV freestyle flying. The volume is huge, and the older it gets, the less value it has. The footage I collect each month is nearly as large as the collection of photos I’ve taken with my DSLR over a four-year period—200 to 250 GB.

Storage on my NAS is rather inexpensive, so I don’t yet feel the need to delete any of the old footage. That said, I’m extremely unlikely to rummage through flight footage from two years ago. Most of the footage is mundane. Finding 20 seconds of interesting footage among 45 minutes of a day of flying is tedious.

I could be served nearly as well if I just threw a large hard drive into my desktop computer.

Conclusion

Everyone’s needs are different. Sharing files is the least important job handled by my little homelab server. Adding disks to my KVM box to supply storage space for my NAS VM was inexpensive, and it adds less than 10 watts to my overall power consumption.

I don’t see a need to invest the money, electricity, or time into setting up and running a separate piece of hardware to store my large files.

How are you handling your file-storage needs? Do you have a dedicated, beefy NAS machine like the ones my friend Brian Moses builds? Did you win one of the NAS servers he’s been giving away over the years? How are you handling your backups? I’d like to hear about it in the comments, or you can stop by our Discord server to chat with me about it!
