Self-Hosted Cloud Storage with Tailscale and a Raspberry Pi: Six Months Later

| Comments

I’ve been using Seafile for my cloud storage and sync needs for more than eight years. I spent quite a few of those years hosting Seafile on my own colocated server, then I outsourced the hosting to a third party, until six months ago when I brought the operation back in house.

I’m not new at this. As expected, I didn’t have any real problems. Things are certainly built on a more fragile foundation this time, but reliability has still been great.

Eight years ago, my Seafile server lived in a datacenter in downtown Dallas with redundant links to the Internet. I don’t remember how fast those connections were, but they were faster than the gigabit Ethernet adapter in my old 1U servers. My server was built like a tank, had redundant power supplies, and my data lived on a small RAID 10 array.

My new Seafile server is a Raspberry Pi 4 with a single 14 TB USB hard drive, and it lives on my friend Brian’s home network. He has a symmetric gigabit fiber connection from Frontier. I have the same ISP, but I have a 200-megabit symmetric fiber link.

There’s only been one unexplained hiccup

Sometime in my new Seafile server’s first month of service, it completely disappeared. I couldn’t ping it. The Tailscale admin interface said it hadn’t seen it check in since the night before. Brian couldn’t ping it, but that’s to be expected, because just about everything on my Seafile server is blocked on the local interface. Including ICMP packets. The only way in is via Tailscale.

Brian power-cycled it for me, and everything came right back up. It hasn’t happened again since.

In an effort to keep my Pi’s microSD card going, I have disabled just about everything that writes to the root file system. This includes disabling just about every ounce of logging, so even if there would normally be a trail to follow, I wouldn’t have anything to look at.

I have to manually restart Seafile when there’s a power outage

My Raspberry Pi isn’t plugged into a UPS at Brian’s house. If the power blips, my Pi reboots.

The root file system isn’t encrypted, so Linux boots back up without a problem, and it immediately connects to my Tailscale network. The 14 TB external hard drive is encrypted, and it needs me to enter a passphrase to unlock it. If there’s a power outage, I have to ssh in and run a script that mounts the encrypted file system and starts Seafile.

This should happen less often now, because Brian invested in a Tesla solar and Powerwall setup. If there’s another outage any time soon, I will be quite surprised!

Why am I hosting my own cloud storage and file-sync service?

First, there’s the problem that I’m running Linux. Google Drive doesn’t have an official sync client, and Dropbox has been doing goofy things with their Linux client.

Then there’s the paranoia factor. In almost every IT department I’ve ever worked in, I have had the ability to read your email. I’ve never wanted to, and I always thought it was creepy when management wanted to check an employee’s email history. In every one of those IT departments, there has always been at last once person that was EXCITED to tell you that they can read your email. They thought it was awesome.

Last time I checked, Dropbox has the ability to decrypt your data. I have no idea how Google Drive works. These are big companies, and my brain immediately imagines the clones of my old coworkers that are excited about being able to poke around in our data. If you saw the glee in their eyes, you wouldn’t want them nosing around in your files.

Then there’s cost.

Let’s talk about cloud storage pricing!

I’m currently up at 4.4 TB of data on my Seafile server. That includes my data, my wife’s data, and some episodes of The Creativity Podcast. Well over 3 TB of that data is my own.

Google’s largest storage plan is 2 TB for $99.99 per year. I don’t think they’ll let you stack two plans to get to 2 TB, but if they did, I guess it would be $200 per year.

Dropbox’s individual plans are 2 TB for $119 per year or 3 TB for $199 per year. I don’t fit into either of these plans, but at least I am close!

I can move up to a Dropbox business plan, but the minimum number of users is three. That puts it at 5 TB for $450 per year, or you can pay $720 for unlimited storage.

How much did I pay for my Raspberry Pi and 14 TB hard drive?

The 14 TB USB 3 hard drive cost me $230, and a 2 GB Raspberry Pi kit cost me $54. That’s less than $300.

NOTE: I cheated a bit here! I had a 4 GB Raspberry Pi 4 here as part of my Pi-KVM kit. The whole Pi-KVM setup only uses about 200 megabytes of RAM. Seafile fits quite nicely in 2 GB of RAM, but my Seafile Pi is up and running 24/7, and it is located off-site. I figured I may as well put the 4 GB Pi out there, since there’s a chance I might decide to host something else on there!

Addendum to the note: I took a peek, and my Seafile server was using 1.8 GB of RAM after about six weeks of uptime. I restarted Seafile a few days ago, and it is sitting at around 300 MB of RAM. There must be a leak of some sort. If you’re on a small Pi, you might want to schedule Seafile to restart every once in a while!

I’ve been using Google’s storage pricing for arithmetic that justifies my choices because they’re a little cheaper than Dropbox. I am aware that Google won’t actually let me stack a pair of 2 TB plans onto my account, and I’m mostly ignoring the fact that I am well on my to needing more than two plans’ worth of storage.

I am six months into my experiment, and the gamble is well on its way to paying off. Not having to pay Google has saved me $100, so I am more than 1/3 of the way to paying off my hardware.

Comparing self-hosted to Google Drive or Dropbox is difficult

If you sign up for Dropbox, you don’t have to do any real work yourself, and that is awesome. You just install their client, and everything starts syncing. It is absolutely fantastic, and this has a HUGE value!

Then you read stories about people getting locked out of the Google accounts. Sometimes it is an absolute nightmare getting things straightened out. I don’t know how to measure the risk of losing my data to something like this, but I’d imagine it is infinitesimally small.

I am quite confident that having to fight to get my Google account turned back on even once would feel like it cost me hundreds of dollars of time, effort, sanity, and frustration.

We hope that Google and Dropbox are doing a good job replicating our data, but it is pretty opaque to us. We have no idea what is really going on up there.

I know for a fact that my Seafile server has no redundancy. Even so, that server is an integral part of my backup plans. Seafile is configured to keep all my data for 90 days. If I save a jpeg 20 times today, each version will be available on the server—assuming I’m not saving faster than the files can sync!

I know that safety net can disappear instantly. The most important data on my Seafile server is synced to my desktop, my NAS, and my laptop. If the Seafile server disappeared right now, I would still have three copies of that data. The NAS even has a few snapshots.

I don’t have three copies of all my data outside the Seafile server. The video I regularly record is just too big to fit on my desktop’s SSDs or even my laptop’s second hard drive. I sync the current year’s video files to my laptop, but the previous several years just won’t fit. I’m keeping it simple here, but this paragraph could easily be turned into a 1,500-word blog post.

I might be giving up on RAID

Maybe. Sort of. What I’m really going to be giving up on is the centralized file server. I’ve been slowly working toward this for at least the last eight years, but the tiny SSDs we had back then were holding me back, and I really did need bigger volumes than a single hard drive could give me, so I needed to build RAID 6 or RAID 10 arrays.

Things are different now. I can easily fit a year’s worth of the content I generate on a single SSD. My laptop is pretty beefy, so it has room for an NVMe drive along with a 2.5” hard drive.

There are four hard drives in my server. I bought them a long time ago, so they’re small, but they cost $150 each. Today I could shuck two 10 TB drives to stick in my server and desktop, and grab a 4 TB or 5 TB 2.5” drive for my laptop. Then I could sync every bit of my data to three different drives on three completely different machines in my house, and I’d still have a fourth copy with full history on a Raspberry Pi off-site.

I enjoy the idea of replicating my data almost instantly among a redundant array of inexpensive computers.

It is nice that I don’t have to move entirely in one direction. I’m straddling the fence between centralized and decentralized storage today, but the drives will start failing as my NAS ages. When they do, I’ll likely just find myself accidentally sitting fully on the other side of this fence!

I’m using the buddy system for my off-site backup and storage, and so should you!

Every good backup plan includes backing up your data at a second location. More locations would be even better, but I’ll settle for just the one.

There’s a copy of my data on my NAS, desktop, and laptop. If I drop the laptop, I won’t lose anything, but if the house burns down, I’d be in big trouble! That’s why my Seafile server is hosted on a Raspberry Pi at Brian Moses’s house.

Uptime isn’t critical. If the power went out at Brian’s house for a few days, that would be OK. If either of us were having issues with our Internet connections, that would be fine. I don’t need to spend extra to host my Seafile service in a real datacenter with redundant connections to the Internet and on-site power generation.

My little Pi server is sipping about as much power as a 100-watt-equivalent LED light bulb, so I’m not much of an imposition. I’m also more than willing to return the favor.

You should think about finding a buddy to swap Raspberry Pis with, but it should definitely be someone you trust not to exploit your Internet connection for nefarious purposes!

Having cloud file syncing is living in the future

In the old days, if you wanted to share files, you would use a centralized file server. Everyone working on a project would map a share on that NAS, and they’d access the files on that remote machine. If that server lived on the other side of the world, it might feel quite slow working with the files. It might only take a few extra seconds to open a document or spreadsheet, but working with a remote database might be quite slow.

When using a sync service, every time a file is changed, that change is pushed to your local machine. When I am editing video files for The Creativity Podcast, they are already on my local SSD, so I am always editing a local file. All the video files are on my desktop and laptop, so I can work on them anywhere. Even if I’m on a slow 3G cellular connection.

Most of the work I do, like this blog post, is stored in text files, and I commit those files to Git repositories. The trouble with this is that I have to remember to commit my changes. Sometimes, those changes aren’t really ready to be committed and pushed to the server.

If I forget to push my changes, and I walk out the door with my laptop, it can be challenging to continue my work. Tailscale will let me easily sneak back in to fix this mistake, but what if I don’t have Internet access on my laptop?

This used to be a pretty common scenario, but I’m rarely completely without an Internet connection. With Seafile, I don’t have to worry. My laptop is up and running right now. As long as it takes me at least 30 seconds or so to walk away from my desk and pack up my laptop, this blog post I’m working on right now will automatically be synced to my laptop. I can ride to the park, open my laptop, and I won’t have to wait to work.

I couldn’t have done it without Tailscale

I realize that I’m repeating a lot of what I already said six months ago. I’m trying to emphasize the most important bits while adding as much new information as I can. One of those important bits is Tailscale.

I stopped hosting my own Seafile server because I was sick of rushing to keep my software updated. If a security flaw was patched in Nginx or Seafile, I had to rush as quickly as I could to get my server updated. It was sitting out there facing the entire Internet. Anyone could be poking at it.

I was already using Tailscale for a few months before I decided to host my own Seafile server again. I knew I wasn’t going to put the new server on the Internet. I knew it was going to only be accessible on my Tailscale network.

Tailscale is a zero-config mesh VPN. You install the Tailscale client on two or more machines, log into your Tailscale account, and all those machines can talk directly to each other over point-to-point Wireguard VPN connections. Tailscale is ridiculously easy to set up, it is reliable, and the pricing is fantastic for us home users.

My Tailscale machines can talk to my Raspberry Pi no matter where they are located. My Seafile server is at Brian’s house in Texas, while I could be on hotel WiFi in New York with my phone connected to T-Mobile 5G. All three machines can ping each other directly.

Tailscale also lets you share machines with other Tailscale users. My wife has her own Tailscale network that includes her laptop, desktop, and phone. I’ve shared my Seafile server and our Home Assistant server with her. She can sync all her files, and she can check the thermostat when she’s away from home. How cool is that?!

I’ve also shared my Seafile server with my Creativity Podcast co-host, because neither of us have enough spare room on our Google Drive accounts to hold much more than a single episode of the show. I just export my work, it lands in our shared library, and it starts syncing right away.

Conclusion

Should you be hosting your own cloud storage and cloud sync service? Maybe. Especially if you have to store and sync more than 2 TB of data. Don’t forget that your time is valuable, and setting up a Pi and keeping all your software up to date will add up to at least several hours. Maybe you’ll find that tedious, maybe you’ll find it enjoyable.

I wholeheartedly believe hosting my own Seafile server on a Pi using Tailscale was the right choice for me. My Pi’s hard drive is encrypted. Seafile encrypts every block on the client side. My network traffic is encrypted with Seafile. I know my colocation provider has no interest in breaking into my stuff.

I am saving money. I’ll be saving more money as my storage needs keep growing. I’m pleased with my increased level of privacy.

What do you think? Did I make the right choice by hosting my own cloud storage and sync? Should I be spending $450 per year on Dropbox instead, or did I make the right choice spending about $280 on this hardware? Let me know in the comments, or stop by the *Butter, What?! Discord server to chat with me about it!

Comments