Update: This post has gotten pretty obsolete. I’ve written a newer, more thorough cloud-storage comparison post. There may still be some useful information in the post you are currently looking at, but my advice is to skip ahead to the newer post.
In the days before Dropbox, roughly five years ago, just about the only way to share files and store them remotely was to use a file server. When you saved a file, you had to wait for it to be written to your file server. If the file server was outside of your local area network, this might have taken quite a while.
Dropbox made this simple.
Why bother hosting your own Dropbox style service?
I don’t really trust Dropbox. If you sync a file that another user already has, it is uploaded instantly. That is a red flag: it indicates that Dropbox can recognize the contents of your files, which means you aren’t the only person with the keys to decrypt your data.
Even if you trust Dropbox and their competitors, we have no way to be certain that our data is being safely encrypted, and we have no way of knowing exactly who has keys that can unlock our data.
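The instant upload described above is what you would expect from a server that deduplicates files by content hash. Here is a minimal sketch of that idea. The `DedupStore` class and its methods are hypothetical names made up for illustration, and this is only an assumption about how such a service could work, not Dropbox's actual implementation.

```python
import hashlib


class DedupStore:
    """Toy server that keeps one copy of each unique file, keyed by SHA-256.

    Hypothetical illustration only; not any real service's design.
    """

    def __init__(self):
        self.blobs = {}   # digest -> file contents
        self.owners = {}  # digest -> set of user names

    def upload(self, user, data):
        """Store data for user. Returns (digest, transferred).

        transferred is False when the server already held identical
        content, which is the "instant upload" case.
        """
        digest = hashlib.sha256(data).hexdigest()
        transferred = digest not in self.blobs
        if transferred:
            self.blobs[digest] = data
        self.owners.setdefault(digest, set()).add(user)
        return digest, transferred


store = DedupStore()
_, first = store.upload("alice", b"holiday-photo-bytes")
_, second = store.upload("bob", b"holiday-photo-bytes")
print(first, second)
```

The second upload needs no transfer only because the server can compare something derived from the plaintext across accounts. If each user encrypted files with a key that only they held, identical files would look different to the server and this shortcut would not work.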
Self-hosting might be less expensive
The choices are pretty slim, but there are some virtual private server providers out there that offer backup-oriented servers. These are always very light on memory and CPU, but usually offer quite a bit of disk space. These kinds of plans can sometimes be found for rates a bit better than what Dropbox charges.
I am going to be cheating a little here, though. I have my own server hardware sitting out there in a data center, and there are terabytes of free disk just waiting to be used up.
Hosting your files out in a random data center may not be ideal for everyone. It seems like almost everyone these days has a pretty reliable and speedy Internet connection at home, so you could always build your own NAS and keep your “cloud” storage at home.
What will I be syncing?
If I am going to take the time to set up and maintain my own self-hosted, file-syncing cloud storage service, I am going to need to store enough data up there to make it worth the effort. The home directory on my laptop holds roughly 30 GB of data, and my music collection takes up a similar amount of space.
I’m not certain that I care about storing my music up there. I hardly ever listen to music these days, and Google Music already does a great job of letting me listen to my music on my phone, tablet, or computer.
SparkleShare was the first solution I looked at. One of the things that I find very interesting about this project is that it stores all your data in a Git repository. That means it ought to be very easy to quickly and efficiently replicate your data to multiple servers. It looks like SparkleShare will also let you easily access the version history of a file, and it will even automatically merge changes made to text files from multiple locations.
SparkleShare’s client side encryption doesn’t look ideal, though. File names are not encrypted on the server, and you can only set up a single password, which can never change. This isn’t a deal breaker, but it would be nice to have a more configurable encryption system.
The SparkleShare client for Android is at a very early stage of development. All it can do at this point is download files.
Update: I’m crossing SparkleShare off my list due to this bug report. SparkleShare’s design doesn’t allow it to properly sync directories that contain Git repositories. This makes it completely unusable for my purposes.
ownCloud is a much more mature project than SparkleShare. I remember hearing about it on a SourceTrunk podcast episode a couple of years ago. If my memory serves me correctly, at that time there was no Dropbox-style sync client; you could only access the files on your ownCloud server as a network drive using WebDAV. This deficiency went away at some point in the last couple of years, though.
ownCloud seems to have a much richer set of features than SparkleShare. ownCloud’s web interface has photo galleries and a built-in music player. You can also connect the ownCloud server to external data sources, like Dropbox, Amazon S3, or Google Drive.
I’m pretty sure that ownCloud’s encryption plugin won’t meet my needs. It looks like ownCloud encrypts your data on the server side. If your ownCloud server is compromised, then all your data is at risk.
BitTorrent Labs just released the first alpha of their BitTorrent Sync application today. For my purposes, BitTorrent Sync has one huge advantage over SparkleShare and ownCloud: it was designed from the ground up with security and encryption in mind. If all you’re really interested in is secure and efficient file synchronization, then BitTorrent Sync might be a good fit.
Unfortunately, you must have complete trust in every computer you are syncing between. With BitTorrent Sync, the data is transferred securely, but the files are stored in their natural, unencrypted state.
BitTorrent Sync uses the cloud in its most literal definition. You don’t need a centralized server. You just need to make sure that at least one of the other devices is also connected to the Internet in order for your data to sync.
Seafile is looking like it might be the best option for my purposes. It has client side encryption, so I don’t have to have complete trust in my host. The Seafile server has a rather spiffy looking web interface that gives you easy access to old versions of files, deleted files, and lets you view the differences between different revisions of the same file.
Earlier today, I set up Seafile server, and I installed the client on my laptop. I asked it to push my entire home directory up to the server. I came back to check on it 12 hours later, and it had seemingly finished indexing about 23 GB of data. I checked on it a few hours later, and it didn’t seem to have made any significant progress.
I found a handful of responses on the Seafile forums implying that Seafile doesn’t work so well with “large” libraries, and it sounds like file count is the primary problem and not file size. My home directory takes up 32 GB and contains 274,568 files.
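If file count matters more than file size, it is worth measuring both before pointing a sync client at a directory. Here is a small sketch that does this with Python's standard library. The `summarize_tree` function name is my own, and it only counts regular files, skipping symlinks and anything it cannot read.

```python
import os
import stat


def summarize_tree(root):
    """Return (file_count, total_bytes) for regular files under root."""
    file_count = 0
    total_bytes = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                file_count += 1
                total_bytes += info.st_size
    return file_count, total_bytes


if __name__ == "__main__":
    count, size = summarize_tree(os.path.expanduser("~"))
    print(f"{count} files, {size / 1e9:.1f} GB")
```

Running this against a home directory gives the two numbers to compare against whatever limits a sync tool documents.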
Pydio reminds me of ownCloud because it has an extremely featureful web interface. It seems to have an add on called Pydio Sync that provides Dropbox style file synchronization. The Pydio Sync page states that it can scale to 20-30k files and tens of gigabytes of data.
It looks like Pydio Sync uses the rsync protocol, so file synchronization should be fast and efficient. Unfortunately, Pydio Sync does not appear to support client-side encryption of your data. It will only protect your data while it is in transit.
Is there a winner?
This is a difficult question to answer. This blog entry started out as a cursory comparison of just two cloud storage solutions. I’ve already added two more since then. There are others that I didn’t feel were usable for me, and others that I just haven’t found yet.
That said, Seafile has client side encryption, and it is the most Dropbox-like of the options. It lets you sync multiple clients up with a centralized server. That server has a web interface, and it allows you to share files and folders.
In my mind, Seafile would be the clear winner, if it could just manage to handle hundreds of thousands of files without breaking a sweat. Even with that limitation, I feel that it would be a good direct replacement for Dropbox.
Are you already hosting your own cloud storage? Do you have concerns regarding the security of services like Dropbox?