Ten Weeks with Seafile

I started using Seafile over two months ago for all of my self-hosted cloud storage needs. Enough time has passed that I figured I should report back on how things have been going. Seafile has been working quite well for me. I’ve had a few snags, but they’re all minor, and they’ve all been pretty easy to work around.

How much data are we storing?

I have about 9 GB worth of files stored in Seafile, and Chris has around 24 GB up there. She’s beating me by a pretty wide margin, but she has her music collection stored up there. I’ll probably think about doing the same thing some day, but my music collection is ancient, mostly ripped from CDs, and is terribly unorganized.

Users on my Seafile server

I’m not entirely sure how Seafile determines the amount of storage space everyone is using. The web interface says we’re using a total of 32 GB between us. Even after manually running the garbage collector a few times, the server’s drive has over 40 GB of data. I’m guessing the extra 8 GB is taken up by our file revision history; most of our libraries are configured with 90 days of history.

My libraries are scattered all over the place, but Chris only has one, so hers is much easier to measure. Her local Seafile library is 31 GB, which is quite a bit larger than the roughly 24 GB that Seafile is reporting. I’m going to hazard another guess here, and I’m going to say that compression and deduplication are saving her some space. I wouldn’t be surprised if she has more than just a few gigabytes worth of duplicate songs and photos.
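One rough way to check the duplicate guess is to hash every file in the local copy of a library and add up the bytes that belong to repeated content. The sketch below is only an approximation: it counts whole-file duplicates, Seafile's own deduplication may work at a finer granularity, and it says nothing about compression. The function names and the `~/Seafile` path in the usage note are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_bytes(root):
    """Return (total_bytes, duplicate_bytes) for regular files under root.

    duplicate_bytes counts every copy of a file's content after the first,
    so it is the space a whole-file deduplicator could save at best.
    """
    sizes_by_digest = defaultdict(list)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            total += size
            sizes_by_digest[file_digest(path)].append(size)
    duplicates = sum(sum(sizes[1:]) for sizes in sizes_by_digest.values())
    return total, duplicates
```

Calling `duplicate_bytes(os.path.expanduser("~/Seafile"))` on a local library directory (the path here is hypothetical) returns both numbers in bytes, which can then be compared against what the web interface reports.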

Are these discrepancies alarming?

Not at all. Most of my Seafile libraries are replicated to my laptop, and everything seems to match up just fine. Accounting for used space in the presence of compression and deduplication is a very hard problem. I’m happy as long as my data is replicating correctly.

How much bandwidth does Seafile use?

I figured my bandwidth usage for August would be pretty typical, and I was hoping to get a screenshot of my Seafile virtual server’s bandwidth graph to show off here. It was going great for the first three weeks or so. It was up at around 4 GB of traffic, and probably on its way to closing out the month with less than 6 GB of bandwidth consumed.

Bandwidth use is a little higher than expected

Then Chris’s new computer arrived, and she finally got around to moving all of her stuff up to the Seafile server. This upload pushed the bandwidth consumption for August up over 30 GB, and totally ruined the graph on me.

I ended up taking a screenshot of September instead. The image above covers September 1 through September 8, and we’ve already used up over 18 GB of data transfer. I’m not sure if Chris is just meaner to the Seafile server than I am or what, but we’ll be going way over the 6 GB I was expecting to see last month.
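For a sense of scale, here is a back-of-the-envelope projection of where September could land. It assumes traffic keeps arriving at the same average rate as the first eight days, which is a big assumption with this little data, and it treats "over 18 GB" as exactly 18 GB. The function name is made up for illustration.

```python
def project_monthly_gb(used_gb, days_elapsed, days_in_month=30):
    """Linearly extrapolate month-to-date traffic to a full month."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return used_gb / days_elapsed * days_in_month


# 18 GB in the first 8 days of a 30-day month
print(project_monthly_gb(18, 8))  # 67.5
```

That works out to roughly ten times the 6 GB estimate, if the pace holds.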

The data I have so far is pretty inconclusive, and I expect that everybody’s bandwidth use will vary quite a bit anyway. I’ll post some updated bandwidth numbers after I have a few more months to collect data.

Is it faster than Dropbox?

I have fast server hardware on a fast, reliable Internet connection in a data center that is less than 10 ms away. Seafile is significantly faster than Dropbox for me, and it usually has no problem maxing out my 35 megabit-per-second FiOS connection.

My 35/35 FiOS SpeedTest results

Symbolic links are broken

I keep my Zsh configuration in a Git repository. There are a few symlinks in there, one of which is a link from my ~/bin directory. This one in particular ended up replicating in a pretty strange way, and I wish I had paid more attention to what happened. I somehow ended up with infinitely recursing bin directories that looked something like this:

Near Infinite Recursion

wonko@zaphod:~/.zprezto/modules/persist/files/bin/bin/bin/bin$ cd bin
wonko@zaphod:~/.zprezto/modules/persist/files/bin/bin/bin/bin/bin$ cd bin
wonko@zaphod:~/.zprezto/modules/persist/files/bin/bin/bin/bin/bin/bin$ cd bin
wonko@zaphod:~/.zprezto/modules/persist/files/bin/bin/bin/bin/bin/bin/bin$

I manually cleaned up the receiving end of that sync, and it has been syncing correctly ever since.

I had another directory in there that was a symlink to one of its siblings. Seafile ended up syncing that one like it was two different directories. I’m pretty sure I was able to use a git checkout command to fix that. I should have been taking notes instead of just randomly trying out different ways to fix the problem!

There are replies to bug reports on GitHub that imply that symlink support will eventually be added to Seafile.
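Until that happens, it may be worth listing the symlinks in a directory before adding it to a library. This is a minimal sketch using only Python's standard library; the function name is made up, and it only reports the links and their targets without deciding what to do about them.

```python
import os


def find_symlinks(root):
    """Return a sorted list of (path, target) for every symlink under root."""
    found = []
    # followlinks=False keeps os.walk from descending into linked directories,
    # so a link that points back up the tree cannot cause endless recursion.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks to directories show up in dirnames, the rest in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((path, os.readlink(path)))
    return sorted(found)
```

Running `find_symlinks(os.path.expanduser("~/.zprezto"))` would show links like the bin directory described above, so they can be handled before they confuse a sync.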

Git index files are often in conflict

I could probably list off a ton of reasons why synchronizing Git repositories between multiple computers is a bad idea. Git can push and pull for a reason. I tend to store lots of config files in Git repositories, though, and I’d like to have them synced up. When I’m on the road, I don’t want to be looking for a change that I forgot to commit, either.

I’m not a very advanced Git user, and I’m not entirely sure what is happening here. If I commit a change to my Zsh configuration, that change will be cleanly synchronized to my laptop, but I will end up with a Seafile conflict entry for the .git/index file.

I’m not exactly certain what is causing this, and it hasn’t actually caused any noticeable problems, but it does worry me a bit. Each copy of the ~/.zprezto repository seems to be identical before I start editing. Seafile should be synchronizing them every time I save a file.

I expect that the index file gets modified as I run various Git commands, but I would expect that my more recent, local copy should be pushed to the other machines. I’m not sure why Seafile even notices a conflict.

I’m the only person editing these files. This may be problematic if multiple people are editing files in a Git repository in the same Seafile library. That would be a bad idea even without this problem, and it would probably be a good way to lose some edits.

Update: One of the Seafile developers, JiaQiang Xu, was nice enough to address some of my concerns in the comments. I will have to do some testing to figure out exactly what is going on here.

Watch out for colons in filenames!

I knew that Seafile doesn’t sync files with colons, question marks, or dollar signs in their names. That didn’t stop me from wasting 20 minutes trying to figure out why a screenshot file with an actual time stamp in its name wasn’t syncing.

I don’t really care that Windows doesn’t support these characters in a filename. I haven’t used Windows in years. I’d really like to see this restriction removed, but it isn’t by any means a show stopper.
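A quick scan would have saved me those 20 minutes. The sketch below walks a directory and reports any file or directory name containing a colon, question mark, or dollar sign. Those three characters are just the ones I mentioned above; the client's real list may be longer, and the function and constant names are made up for illustration.

```python
import os

# Only the characters named in this post; treat the set as an assumption.
UNSYNCABLE_CHARS = set(":?$")


def find_unsyncable(root):
    """Return sorted (path, offending_chars) pairs for names Seafile may skip."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            bad = sorted(UNSYNCABLE_CHARS.intersection(name))
            if bad:
                hits.append((os.path.join(dirpath, name), bad))
    return sorted(hits)
```

A screenshot named something like `shot 12:30:01.png` would show up in the result with `[':']` next to it.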

Update: JiaQiang Xu says that this restriction is going to be removed.

Seafile encryption and the NSA

I’ve been relatively paranoid about my data for a long time, and the documents that Edward Snowden leaked have brought quite a few people up to my level of paranoia.

I don’t actually know exactly how secure Seafile’s client-side encryption is, and I don’t know if it is even implemented correctly. If you’re paranoid enough, there is definitely a major flaw in Seafile’s encryption implementation.

If you want to access your encrypted data from a web browser, then your password will be sent up to the server and stored there for an hour. If you want to use encryption and the web interface, then you are required to trust the server. You’re also required to send your encryption passphrase up to the server at least once in order to set up an encrypted library.

My Seafile virtual server runs on a piece of hardware that I own. I am not too worried that my password was potentially stored in plain text in my server’s memory for a short time when I set up each of my libraries. If I were leasing a server, virtual or otherwise, I’d be more than a little concerned about this.

You’ll also need to send your encryption password up to the Seafile server if you want to access your file revision history. I find this even more problematic.

Even so, it is still a major step up compared to Dropbox’s security.

Update: JiaQiang Xu mentioned that they are working on a new encryption scheme for Seafile that won’t require sending the password up to the server. He also tells me that a new Qt-based Seafile client is in the works, and this new client will be able to create new encrypted libraries without having to send your passphrase up to the server.

Are you having as much success with Seafile as I am?