MeltingIce Blog

Computer Science major from WPI and Software/Systems Engineer at TwitPic. Obsessed with numbers and programming languages.

RubyDrop – An Open-Source Alternative to Dropbox

To start off this post, I just wanted to say: wow. RubyDrop has been getting an insane amount of attention lately and has been the top trending repository on GitHub ever since it launched. Thanks to everyone for your interest, and a special thanks to Reddit for the corrections and suggestions!

Now, about the project. RubyDrop is my first-ever Ruby project, which I started in order to learn the language.  I’ve been reading the fantastic Programming Ruby 1.9 (which is only $10, by the way) while simultaneously developing RubyDrop, since hands-on learning is by far the best type.  Because of this, RubyDrop is not the best Ruby code.  In fact, some long-time Ruby users would probably say it’s awful.  But that’s ok, because it’s all a part of the learning process.  The source code is on GitHub and is open for forking and modification.

The Inspiration

The inspiration to start this project came to me one day while I was working on a project for TwitPic.  Although that project was unrelated, I realized there were a few things about Dropbox that I wished I could change.  Those were:

  1. While completely understandable, Dropbox is not free if you want more than 2GB of storage space.  Sure, you can refer friends to unlock more space, but that’s a lot of work, especially when most of your friends already use Dropbox.
  2. Dropbox is hosted by a company that you have to trust with your data in order to use it.  While I have no reason to suspect Dropbox of anything, it’s always nice to have a self-hosted option.
  3. I’ve been wanting to learn Ruby for a really long time, and now seemed like as good a time as any.

The Requirements

Once I decided that I wanted to pursue this idea in order to learn the Ruby language, I made a set of requirements (in my head) that the resulting application should meet.  They are:

  1. In the same spirit as Dropbox, its operation should be completely invisible to the end-user. This means the user should not have to worry about the implementation of the project, but instead should be able to use it simply and without a learning curve.  Ultimately, this meant that it would have a special folder, like Dropbox has, that it keeps in sync at all times.
  2. It should be fast at syncing files between computers.
  3. It should take few system resources. In other words, it shouldn’t interfere with your overall experience if you decide to play a game or do something else processor-intensive.
  4. It should have the ability to revert files and undo changes.
  5. It should have a TCP server interface as an API to the daemon that will let advanced users, or OS native applications, control the daemon without restarting it.
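
To make requirement 5 a little more concrete, here is a minimal sketch of what a line-based TCP control interface could look like. The command names (status, pause, resume), the class name, and the helper name are all hypothetical and chosen only for illustration; this is not RubyDrop's actual protocol.

```ruby
require 'socket'

# Holds the daemon state that remote commands are allowed to change.
# The command names are hypothetical, chosen only for illustration.
class DaemonControl
  def initialize
    @paused = false
  end

  # Turn one line of input into a one-line reply.
  def handle(line)
    case line.strip.downcase
    when 'status'
      @paused ? 'paused' : 'running'
    when 'pause'
      @paused = true
      'ok'
    when 'resume'
      @paused = false
      'ok'
    else
      'error: unknown command'
    end
  end
end

# Accept clients one at a time and answer each newline-terminated command.
# Binding to 127.0.0.1 keeps the control port off the network.
def serve_control(control, port)
  server = TCPServer.new('127.0.0.1', port)
  loop do
    client = server.accept
    while (line = client.gets)
      client.puts control.handle(line)
    end
    client.close
  end
end
```

A daemon could start this in a background thread with something like `Thread.new { serve_control(DaemonControl.new, 11311) }` (the port number is arbitrary), and a native application or a tool such as telnet could then send commands without restarting anything.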

Possible Solutions

The first solution that I thought of involved rsync, since rsync is commonly used for automated backups.  While the client side of rsync would be rather simple, I wanted to see if I could implement a solution that avoids running server software as well.  Configuring rsync also means editing system config files such as /etc/rsyncd.conf.  While not difficult, I wanted to avoid editing these if at all possible.  Finally, rsync doesn’t have the option of reverting changed files because it doesn’t track file change history.

The second idea I had was using SCP to transfer files to the remote server.  As far as I can tell, this is a slightly better solution than rsync because it doesn’t require editing system config files and is supported by the excellent Net-SCP Ruby library.  However, again, I wanted to see if I could implement RubyDrop without actually running software on the server, which would reduce the complexity for the end-user and lessen the possible points of failure.  Also, SCP doesn’t support tracking file changes.

The third idea, which is currently implemented, uses Git as the backend for both file tracking and remote syncing.  Since Git is a source-code management tool, many of the features that RubyDrop needs are built in.  It tracks changes to both local and remote files, it supports file change history with the ability to revert changes, and it can both push data to and pull data from a remote server.
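
As a rough illustration of the Git approach, one sync pass could boil down to staging everything, committing, pulling, and pushing. The sketch below is an assumption about how that might be wired up by shelling out to the git command line; the helper names, the remote name, and the branch name are made up for the example, and RubyDrop's real sequence may differ.

```ruby
require 'open3'

# Build the git commands for one sync pass. The remote and branch names
# are assumptions made for this example.
def sync_commands(message, remote: 'origin', branch: 'master')
  [
    ['git', 'add', '--all'],                      # stage new, changed and deleted files
    ['git', 'commit', '-m', message],             # record a revision that can be reverted later
    ['git', 'pull', '--rebase', remote, branch],  # bring in changes from other machines
    ['git', 'push', remote, branch]               # publish the local commits
  ]
end

# Run the commands inside the synced folder, stopping at the first failure.
def run_sync(dir, message)
  sync_commands(message).each do |cmd|
    output, status = Open3.capture2e(*cmd, chdir: dir)
    # `git commit` exits non-zero when there is nothing to commit,
    # so a failed commit is not treated as fatal in this sketch.
    next if status.success? || cmd[1] == 'commit'
    raise "#{cmd.join(' ')} failed: #{output}"
  end
end
```

Keeping the command list separate from the code that runs it makes the sequence easy to inspect or change, for example `sync_commands('auto-sync', remote: 'backup')`.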

One idea that was presented to me by Reddit was rdiff-backup.  The idea behind rdiff-backup is:

rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.

This sounds like an excellent tool that has been under development for a long time, so I will definitely be looking into it as well as Net::SCP.

Problems with Using Git

While Git sounds like an ideal solution to this problem at first, there are some downsides to it that must be addressed. Because RubyDrop is such a new project, all of these problems still need to be solved in one way or another:

  1. Since Git tracks file history through commits, and allows you to revert to any revision, the amount of disk space needed by Git will grow quickly as more and more files change.
  2. Git is poor at tracking binary files, especially large binary files.  This is related to problem #1.
  3. Pushing large amounts of data to a remote repository can be unreliable, and can possibly time out.
  4. Git doesn’t implement any way to watch the repository and emit events when changes occur.
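
For problem 4, one simple workaround is to poll: take a snapshot of the folder's modification times, wait, take another, and compare the two. This is only a sketch of that polling idea with made-up helper names, not what RubyDrop actually does, and an event-based file watcher would likely use fewer resources.

```ruby
require 'find'

# Map every regular file under dir (skipping .git) to its modification time.
def snapshot(dir)
  files = {}
  Find.find(dir) do |path|
    if File.basename(path) == '.git'
      Find.prune # do not descend into Git's own bookkeeping
    elsif File.file?(path)
      files[path] = File.mtime(path)
    end
  end
  files
end

# Compare two snapshots and report what changed between them.
def diff_snapshots(before, after)
  {
    added:    after.keys - before.keys,
    removed:  before.keys - after.keys,
    modified: (before.keys & after.keys).select { |path| before[path] != after[path] }
  }
end
```

A daemon loop would call `snapshot` every few seconds and trigger a sync whenever `diff_snapshots` returns a non-empty list. The polling interval is a trade-off between sync latency and the "few system resources" requirement.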

Because of these problems, it’s likely that a hybrid solution is needed.  I’m definitely open to suggestions, and would love to hear your ideas.

31 Responses to RubyDrop – An Open-Source Alternative to Dropbox

  1. Please take a look at https://github.com/apenwarr/bup and http://code.google.com/p/brackup/. They use the rsync-checksumming algorithm to do efficient binary synchronization.

    You could use a bup like process into a git repository. Those repositories could then efficiently sync between each other without a worry for conflicts. Your main directory indexes are the parts that have to merge– and then you can take the strategy that Dropbox used and create two “copies” of the same file, letting the user resolve the conflict.

  2. It is really awesome!

  3. Johan Rydberg says:

    A good alternative to Git would be the Tahoe LAFS file system. It’s a distributed, secure file system.

    See http://tahoe-lafs.org/trac/tahoe-lafs

    It has a REST-API that you can use to implement your dropbox-like functionality.

  4. There is already one alternative which works on windows, linux and macosx, it’s called novell iFolder

    and it’s also under the gplv2 licence, see http://sourceforge.net/projects/simias/

    I don’t know why this isn’t more widespread.

    • @chris, so simias is the server? The annotated screenshots http://ifolder.com/screenshots are quite informative — didn’t find that on my previous attempt to understand iFolder.
      What I’d really need is some recent guides on how to install. All I find is too outdated.

  5. Michael Hoisie says:

    I’d start with Git for v1. You can worry about the optimizations later. Honestly, if you add a mp3 or video to a Dropbox folder, how often do you really change it? Plus there are tricks to optimize/collapse a git repo.

  6. Pingback: Linkdump: Dubya, Sexy Kinects and Spy vs. Spy « Joyeur

  7. Another one to look at is lsyncd: http://code.google.com/p/lsyncd/

    It’s probably the closest to dropbox syncing you’ll find.

    rdiff-backup is really meant for running on a nightly cronjob kind of setup. It’s not something that would run continuously to keep things in sync.

    I’m not sure why you think you need to edit /etc/rsyncd.conf either. If you’re running rsync over ssh, anything you might need to do is easy with commandline flags to the rsync client. Both machines would need to have sshd running, ssh keys set up, and rsync installed, but that’s probably going to be something you have to do no matter what you build.

    IMO, git is best for managing something like source code where it makes sense to explicitly tell it when to commit, push, pull, etc. instead of for this kind of fully automated setup that you want. That said, “Since Git tracks file history through commits, and allows you to revert to any revision, the amount of disk space needed by Git will grow quickly as more and more files change” might be true on the face of it, but Git is still more efficient at storing versioned content than anything you’re likely to roll yourself.

    I actually use Tahoe for this kind of thing, but it’s not for everyone. Tahoe’s interface is primarily HTTP instead of filesystem, so existing applications that are expecting files on a filesystem won’t know how to talk to it. Eg, I moved my mp3 collection into a personal Tahoe cloud, which lets me access my music from anywhere, but I’ve had to hack my music player software to stream files down from Tahoe instead of reading a directory on disk somewhere.

    For you I’d probably recommend a hybrid approach with rdiff-backup, git, and lsyncd. lsyncd can watch multiple directories and seamlessly keep them in sync across multiple machines (using rsync and ssh behind the scenes so it’s efficient and secure). So have lsyncd watch your “sync” folder, then set up a nightly cronjob with rdiff-backup to back everything up to another folder that lsyncd also watches. And use Git for anything that needs finer grained versioning (lsyncd will sync your Git repos as well).

  8. Henrik Hodne says:

    Looks like a great project, I’ll probably check it out more closely when I have some extra time. As for the git events, have you checked out git hooks? Look in the .git/hooks directory or githooks(5).

  9. Pingback: RubyDrop – A Self-Hosted Dropbox Clone in Ruby | ChurchCode

  10. I love this idea, I love drop box, but for the reasons you mentioned, i find it limiting. I’d love to be able to run this on my own host.

    I’ve been toying around with the idea of using dropbox or similar services for our church but this would be spot on!

    Keep it going dude – if you nail it, this will rock the socks off lots of people’s world.

    • Thanks man, I’ve been working on it whenever I get free time, which unfortunately isn’t very much right now due to college.

      I’m hoping to improve RubyDrop a lot once I start winter break in 2 weeks or so. It’s definitely been an interesting start to the project, especially since I am still quite a newbie to Ruby.

  11. I was actually looking to do the same thing but using node.js….however, I’ve also been interested in ruby for a while and was considering going that route.

    Interesting suggestions on here for storage (i.e. rsync, lsyncd, git, etc). I hadn’t fully decided either. I was thinking of just going with the easiest to get working…maybe not the most efficient, maybe not the best, etc. But once you get it going you can iterate and improve. If you design things well you can abstract the file transfer, sync, revisions, etc…and then implement those abstractions using rsync or git, etc w/o changing everything around.

    I’ll hopefully take a look..maybe fork and collaborate. Keep going!

    • Node.JS would definitely be a cool language to use for a project like this as well.

      I’ve played around with Node.JS a lot, and I really like it, although I still feel like it’s a bit too young of a language. It has the added benefit that it’s really just Javascript, which I’ve been working with for a long time. I had some problems daemonizing processes, but maybe that was just me, who knows.

      Maybe down the line once RubyDrop becomes more mature I’ll look into a Node.JS port of it for fun and call it NodeDrop :)

      I encourage you to collaborate on RubyDrop! If I get anything out of this project, I’ll be happy that at least other people used it to learn Ruby as well and had fun doing so. Cheers!

  12. Pingback: Delicious Bookmarks for December 7th from 15:31 to 17:00 « Lâmôlabs

  13. mmm, tasty – please hurry :D

  15. Pingback: QuickLinks vom 13. Dezember bis zum 22. Dezember — instant-thinking.de

  16. Pingback: 環境構築 « Bad Tips

  17. Martin Hammerschmied says:

    Where is RubyDrop compared to SparkleShare now? I’ve heard about the latter a couple of months ago. It seems like they have a pretty neat software collection there already. I’m really curious about RubyDrop though. Keep it going!

    • I would say that, currently, RubyDrop is in a very “hacked together” state. I haven’t had time to work on it very much since the first week or so of development. I have some cool plans for it though that I want to implement in the near future.

      Compared to Sparkleshare, RubyDrop is very far behind. RubyDrop also only has about 3 or 4 days of development time behind it while Sparkleshare has been in development for years. Have to start somewhere though! :)

  18. I’ve actually been looking into making my own dropbox clone for a while. I’ve settled on mounting an amazon s3 with s3fs. They give you 5G for free, which is better than dropbox’s 2G.

    Of course, it would be infinitely superior to just pop a few new hard drives in my server and run, well, rubydrop. :)

  19. Keep working, good post! This was the thing… I had to know.

  20. Nice program, but Syncany is getting the most hype right now and it is appropriate. More functions than all other dropbox-style Programs, but open source.

    http://www.syncany.org/

  22. Pingback: Petite revue des solutions libres de synchronisation « Sciunto

  23. Pingback: 2011-08-21 Decode IPSec Traffic / PayPal and Dropbox Alternatives / SuperNAP DC in Vegas / AES Cracking / FPGA Cracking / Another Anon Breach / DARPA Wants Hackers / Sony Digital Binocs / Podcasts we like « Compute Cycle

  25. Pingback: Dropbox Open Source – GeekLogy
