Experiment: Django Distributed BitTorrent Tracker (DDBTT)

After publishing my previous post about Django I decided to try something more complex. The idea was to create a Django Distributed BitTorrent Tracker.

The BitTorrent protocol is a peer-to-peer file sharing protocol. Its main advantage over plain HTTP is that when a file is downloaded by multiple people, the downloaders upload the data to each other. As a result, hardware and bandwidth loads on the original host are significantly reduced. One of the main components of this protocol is the BitTorrent tracker, which is responsible for coordinating the file distribution. To achieve this, the tracker collects data about torrents and their seeders (uploaders) and leechers (downloaders). This data is then shared with the BitTorrent clients that connect to the tracker. From there on, the clients are on their own when it comes to making decisions.

One of the features of the BitTorrent tracker is the ability to add new torrents at any given time. But what if you want your tracker to keep track of only your torrents? What if you don’t want to become the next Pirate Bay? The answer is quite simple: you would have to disable the tracker’s ability to add external torrents. But by doing so, you won’t be able to add torrents that way either. This means that you would have to implement an interface to add torrents internally. If your tracker is database-driven this shouldn’t be very hard to do; you only have to add a new entry to your database (assuming you have a table of torrents).
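To make the “internal torrents only” idea a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3 module instead of Django’s ORM, so it runs on its own. The table layout and the names `open_registry`, `add_torrent` and `is_tracked` are my own assumptions for illustration, not part of any tracker.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open the database and create the table of torrents if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS torrents ("
        " info_hash BLOB PRIMARY KEY,"  # 20-byte SHA-1 identifying the torrent
        " name TEXT NOT NULL)"
    )
    return conn


def add_torrent(conn, info_hash, name):
    """Internal interface: the only way a torrent gets registered."""
    if len(info_hash) != 20:
        raise ValueError("info_hash must be exactly 20 bytes")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO torrents (info_hash, name) VALUES (?, ?)",
            (info_hash, name),
        )


def is_tracked(conn, info_hash):
    """Return True only for torrents that were added internally."""
    row = conn.execute(
        "SELECT 1 FROM torrents WHERE info_hash = ?", (info_hash,)
    ).fetchone()
    return row is not None
```

A tracker built this way would call `is_tracked` on every incoming request and refuse anything that is not in the table.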

Okay, so your tracker only accepts internal torrents. Let’s move on to the next problem. What if there aren’t any seeders available to distribute my file? Again, the answer is simple: you would have to implement HTTP seeding. HTTP seeding allows BitTorrent clients to download torrent pieces from an HTTP source in addition to the peers. Even though HTTP seeding is a client affair, I think the server can play a significant role. After all, the HTTP torrent pieces have to be hosted somewhere.

If you combine these two ideas you get something along the lines of this:

  • Uploading:
    • File is uploaded to the webserver;
    • Torrent metadata file (.torrent) is generated;
      • Don’t forget to generate the 20-byte SHA-1 hash for the torrent (the info hash, computed over the bencoded info dictionary).
    • Torrent is exposed to the webserver;
  • Downloading:
    • Peer downloads the torrent metadata file;
    • Peer’s BitTorrent client downloads data from peers and webserver.
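The uploading steps above could look roughly like the sketch below. It is not the code from my unfinished implementation, just a small standalone illustration: a hand-written bencoder, a single-file torrent, and the `url-list` key that web-seed-aware clients read for HTTP seeding. The function names, the URLs and the 256 KiB piece length are assumptions picked for the example.

```python
import hashlib


def bencode(value):
    """Bencode ints, strings/bytes, lists and dicts."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def make_torrent(data, name, announce_url, web_seed_url, piece_length=262144):
    """Return (.torrent file contents, 20-byte info hash) for a single file."""
    # One 20-byte SHA-1 digest per piece, concatenated into one byte string.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,
    }
    metainfo = {
        "announce": announce_url,    # hypothetical tracker URL
        "info": info,
        "url-list": [web_seed_url],  # hypothetical HTTP seed URL
    }
    # The info hash is the SHA-1 of the bencoded info dictionary.
    info_hash = hashlib.sha1(bencode(info)).digest()
    return bencode(metainfo), info_hash
```

For example, `make_torrent(file_bytes, "demo.bin", "http://tracker.example.com/announce", "http://files.example.com/demo.bin")` returns the bytes to save as the .torrent file, plus the info hash the tracker would store for that torrent.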

As you can see, the concept is fairly easy. The implementation shouldn’t be too hard either. The BitTorrent protocol is largely based upon plain HTTP: clients send data to the tracker as HTTP GET variables, and the tracker answers with bencoded data. Having said that, I didn’t finish my implementation of the Django Distributed BitTorrent Tracker. The reason for this is that I came across some bugs. Normally I would take my time to get rid of them, but currently I’m a bit short on time. Not only am I involved in multiple projects, but I also have to practice for my final examinations.
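For the curious, the exchange described above (GET variables in, bencoded data out) can be sketched in a few lines. This is a simplified illustration and not my unfinished Django view: it keeps peers in a plain dictionary, ignores most announce parameters, and the `handle_announce` name and the 1800-second interval are assumptions. Only the key names (`info_hash`, `peer_id`, `port`, `interval`, `peers`, `failure reason`) come from the BitTorrent specification.

```python
from urllib.parse import parse_qs


def bencode(value):
    """Bencode ints, strings/bytes, lists and dicts (keys in sorted order)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(((k.encode("utf-8"), v) for k, v in value.items()),
                       key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def handle_announce(query_string, client_ip, swarms, tracked):
    """Answer one announce request with a bencoded response body.

    swarms:  dict mapping info_hash -> {peer_id: (ip, port)}
    tracked: set of info hashes that were added internally
    """
    # info_hash and peer_id are raw bytes, percent-encoded in the URL.
    # Decoding as latin-1 maps every byte to one character, so encoding
    # back to latin-1 recovers the original bytes.
    params = parse_qs(query_string, encoding="latin-1")
    try:
        info_hash = params["info_hash"][0].encode("latin-1")
        peer_id = params["peer_id"][0].encode("latin-1")
        port = int(params["port"][0])
    except (KeyError, ValueError):
        return bencode({"failure reason": "malformed announce"})

    if info_hash not in tracked:
        return bencode({"failure reason": "torrent not registered"})

    swarm = swarms.setdefault(info_hash, {})
    # Tell the client about everyone else, then remember the client itself.
    others = [{"ip": ip, "peer id": pid, "port": p}
              for pid, (ip, p) in swarm.items() if pid != peer_id]
    swarm[peer_id] = (client_ip, port)
    return bencode({"interval": 1800, "peers": others})
```

In a Django view, the query string and client address would come from the request object, and the returned bytes would be sent back as a plain-text response.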

Even though I didn’t finish the implementation of the concept, I’m content with this project. I really enjoyed working on it, I’ve learned a lot about peer-to-peer (P2P), and I was really amazed by the simplicity of the BitTorrent protocol.

If you would like to implement your own BitTorrent tracker or if you just want to know more about it then check out these sites:
Wikipedia – BitTorrent (protocol)
Official BitTorrent specification

Django

Lately I’ve been experimenting with Django. Django is a web framework written in the popular Python language. Python wasn’t exactly written with web development in mind as far as I know. As a result, web development with Django is different than with something like PHP or ASP. That isn’t a bad thing, though. Let’s just say that I’ve never developed web applications as fast as I’m doing now.

The main reason why web development with Django is so quick is that Django takes care of some annoying jobs. It’s a case of the DRY (Don’t Repeat Yourself) principle:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
