Posts

Showing posts from January, 2010

Looking at the original BitTorrent client

I wanted to learn more about how the original BitTorrent client implemented "choking", so I downloaded the source to the original Python-based BitTorrent client from SourceForge. I was impressed, as ever, by the brevity and clarity of Python. The original BitTorrent client sources are very easy to read. (Especially now that I understand the problem domain pretty well.) The code has a nice OO design, with a clear separation of responsibilities. I was also impressed by the extensive use of unit tests. About half the code in the BitTorrent source consists of unit tests of the different classes, to make sure the program behaves correctly. This is a really good idea when developing a peer-to-peer application. The synthetic tests let you test things like your end-of-torrent logic immediately, rather than having to find an active torrent swarm and wait for a real-world torrent to complete. When I get a chance I'm going to refactor the Taipei-Torrent code into smaller classes and…

Some details on Xbox Project Natal

From Scientific American: Binary Body Double: Microsoft Reveals the Science Behind Project Natal. Short summary: they used machine-learning algorithms and lots of training data to come up with an algorithm that converts moving images into moving joint skeletons. Given the huge variation in human body shapes and clothing, it will be interesting to see how well this performs in practice. I bet people will have a lot of fun aiming the Natal camera at random moving objects to see what happens. I can already imagine the split-screen YouTube videos we'll see of Natal recognizing pets, prerecorded videos of people, puppets, and marionettes. Oh, and of course, people will point Natal back at the TV screen and see if the video game character can control itself. Great fun!

800 lines of code for a bencode serializer?!

Wow, my Taipei-Torrent post made it to Hacker News and reddit.com/r/programming. I'm thrilled! One comment on Hacker News was "800 lines for a bencode serializer?! It only took me 50 lines of Smalltalk!" That was a good comment! I replied: A Go bencode serialization library could be smaller if it just serialized the bencode data to and from a generic dictionary container. I think another Go BitTorrent client, gobit, takes this approach. But it's convenient to be able to serialize arbitrary application types. That makes the bencode library larger because of the need to use Go's reflection APIs to analyze and access the application types. Go's reflection APIs are pretty verbose, which is why the line count is so high. To make things even more complicated, the BitTorrent protocol has a funny "compute-the-SHA-1-of-the-info-dictionary" requirement that forces BitTorrent clients to parse that particular part of the BitTorrent protocol using a generic parser…

A Taipei-Torrent postmortem: Writing a BitTorrent client in Go

This is a BitTorrent client. There are many like it, but this one is mine. -- the BitTorrent Implementer's Creed

For fun I've started writing a command-line BitTorrent client in Google's Go programming language. The program, Taipei-Torrent, is about 70% done. It can successfully download a torrent, but there are still lots of edge cases to implement. Goroutines and channels are a good base for writing multithreaded network code. My design uses goroutines as follows:

- a single main goroutine contains most of the BitTorrent client logic.
- each BitTorrent peer is serviced by two goroutines: one to read data from the peer, the other to write data to the peer.
- a goroutine is used to communicate with the tracker.
- some "time.Tick" goroutines are used to wake the main goroutine up periodically to perform housekeeping duties.

All the complicated data structures are owned by the main goroutine. The other goroutines just perform potentially blocking network I/O using channel…

Objects vs. the L2 Cache

An excellent little performance optimization presentation that shows how important memory layout is for today's processors: http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf The beginning of the talk observes that since C++ was started in 1979, the cost of accessing uncached main memory has ballooned from 1 cycle to 400 cycles. The bulk of the presentation shows the optimization of a graphics hierarchy library, where switching from a traditional OO design to a structure-of-arrays design makes the code run three times faster, thanks to better cache access patterns.