Scarling

I recently spent bits of my free time porting Starling to scala, and thought I’d post the story here in case anyone else finds it enlightening or interesting.

Starling is a simple, reliable message-queue server written in ruby. It uses MemCache protocol, so almost any language can already speak to it. You can set up multiple starling servers, and just like a memcache pool, the servers don’t need to know anything about each other. If clients pick a server at random for each operation, it appears to be a single loosely-ordered queue, with each server holding its own part. Starling is used extensively at Twitter.

I’ve been tinkering with scala since December, and have been building a config/logging library in my spare time. It was easy to pick up since I’ve been using java and python for many years, and scala seems to nicely synthesize the styles of those two languages. I felt like I was ready to try a meaningful project, and Starling is potentially the kind of project that could gain a lot from running on the JVM. At a mere 1000 lines of ruby, it was also a nice managable size.

First chunk

I’m a bottom-up coder, so I started with the PersistentQueue class. It’s the guts of Starling: a FIFO queue backed by a journal, and the means to rotate and replay that journal when necessary. For this class, I was able to do nearly a direct copy, just converting ruby syntax and library calls into scala ones. Only a few places tripped me up:

Scala, like java, has no equivalent of the “pack” function that’s in all of the perl/python/ruby languages. I actually had to write a 10-line class to write a little-endian int into a stream, and another 10-line class to read one. Ack! That wasn’t an auspicious start, and gave me pause. Everything that makes people run from java to higher-level languages was there: You’re given only the most basic, fundamental tools to do I/O, and serializing data is treated like some strange, rare operation.

Aside from that, it went smoothly, though. The payoff was when I downloaded a troublesome 500MB journal file from a running starling server and ran it through the scala journal replayer. Starling would take about a half hour to process the log (and frequently crash the ruby interpreter). “Scarling” (as I call my scala version) was able to process the log in 23 seconds. Exciting!

Next up was QueueCollection, which is what it sounds like: a collection of all the persistent queues, and methods for getting stats on them. Starling does some trickery here to avoid races around queues that are replaying journals. I made a few false starts, trying to either duplicate the logic or improve it, before I decided the really clever thing would be to make each queue an actor. Once I made that leap, the code practically wrote itself, and I stopped for a week.

Second chunk

The remainder of Starling is front-end code for handling connections, speaking memcache protocol, and interfacing with QueueCollection. I needed to break away from porting code at this point, because Starling uses a ruby wrapper around EventMachine, a fast asynchronous event I/O library. New incoming connections create a Handler object and send it events when new data arrives. I was sure this was a place to use actors, but I wasn’t sure how much of the I/O code I’d need to write. (I wrote most of ziggurat, Danger’s async I/O library, so I’m not scared of writing an I/O library, but it would be really time-consuming and non-fun.)

Googling for “java EventMachine” turned up links to an apache project called Mina. Aha! This was pretty much exactly what I was looking for. In fact, it closely resembles a ziggurat-derived library my friend Matt is working on, since they both use the idea of pushing protocol encoders into the I/O event engine. Mina not only creates a new “session” object for incoming connections, it can decode the wire protocol inside its own worker threads, and notify your session of I/O events by sending it fully-decoded objects.

This model is so close to how actors work that it just made my mouth water. I just needed some kind of gasket! More googling revealed that one (and only one) person had already done this work, and written a scala wrapper for Mina that turned “I/O events” into actor messages. Well actually, he did it as a patch against Mina instead of a wrapper, for ill-explained reasons. Oh, and the patch fails to compile. Oh, and all the links on his project page are broken. Oh, and also he has no posted email address. FAIL. I made an attempt to reach him through blog comments (ugh) and decided to do this one on my own.

Luckily, once I read enough of the (actually semi-decent) Mina docs, I was able to connect Mina to actors in less than 50 lines of code. I told you: the two models (Mina and actors) really are made for each other. I hope someone ends up writing a working wrapper for it. Or heck, just use mine as a starting point.

The last piece was a memcache protocol handler, written to Mina’s API. I found one from a project called jmemcached, ported the bits I needed into scala, fixed it up to use some of scala’s nicer collections like ArrayBuffer[Byte], and impatiently patched the wrole thing together for some trial runs.

First Results

The first results were pretty discouraging. I wrote a quick test script to open a connection and do 10,000 queue SETs, and on my old Macbook Pro, Starling could do them in under 4 seconds. My scala port could do them in 30. This is roughly a factor of 10 worse – an order of magnitude. Miserable failure. What was I doing wrong?

Scala doesn’t have many (any?) performance benchmarking tools, but java does, and these java tools don’t seem to care if your jar was made by java or scala. I fixed a few small things, but wasn’t making much progress. My inexperience with hprof made me fix the least important things first, but the last two were worth telling about.

Using an actor for each queue was overkill. Reading from or writing to a queue is such a fast operation that the message-passing for “please write to this queue”, “okay done” was deadly. I reluctantly scrapped the actor code there and used simple locks for the queues, and shaved off several seconds. The client connections were already actors, so it was just adding too much overhead to have the queues themselves be separate actors.

The biggest gain, which took me the longest time to find, and made me feel the most foolish, was in the memcache protocol decoder. I thought I was being really clever by sending incoming data into a scala ArrayBuffer[Byte] and doing functional operations on it. Those operations are sometimes slow as molasses! One, “trimStart”, was using up 1 millsecond per queue push, or 10 seconds of total test time all by itself.

Begrudgingly, I dove into my memcache protocol decoder and decided to do things the Mina way instead of trying to be clever. Using Mina’s ByteBuffer class slashed times dramatically, and suddenly my scala code was as fast as Starling (less than 4 seconds). I actually spent a couple of days fretting about the poor performance before this “eureka” moment, and was very relieved to find out that the slowness was entirely due to my own brain damage, and not to anything in scala or actors.

Better Results

“As fast as ruby” may not sound impressive. In some circles, it’s actually an insult. But for this test, I felt good. First, Starling doesn’t do anything intensive in ruby. It uses ruby’s expressiveness and its libraries to make a small server that’s mostly disk I/O and network I/O bound. EventMachine does a bang-on job of network I/O so there’s little room for improvement.

However… My test was effectively written to play to Starling’s strengths. Since Starling runs in a single thread on a single process (STSP), it excels at handling a single client which makes a lot of requests, but waits for a response between each request. The advantage of the JVM appeared when I changed the test to emulate 100 clients doing 100 queries each.

To Starling, this is exactly the same test (10,000 queries, handled sequentially) so it gets approximately the same result (less than four seconds). Mina-plus-actors starts to shine here, though, and finishes in under two seconds. It’s able to juggle queue work with I/O threads. Success!

Conclusion

I’ve heard there’s a big migration of ruby people to scala, and so the first thing I would say to the ruby people is that this is no panacea. It’s not ruby on a JVM; it’s an entirely new langauge, with much stronger java roots than any other language, so familiarity with java is probably more helpful than python or ruby. On the other hand, if ruby whetted your appetite for functional programming, scala has more of that than ruby and python combined, and seems to live up to its promise of exposing the wonders of java’s scalability and rock-solid virtual machine and garbage collector.

The main stumbling blocks I hit were the same ones others have talked about: scarcity of tools, and a lack of pure-scala libraries – both probably due to the language being so new. Coming from java, it felt great to express myself with python conciseness. It was like opening up the space capsule and breathing fresh air again. Coming from ruby, it was reassuring to write for a platform and libraries that are solid and well-tested, not just an “overnight project” like many of the core ruby libraries appear to be.

Oh, and in the end, my Starling port was only a little over 900 lines long. I think if I added proper config file support like Starling has, it would end up being roughly the same number of lines (1000), even though I had to write a few large pieces (like the memcache codec) that aren’t necessary in ruby.

The end result, which I plan to keep playing with and expanding, is here: http://github.com/robey/scarling/tree/master