I guess that it’s time for the 3rd annual “Ravelry Runs On” roundup.
The last two were in March 2008 and March 2009.
This year, our traffic increased by 50% to 5 million page views and 15 million Rails requests per day.
We made very few changes to our architecture in 2009 but we did add a new master database server after
our working set of data outgrew our memory and IO capacity.
This summary is more detailed than the last two and I’ve broken it up into rough sections.
Physical Network

We own our own servers and colocate them in a datacenter here in Boston. The datacenter
provides us with a cooled half cabinet, redundant power, and a blend of premium (Internap, Savvis) bandwidth. We do the rest.
I use servers from Silicon Mechanics because of the high level of configurability and the price.
For databases, I use 1U machines with up to 8 2.5″ SAS disks. For app servers, I use cheaper 1U machines
with SATA RAID but I think I’ll be moving to half-depth single-drive machines that use SSDs. (These links
are for the new Westmere Xeons so they are a little pricier than what we have now.)
Our network is pretty simple. We have two Internet Service Providers because I shopped around for a better deal once we reached the limit of our bundled bandwidth. Nothing fancy when it comes to routing – some hosts use one, some use the other. We use Cisco ASA firewalls and Dell managed gigabit switches. Every server has integrated KVM over IP that is attached to a management
network so that I can have console access, power toggle, and other things in the event of an emergency. I only need to go to the datacenter to install new hardware and replace
faulty hardware.
We use Amazon S3 to store and serve all of our non-database data. S3 is our biggest expense but without it, we’d have to handle 10 terabytes of redundant storage
and 60 additional Mbps of traffic. On the flip side, we host our own servers instead of using Amazon EC2 or another cloud service because we get more performance for less money. We’re just not in the sweet spot – we need more than 1-2 servers but less than “a lot” and we have traffic/growth that isn’t spiky. It might be a different story if we needed to hire a sysadmin but it’s a very small part of my job and I enjoy doing it.
Application

Ravelry is a Rails 2.2.x application running on Ruby 1.8.7.
I went from following Rails Edge, to upgrading to the released versions shortly after they came out, to sticking with 2.2. I never got around to upgrading to 2.3 because it didn’t include any compelling improvements and now Rails 3.0 is looming. I honestly don’t know if I’m going to try to migrate to Rails 3 or if I’ll just move to Rails 2.3 and stay there. At some point you have to stop fighting the framework upgrade battle and as I look at all of the changes and incompatibilities in Rails 3, I think that it might be time. It might depend on Arel – if I can’t plug it in to Rails 2, I may be tempted to upgrade.
It’s a strange time for people writing brand new Rails applications. If I were writing something new, I’d start with Ruby 1.9 and Rails 3 and deal with the downsides of being an early adopter.
I run Phusion’s Ruby Enterprise Edition. REE is a drop-in replacement for regular Ruby (MRI) that is faster and more memory efficient and there is no reason not to use it. REE brought us impressive performance gains and memory use reduction and if it didn’t exist, I’m certain that I would have moved from MRI to JRuby. I’m happy with REE so now I’m waiting and seeing – there is a lot going on in the world of Ruby implementations. Ruby 1.9 looks great from a performance perspective but I don’t know of any large sites that are running it in production with Rails.
Each application server runs Ruby on Rails under Apache and Phusion Passenger. Passenger is stable, zero maintenance, and I don’t have any complaints about it. I prefer nginx to Apache and I’ll probably use it on future app servers – it just wasn’t supported when I first moved to Passenger. For the last year, we’ve run 6 Passengers (fewer servers because of virtualization) that spawn up to 20 application instances each. I allocate roughly 7 GB of memory and 1 x 4-core CPU for each of these Passengers.
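To put those figures in perspective, here is a back-of-the-envelope calculation using only the numbers quoted in this post. It assumes every Passenger has spawned its full 20 instances and that traffic is spread evenly over the day, neither of which is true in practice, so treat the results as rough averages rather than peak figures.

```ruby
# Rough capacity arithmetic from the figures in this post.
# Assumes full spawn on every Passenger and perfectly even traffic.
requests_per_day        = 15_000_000
passengers              = 6
instances_per_passenger = 20
memory_per_passenger_gb = 7.0

total_instances         = passengers * instances_per_passenger
avg_requests_per_second = requests_per_day / 86_400.0
avg_per_instance        = avg_requests_per_second / total_instances
memory_per_instance_mb  = memory_per_passenger_gb * 1024 / instances_per_passenger

puts "application instances:     #{total_instances}"
puts "average requests/second:   #{avg_requests_per_second.round(1)}"
puts "average req/s per instance: #{avg_per_instance.round(2)}"
puts "memory budget per instance: #{memory_per_instance_mb.round} MB"
```

Peak load will be well above the daily average, which is why the headroom matters more than the mean.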
Web Server / Load Balancing

We use nginx for our front-end web server and for additional static file servers. Compared to Apache, nginx is simpler to install and upgrade, faster, more memory efficient, and easier to configure. It’s a really nice piece of software.
For application requests, nginx proxies to haproxy and haproxy load balances across all of our application servers. haproxy’s performance is excellent, its monitoring tools are good, and it has all of the configurability that we need in a load balancer. Failing instances are removed from the pool and it was easy for me to set up rolling application upgrades/restarts. (This is described more in last year’s post.)
To sum up, when you hit www.ravelry.com:
- Your request is handled by nginx, where URL rewrites and similar housekeeping are performed
- If your request is an application request, it is proxied to haproxy
- haproxy chooses an Apache/Passenger instance to handle the request using round-robin load balancing and weighting (more weight for more powerful app servers)
- Apache/Passenger hands the request off to Rails. Passenger’s own queuing should never come into play because haproxy knows that each application server can only handle X simultaneous requests.
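The weighting and per-server limit in the last two steps can be sketched in a few lines of Ruby. This is an illustration of the general idea (a smooth weighted round-robin with a connection cap), not haproxy's actual implementation. The server names, weights, and limits are made up.

```ruby
# Illustrative only: weighted round-robin with a per-server connection cap.
# Fields: name, weight, max_conn (the "X" above), active connections,
# and a running score used to interleave picks smoothly.
Server = Struct.new(:name, :weight, :max_conn, :active, :current)

class WeightedRoundRobin
  def initialize(servers)
    @servers = servers
  end

  # Returns the next server with spare capacity, or nil when every server
  # is at its limit (a real balancer would hold the request in its queue).
  def pick
    available = @servers.select { |s| s.active < s.max_conn }
    return nil if available.empty?

    total = available.sum(&:weight)
    available.each { |s| s.current += s.weight }
    chosen = available.max_by(&:current)
    chosen.current -= total
    chosen.active += 1
    chosen
  end

  # Call when a request finishes so the slot can be reused.
  def release(server)
    server.active -= 1 if server.active > 0
  end
end

servers = [
  Server.new("app1", 2, 20, 0, 0), # more powerful box, double weight
  Server.new("app2", 1, 20, 0, 0)
]
balancer = WeightedRoundRobin.new(servers)
picks = Array.new(6) { balancer.pick.name }
puts picks.join(", ")
```

Because the balancer never hands a server more than its cap, the application server's own request queue stays empty and waiting happens in one place.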
Databases and Search

MySQL / MariaDB
This year, I moved from MySQL 5.0 to MariaDB 5.1.
In my opinion, MariaDB is the best way to run MySQL. Maria is MySQL 5.1 with significant improvements. The most notable improvements (for us)
are the inclusion of the InnoDB plugin and InnoDB performance patches via Percona’s awesome XtraDB work, and more releases / faster bugfixes / more active development.
We’ve been running Maria for 4 months peaking at about 2500 queries per second and it has proven to be very stable. It’s hard to quantify the performance improvements because we changed hardware at the same time, but Percona’s patches (benchmarked and documented in detail on MySQL Performance Blog) and InnoDB table compression have been huge for us.
We have one master database and two slaves. One slave is used only for backup and it dumps all of its tables, compresses them, and sends them to Amazon S3 nightly.
The main slave is mostly for the search engine and failover but I do send some queries its way by using DB Charmer. For now, all operations on the slave are explicit and fall back to the master if the slave is lagging. (I’ve written a tiny Model.with_active_slave { } extension that accomplishes this.) Slave lag is detected by using mk-heartbeat from the excellent Maatkit utility collection.
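The fallback logic is simple enough to sketch in plain Ruby. This is a hypothetical illustration, not the actual extension and not DB Charmer's API: `ReplicaRouter`, `lag_reader`, and `max_lag_seconds` are invented names, and the 5 second threshold is an arbitrary assumption. In a real setup the lag value would come from whatever the heartbeat check reports.

```ruby
# Hypothetical sketch: send a block of reads to the slave unless it lags.
class ReplicaRouter
  def initialize(lag_reader:, max_lag_seconds: 5)
    @lag_reader = lag_reader           # callable returning lag in seconds
    @max_lag_seconds = max_lag_seconds # assumed threshold, tune to taste
  end

  # Yields :slave when the slave is fresh enough, otherwise :master.
  # If the lag cannot be read at all, the master is the safe default.
  def with_active_slave
    lag = begin
      @lag_reader.call
    rescue StandardError
      nil
    end
    target = (lag && lag <= @max_lag_seconds) ? :slave : :master
    yield target
  end
end

router = ReplicaRouter.new(lag_reader: -> { 2 })
puts router.with_active_slave { |db| "query runs on #{db}" }
```

Keeping the choice explicit at the call site means a lagging slave degrades to extra load on the master instead of stale reads.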
Caching
I continue to use memcached for caching. I try not to make a mess out of things and I prefer caching when the set and expire logic can be encapsulated inside a single model. I almost never use caching for partial pages.
I also use Tokyo Cabinet and Tokyo Tyrant as an alternative to memcached. Tyrant supports the memcached protocol so it was easy for me to drop in to the application. Tokyo is a persistent, disk-based key/value store that is very very fast – memcached speeds without being in memory. I choose Tokyo over memcached for large objects that seldom expire. The majority of these items are blobs of markdown that have been rendered into HTML – we use markdown heavily on Ravelry and almost every user-editable page that allows for blocks of text supports markdown. These things are prime for caching (slow generation time, seldom expiring) but they fill up memcached quickly – Tokyo was a perfect solution.
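The render-once pattern for those markdown blobs looks roughly like the sketch below. Everything here is illustrative: `RenderedTextCache` is an invented name, a plain Hash stands in for the key/value store, and the renderer is any callable rather than a real markdown library.

```ruby
# Hypothetical sketch: cache rendered HTML under a per-record key and
# expire it explicitly when the source text changes.
class RenderedTextCache
  def initialize(store:, renderer:)
    @store = store       # anything that responds to [], []=, and delete
    @renderer = renderer # callable that turns markdown into HTML
  end

  # Returns cached HTML, rendering and storing it on a miss.
  def fetch(record_id, markdown)
    key = key_for(record_id)
    cached = @store[key]
    return cached if cached

    @store[key] = @renderer.call(markdown)
  end

  # Call from the model whenever the markdown is edited.
  def expire(record_id)
    @store.delete(key_for(record_id))
  end

  private

  def key_for(record_id)
    "rendered_markdown:#{record_id}"
  end
end

render_count = 0
fake_renderer = lambda do |markdown|
  render_count += 1
  "<p>#{markdown}</p>"
end

cache = RenderedTextCache.new(store: {}, renderer: fake_renderer)
cache.fetch(42, "hello")
cache.fetch(42, "hello") # served from the store, renderer not called again
puts "renders so far: #{render_count}"
```

Keeping both `fetch` and `expire` in one small object matches the preference above for set and expire logic that lives in a single place.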
Search
I’m still very happy with Sphinx.
It’s stable, it’s very fast, and it’s very easy to index documents that come from a relational database. We’re going to need more faceted search in the future and Sphinx doesn’t do this natively, but I think that I’d prefer to make it work with Sphinx over adding Solr or Lucene into the mix. Sphinx always feels flexible and makes it easy to solve search problems but one area where it has really made my life easier is indexing speed – reindexing huge data sets is extremely fast. Sphinx has native MySQL support, connects to a local MySQL over a domain socket, and blazes along.
Redis
Redis is one of the more exciting things (to me, at least) that has come out of the whole NoSQL movement. We’re currently using it as part of our advertising delivery system but after working with it, I can think of many tasks that it would do better and more simply than a relational database. In particular, I think that it will be very helpful in dealing with and scaling our “social activity stream” things where each user has a customized view based on their friends and their preferences.
Front End

I’m so glad that Haml and Sass are catching on. People deserve better ways of writing templates and CSS and Haml and Sass deliver. I started doing more with Sass expressions after reading Peepcode’s “about this blog”.
For Javascript, I’m still using Prototype and Scriptaculous for now. I’m waiting to see what happens with FuseJS. I’ve replaced Prototype’s selector engine with the (much faster) NWMatcher so that it performs on par with jQuery. I prefer to write my own JS – the only 3rd party code that I’m really using apart from Prototype/Scripty is Emile for CSS animations. The Javascript module pattern keeps things clean and I have lots of “packages” for different Javascript. I don’t use the Rails helpers that stuff Javascript in the page.
I still use asset_packager to bundle, compress, and version the Javascript and CSS into files when the site is deployed.
Other things
- I’m running Gentoo Linux everywhere. Gentoo thinks that Portage is the best software management tool for Linux and I agree.
- Most of our machines are virtualized with Xen but I’ve been starting to run unvirtualized machines. Virtualization overhead on network and disk IO is not insignificant, even with the latest and greatest kernel and processors. Neither database is virtualized. I’m finding that LVM is the part that I really love, and I can have that without virtualizing.
- beanstalkd and plain Ruby daemons for background jobs. (No delayed_job plugin) Resque may be the new hotness but I am totally happy with what I already have.
- Nagios and Pingdom for monitoring (alerts)
- Munin and New Relic RPM for monitoring (graphing). Munin is great, New Relic is worth the money.
- Postfix for mail.
Any questions?
As you can probably tell, I started to run out of steam as I got further down the page. If I glossed over anything that you were interested in, leave a comment. I can follow up with more information in the comments or even in a future post.