Ravelry Runs On – 2010

I guess that it’s time for the 3rd annual “Ravelry Runs On” roundup. The last two were in March 2008 and March 2009.

This year, our traffic increased by 50% to 5,000,000 page views and 15 million Rails requests per day. We made very few changes to our architecture in 2009 but we did add a new master database server after our working set of data outgrew our memory and IO capacity.

This summary is more detailed than the last two, and I’ve broken it up into rough sections.

Physical Network

We own our servers and colocate them in a datacenter here in Boston. The datacenter provides us with a cooled half cabinet, redundant power, and a blend of premium (Internap, Savvis) bandwidth. We do the rest.

I use servers from Silicon Mechanics because of the high level of configurability and the price. For databases, I use 1U machines with up to eight 2.5″ SAS disks. For app servers, I use cheaper 1U machines with SATA RAID but I think I’ll be moving to half-depth single-drive machines that use SSDs. (These links are for the new Westmere Xeons so they are a little pricier than what we have now.)

Our network is pretty simple. We have two Internet Service Providers because I shopped around for a better deal once we reached the limit of our bundled bandwidth. Nothing fancy when it comes to routing – some hosts use one, some use the other. We use Cisco ASA firewalls and Dell managed gigabit switches. Every server has integrated KVM over IP that is attached to a management network so that I can have console access, power toggle, and other things in the event of an emergency. I only need to go to the datacenter to install new hardware and replace faulty hardware.

We use Amazon S3 to store and serve all of our non-database data. S3 is our biggest expense but without it, we’d have to handle 10 terabytes of redundant storage and 60 additional Mbps of traffic. On the flip side, we host our own servers instead of using Amazon EC2 or another cloud service because we get more performance for less money. We’re just not in the sweet spot – we need more than 1-2 servers but fewer than “a lot” and we have traffic/growth that isn’t spiky. It might be a different story if we needed to hire a sysadmin but it’s a very small part of my job and I enjoy doing it.

Application

Ravelry is a Rails 2.2.x application running on Ruby 1.8.7.

I went from following Rails Edge, to upgrading to the released versions shortly after they came out, to sticking with 2.2. I never got around to upgrading to 2.3 because it didn’t include any compelling improvements and now Rails 3.0 is looming. I honestly don’t know if I’m going to try to migrate to Rails 3 or if I’ll just move to Rails 2.3 and stay there. At some point you have to stop fighting the framework upgrade battle and as I look at all of the changes and incompatibilities in Rails 3, I think that it might be time. It might depend on Arel – if I can’t plug it into Rails 2, I may be tempted to upgrade.

It’s a strange time for people writing brand new Rails applications. If I were writing something new, I’d start with Ruby 1.9 and Rails 3 and deal with the downsides of being an early adopter.

I run Phusion’s Ruby Enterprise Edition. REE is a drop-in replacement for regular Ruby (MRI) that is faster and more memory efficient and there is no reason not to use it. REE brought us impressive performance gains and memory use reduction and if it didn’t exist, I’m certain that I would have moved from MRI to JRuby. I’m happy with REE so now I’m waiting and seeing – there is a lot going on in the world of Ruby implementations. Ruby 1.9 looks great from a performance perspective but I don’t know of any large sites that are running it in production with Rails.

Each application server runs Ruby on Rails under Apache and Phusion Passenger. Passenger is stable, zero maintenance, and I don’t have any complaints about it. I prefer nginx to Apache and I’ll probably use it on future app servers – it just wasn’t supported when I first moved to Passenger. For the last year, we’ve run 6 Passengers (fewer servers because of virtualization) that spawn up to 20 application instances each. I allocate roughly 7 GB of memory and 1 x 4-core CPU for each of these Passengers.
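As a back-of-the-envelope check, the traffic and capacity figures quoted in this post can be worked through in a few lines. The numbers are the ones given above; the constant names and the assumption that each Passenger's memory is split evenly across its instances are only for illustration.

```ruby
# Figures quoted in this post.
RAILS_REQUESTS_PER_DAY  = 15_000_000
PAGE_VIEWS_PER_DAY      = 5_000_000
PASSENGERS              = 6
INSTANCES_PER_PASSENGER = 20 # "up to 20", so this is an upper bound
MEMORY_PER_PASSENGER_GB = 7  # "roughly 7 GB"

SECONDS_PER_DAY = 24 * 60 * 60

# Daily averages only; real traffic peaks above these and no peak figure is given.
AVG_REQUESTS_PER_SECOND = RAILS_REQUESTS_PER_DAY.to_f / SECONDS_PER_DAY
REQUESTS_PER_PAGE_VIEW  = RAILS_REQUESTS_PER_DAY.to_f / PAGE_VIEWS_PER_DAY

MAX_INSTANCES       = PASSENGERS * INSTANCES_PER_PASSENGER
TOTAL_APP_MEMORY_GB = PASSENGERS * MEMORY_PER_PASSENGER_GB

# Assumption: the whole allocation is divided evenly among instances. In practice
# Apache and the OS take a share, so this is a ceiling rather than a measurement.
MEMORY_PER_INSTANCE_MB = MEMORY_PER_PASSENGER_GB * 1024.0 / INSTANCES_PER_PASSENGER

puts format("average Rails requests/second: %.1f", AVG_REQUESTS_PER_SECOND)
puts format("Rails requests per page view:  %.1f", REQUESTS_PER_PAGE_VIEW)
puts "maximum application instances: #{MAX_INSTANCES}"
puts "memory allocated to Passengers: #{TOTAL_APP_MEMORY_GB} GB"
puts format("memory ceiling per instance:   %.1f MB", MEMORY_PER_INSTANCE_MB)
```

That works out to about 174 Rails requests per second on average, three Rails requests per page view, at most 120 application instances, 42 GB allocated to Passengers, and a ceiling of roughly 358 MB per instance.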

Web Server / Load Balancing

We use nginx for our front-end web server and for additional static file servers. Compared to Apache, nginx is simpler to install and upgrade, faster, more memory efficient, and easier to configure. It’s a really nice piece of software.

For application requests, nginx proxies to haproxy and haproxy load balances across all of our application servers. haproxy’s performance is excellent, its monitoring tools are good, and it has all of the configurability that we need in a load balancer. Failing instances are removed from the pool and it was easy for me to set up rolling application upgrades/restarts. (This is described more in last year’s post.)

To sum up, when you hit www.ravelry.com:

  • Your request is handled by nginx, where URL rewrites and so on are performed
  • If your request is an application request, it is proxied to haproxy
  • haproxy chooses an Apache/Passenger instance to handle the request using round-robin load balancing and weighting (more weight for more powerful app servers)
  • Apache/Passenger hands the request off to Rails. Passenger’s own queuing should never come into play because haproxy knows that each application server can only handle X simultaneous requests.
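The balancing step in the list above can be illustrated with a small Ruby sketch. This is not haproxy's implementation, just one simple way to do weighted round-robin with a per-server connection cap; the `Backend` struct, the class name, and the server names are all made up for the example.

```ruby
# One application server as the balancer sees it (hypothetical structure).
# weight:   relative share of requests
# max_conn: how many simultaneous requests the server may hold
# active:   requests currently in flight
# current:  running counter used by the selection algorithm
Backend = Struct.new(:name, :weight, :max_conn, :active, :current)

class WeightedRoundRobin
  def initialize(backends)
    @backends = backends
  end

  # Returns the chosen backend, or nil when every backend is at its limit.
  # In the nil case a real balancer would hold the request in its own queue,
  # which is why the app server's queue never needs to fill up.
  def acquire
    eligible = @backends.select { |b| b.active < b.max_conn }
    return nil if eligible.empty?

    total = eligible.sum(&:weight)
    eligible.each { |b| b.current += b.weight }
    chosen = eligible.max_by(&:current)
    chosen.current -= total
    chosen.active += 1
    chosen
  end

  # Call when the request finishes so the slot becomes available again.
  def release(backend)
    backend.active -= 1
  end
end

balancer = WeightedRoundRobin.new([
  Backend.new("big-app-server", 2, 20, 0, 0),
  Backend.new("small-app-server", 1, 20, 0, 0)
])

picks = Array.new(6) do
  backend = balancer.acquire
  balancer.release(backend)
  backend.name
end
puts picks.tally.inspect
```

With weights of 2 and 1, the heavier server receives two of every three requests, and the picks are interleaved rather than sent in bursts.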

Databases and Search

MySQL / MariaDB

This year, I moved from MySQL 5.0 to MariaDB 5.1.

In my opinion, MariaDB is the best way to run MySQL. Maria is MySQL 5.1 with significant improvements. The most notable improvements (for us) are the inclusion of the InnoDB plugin and InnoDB performance patches via Percona’s awesome XtraDB work, and more releases / faster bugfixes / more active development. We’ve been running Maria for 4 months peaking at about 2500 queries per second and it has proven to be very stable. It’s hard to quantify the performance improvements because we changed hardware at the same time, but Percona’s patches (benchmarked and documented in detail on MySQL Performance Blog) and InnoDB table compression have been huge for us.

We have one master database and two slaves. One slave is used only for backup and it dumps all of its tables, compresses them, and sends them to Amazon S3 nightly.

The main slave is mostly for the search engine and failover but I do send some queries its way by using DB Charmer. For now, all operations on the slave are explicit and fall back to the master if the slave is lagging (I’ve written a tiny Model.with_active_slave { } extension that accomplishes this). Slave lag is detected by using mk-heartbeat from the excellent Maatkit utility collection.
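The fall-back behaviour described here can be sketched in plain Ruby. This is not the actual extension and it does not use DB Charmer's API; `ReplicaRouter`, the `lag_probe` callable, and the 5 second threshold are assumptions chosen for illustration. In a real setup the probe would read the replication delay that the heartbeat tool reports.

```ruby
# Hypothetical router: runs a block against the slave when it is healthy
# and not lagging, otherwise against the master.
class ReplicaRouter
  def initialize(master:, slave:, max_lag_seconds:, lag_probe:)
    @master = master
    @slave = slave
    @max_lag_seconds = max_lag_seconds
    @lag_probe = lag_probe # callable returning current slave lag in seconds
  end

  # Yields the connection the block should use and returns the block's result.
  def with_active_slave
    yield(slave_usable? ? @slave : @master)
  end

  private

  # A probe that fails or returns nil is treated the same as a lagging slave,
  # so reads quietly go back to the master.
  def slave_usable?
    lag = @lag_probe.call
    !lag.nil? && lag <= @max_lag_seconds
  rescue StandardError
    false
  end
end

# Symbols stand in for real database connections in this example.
router = ReplicaRouter.new(
  master: :master_connection,
  slave: :slave_connection,
  max_lag_seconds: 5,     # assumed threshold
  lag_probe: -> { 0.5 }   # assumed: pretend the slave is half a second behind
)

router.with_active_slave { |conn| puts "reading from #{conn}" }
```

Keeping the decision inside one block helper means callers opt in explicitly, which matches the "all operations on the slave are explicit" approach.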

Caching

I continue to use memcached for caching. I try not to make a mess out of things and I prefer caching when the set and expire logic can be encapsulated inside a single model. I almost never use caching for partial pages.
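To show what "set and expire logic encapsulated inside a single model" can look like, here is a minimal sketch. The `Pattern` model, its project count, and the in-memory `MemoryCache` (standing in for a memcached client) are hypothetical; the point is only that the key name, the fill, and the expiry all live in one class.

```ruby
# Stand-in for a memcached client so the sketch runs without a server.
class MemoryCache
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

# Hypothetical model. Nothing outside this class knows the cache key.
class Pattern
  attr_reader :id, :count_queries

  def initialize(id, cache)
    @id = id
    @cache = cache
    @projects = []
    @count_queries = 0 # counts how often the "slow" path runs
  end

  # Read path: serve from cache, fill on a miss.
  def project_count
    cached = @cache.get(count_key)
    return cached unless cached.nil?

    count = slow_project_count
    @cache.set(count_key, count)
    count
  end

  # Write path: the same model that sets the value also expires it.
  def add_project(name)
    @projects << name
    @cache.delete(count_key)
  end

  private

  def count_key
    "pattern:#{id}:project_count"
  end

  # Pretend this is an expensive database query.
  def slow_project_count
    @count_queries += 1
    @projects.length
  end
end

pattern = Pattern.new(42, MemoryCache.new)
pattern.add_project("first sweater")
2.times { pattern.project_count }
puts "slow queries run: #{pattern.count_queries}"
```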

I also use Tokyo Cabinet and Tokyo Tyrant as an alternative to memcached. Tyrant supports the memcached protocol so it was easy for me to drop in to the application. Tokyo is a persistent, disk-based key/value store that is very very fast – memcached speeds without being in memory. I choose Tokyo over memcached for large objects that seldom expire. The majority of these items are blobs of markdown that have been rendered into HTML – we use markdown heavily on Ravelry and almost every user-editable page that allows for blocks of text supports markdown. These things are prime for caching (slow generation time, seldom expiring) but they fill up memcached quickly – Tokyo was a perfect solution.
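A sketch of caching rendered markdown in a persistent key/value store might look like the following. The `PersistentStore` class stands in for a store reached over the memcached protocol, the renderer is a trivial placeholder rather than a real markdown library, and keying entries by a digest of the source text is an assumption of this example, not necessarily how Ravelry does it.

```ruby
require "digest"

# Stand-in for a persistent store that exposes memcached-style get/set.
class PersistentStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

class RenderedTextCache
  # renderer: any callable that turns source text into HTML.
  def initialize(store, renderer)
    @store = store
    @renderer = renderer
  end

  # The key includes a digest of the source, so an edited text gets a new key
  # and the old entry is simply never read again. Old entries are not cleaned
  # up here; a real setup would need some policy for that.
  def html_for(record_key, source)
    key = "rendered:#{record_key}:#{Digest::SHA1.hexdigest(source)}"
    cached = @store.get(key)
    return cached if cached

    html = @renderer.call(source)
    @store.set(key, html)
    html
  end
end

# Placeholder renderer, NOT a markdown implementation.
fake_renderer = ->(text) { "<p>#{text}</p>" }

cache = RenderedTextCache.new(PersistentStore.new, fake_renderer)
puts cache.html_for("project:1:notes", "Knit two, purl two.")
```

Because the store speaks the same protocol as memcached, the calling code does not need to know which of the two it is talking to.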

Search

I’m still very happy with Sphinx.

It’s stable, it’s very fast, and it’s very easy to index documents that come from a relational database. We’re going to need more faceted search in the future and Sphinx doesn’t do this natively, but I think that I’d prefer to make it work with Sphinx over adding Solr or Lucene into the mix. Sphinx always feels flexible and makes it easy to solve search problems but one area where it has really made my life easier is indexing speed – reindexing huge data sets is extremely fast. Sphinx has native MySQL support, connects to a local MySQL over a domain socket, and blazes along.

Redis

Redis is one of the more exciting things (to me, at least) that has come out of the whole NoSQL movement. We’re currently using it as part of our advertising delivery system but after working with it, I can think of many tasks that it would do better and more simply than a relational database. In particular, I think that it will be very helpful in dealing with and scaling our “social activity stream” things where each user has a customized view based on their friends and their preferences.
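One common way to build a per-user activity stream on Redis lists is "fan-out on write": each event is pushed onto the list of every interested user and the list is trimmed to a fixed length. The sketch below shows that technique, which is not something this post says Ravelry has built. `FakeRedis` is an in-memory stand-in whose method names mirror the Redis LPUSH, LTRIM, and LRANGE commands; `ActivityStream` and the key format are hypothetical.

```ruby
# In-memory stand-in for the three Redis list commands used below.
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # LPUSH: add to the head of the list, return the new length.
  def lpush(key, value)
    @lists[key].unshift(value)
    @lists[key].length
  end

  # LTRIM: keep only the elements from start to stop (inclusive).
  def ltrim(key, start, stop)
    @lists[key] = @lists[key][start..stop] || []
    "OK"
  end

  # LRANGE: read elements from start to stop (inclusive).
  def lrange(key, start, stop)
    @lists[key][start..stop] || []
  end
end

class ActivityStream
  def initialize(redis, max_items: 100)
    @redis = redis
    @max_items = max_items
  end

  # Push the event onto each follower's feed and cap the feed length.
  def publish(event, follower_ids)
    follower_ids.each do |user_id|
      key = feed_key(user_id)
      @redis.lpush(key, event)
      @redis.ltrim(key, 0, @max_items - 1)
    end
  end

  # Newest events first.
  def feed_for(user_id, limit = 20)
    @redis.lrange(feed_key(user_id), 0, limit - 1)
  end

  private

  def feed_key(user_id)
    "feed:#{user_id}"
  end
end

stream = ActivityStream.new(FakeRedis.new, max_items: 3)
%w[queued cast-on finished blocked].each do |status|
  stream.publish("friend #{status} a project", [1, 2])
end
puts stream.feed_for(1).inspect
```

Reads become a single list fetch per user, at the cost of doing more work on each write; per-user preferences could be applied at publish time by choosing which feeds receive an event.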

Front End

I’m so glad that Haml and Sass are catching on. People deserve better ways of writing templates and CSS and Haml and Sass deliver. I started doing more with Sass expressions after reading Peepcode’s “about this blog”.

For Javascript, I’m still using Prototype and Scriptaculous for now. I’m waiting to see what happens with FuseJS. I’ve replaced Prototype’s selector engine with the (much faster) NWMatcher so that it performs on par with jQuery. I prefer to write my own JS – the only 3rd party code that I’m really using apart from Prototype/Scripty is Emile for CSS animations. The Javascript module pattern keeps things clean and I have lots of “packages” for different Javascript. I don’t use the Rails helpers that stuff Javascript in the page.

I still use asset_packager to bundle, compress, and version the Javascript and CSS into files when the site is deployed.

Other things

  • I’m running Gentoo Linux everywhere. Gentoo thinks that Portage is the best software management tool for Linux and I agree.
  • Most of our machines are virtualized with Xen but I’ve been starting to run unvirtualized machines. Virtualization overhead on network and disk IO is not insignificant, even with the latest and greatest kernel and processors. Neither database is virtualized. I’m finding that LVM is the part that I really love, and I can have that without virtualizing.
  • beanstalkd and plain Ruby daemons for background jobs (no delayed_job plugin). Resque may be the new hotness but I am totally happy with what I already have.
  • Nagios and Pingdom for monitoring (alerts)
  • Munin and New Relic RPM for monitoring (graphing). Munin is great, New Relic is worth the money.
  • Postfix for mail.
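For the background jobs item in the list above, a "plain Ruby daemon" worker generally boils down to a loop that reserves a job, runs it, and then deletes or buries it. The sketch below shows that shape only. `FakeQueue` is an in-memory stand-in that borrows beanstalkd's put/reserve/delete/bury vocabulary, and the `Worker` class, the JSON job format, and the handler names are assumptions, not Ravelry's code.

```ruby
require "json"

FakeJob = Struct.new(:id, :body)

# In-memory stand-in for a job queue so the sketch runs without a server.
class FakeQueue
  attr_reader :deleted, :buried

  def initialize
    @ready = []
    @deleted = []
    @buried = []
    @next_id = 0
  end

  def put(body)
    @next_id += 1
    @ready << FakeJob.new(@next_id, body)
    @next_id
  end

  # A real queue client would block here; this returns nil when empty.
  def reserve
    @ready.shift
  end

  def delete(job)
    @deleted << job.id
  end

  # Buried jobs are set aside for a human to inspect.
  def bury(job)
    @buried << job.id
  end
end

class Worker
  # handlers: hash of job type => callable taking the job's args
  def initialize(queue, handlers)
    @queue = queue
    @handlers = handlers
  end

  # Processes one job. Returns false when the queue is empty.
  def work_one
    job = @queue.reserve
    return false unless job

    payload = JSON.parse(job.body)
    @handlers.fetch(payload["type"]).call(payload["args"])
    @queue.delete(job)
    true
  rescue StandardError
    # Bad JSON, unknown type, or a failing handler: set the job aside.
    @queue.bury(job) if job
    true
  end

  # A daemon would loop forever; draining keeps the example finite.
  def drain
    loop { break unless work_one }
  end
end

queue = FakeQueue.new
queue.put(JSON.generate("type" => "send_email", "args" => { "to" => "knitter@example.com" }))
queue.put(JSON.generate("type" => "no_such_job", "args" => {}))

worker = Worker.new(queue, "send_email" => ->(args) { puts "emailing #{args["to"]}" })
worker.drain
puts "deleted: #{queue.deleted.inspect}, buried: #{queue.buried.inspect}"
```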

Any questions?

As you can probably tell, I started to run out of steam as I got further down the page. If I glossed over anything that you were interested in, leave a comment. I can follow up with more information in the comments or even in a future post.

Comments (14)

  1. dan wrote:

    Not a knitter but I love these tech posts. I like living the dream vicariously through you. I gotta check out Maria & Munin.

    Tell your users to join spokt, m’kay? ;)

    Wednesday, March 24, 2010 at 10:45 am #
  2. Great write-up. Thanks for sharing!

    Wednesday, March 24, 2010 at 11:10 am #
  3. vismajor wrote:

    Thanks for another great write-up; I find it interesting to see how the site evolves under the hood from year to year.

    Wednesday, March 24, 2010 at 12:31 pm #
  4. Carl Cravens wrote:

    Appreciate the post. I’m both a knitter and a Linux systems admin / developer, and I take a keen interest in community forums. Rav is an amazing piece of software, hitting many of the notes available forum software fails to hit.

    The data-point on Xen is useful… I’ve been considering making a bid to put our Windows terminal servers on bare metal due to the performance issues we’re having.

    Lots of stuff to look at here… I’m an Apache2 adherent out of momentum; it’s what I know. But it’s often overkill for what I want to do with it… I’ll be looking into a lot of these pieces (MariaDB on my WordPress) to see how they fit into my toolbox.

    Thanks again!

    Wednesday, March 24, 2010 at 4:25 pm #
  5. Paulo M wrote:

    very impressed with the ravelry setup – there’s quite a lot going on, specially being a one-man show. have you found the secret formula for 48h days? :)

    there’s quite a few common things with what i use for having postcrossing.com running like gentoo, apache, mysql, amazon s3, redis, … but also some relevant differences – for one, i opted for ec2 instead. very interesting to read your setup and know what keeps ravelry ticking. would love to know how you got mariadb into gentoo – from source?

    keep up the excellent work and best of luck for ravelry!

    Wednesday, March 24, 2010 at 4:36 pm #
  6. Awesome job on the write up. I’m totally agreeing with you that if you’re writing a web app now, bite the bullet and get on Rails 3.

    Friday, March 26, 2010 at 9:20 pm #
  7. Casey — it’s good to know what the state of the real art is. There are a lot of flash-in-the-pan technologies which all the neophytes flock towards … and then, it turns out to be junk, or the guy who wrote it gets a real job, or whatever.

    Thanks for your posts. They’re worth waiting for :-)

    And by the way, come check out my new gig — I finally found a business that is both good for the planet and good for me.

    Tom

    Saturday, March 27, 2010 at 2:55 pm #
  8. Seth Ladd wrote:

    Wow, great write up. I love reading about how real people run real sites with real traffic. Your description of what it takes to run a production site makes me wonder why anyone would ever NOT choose Heroku, Engine Yard, or Google App Engine. Writing an application is complex enough, let alone managing the hosting and deployment!

    Monday, April 12, 2010 at 12:35 am #
  9. Mark wrote:

    I am curious about your decision to use Tokyo as a supplement to memcache. Can you say how collectively large these blobs are that the extra complexity of Tokyo made sense over just adding more memcache?

    We run 64GB of memcache in two pools and I’d willingly double that for the simplicity it brings.

    Thanks!

    Tuesday, April 20, 2010 at 3:13 pm #
  10. Casey wrote:

    (response to Mark)

    Tokyo Tyrant can speak the memcached protocol so I just dropped it in.

    From an application point of view, it’s just a memcached that happens to be persistent (and large – currently storing 130 GB of data).

    Tuesday, April 20, 2010 at 3:29 pm #
  11. Sarah wrote:

    Thanks for everything you do to make Ravelry possible. I am technologically challenged but computer avid, so you and all your helpers are my heroes!

    Saturday, May 15, 2010 at 1:52 pm #
  12. DaveInTexas@Ravelry wrote:

    I thought you rolled out features rather quickly — now I understand! I had wondered if it was a Rails app. Love Ruby and Rails, but love Ravelry even more.

    Thanks for doing what you do! –David

    Sunday, June 6, 2010 at 1:25 am #
  13. Matt Doran wrote:

    Interesting post!

    My wife is a huge Ravelry fan, and I’m a geek curious about how you run your site.

    You mention that you have 10GB of data on S3. What are you storing there? You said non-DB data … but peoples images are on flickr … so I couldn’t imagine what would make up the 10GB.

    Saturday, July 17, 2010 at 5:07 am #
  14. Salman wrote:

    Can you give some details on the web servers. How many instances are you running, and how much RAM have you allocated per instance of phusion?

    Also curious as to how many page views a single web server can push out on a app like ravelry.

    great post!

    Monday, September 20, 2010 at 10:40 am #
