Quick things

  • In January, I wrote a post on the main blog called “How Ravelry Makes Money”. If you didn’t catch it, you might be interested in it.
  • The Ravelry API has progressed a lot and I’m pretty happy with how it’s going and how it’s architected. When it comes to serializing objects to JSON, I stay inside the model and use something similar to serialize_with_options. An alternative, view-based approach is DHH’s jbuilder. You can find the Ravelry API developer’s group at http://ravelry.com/groups/ravelry-api.
  • I’ve been working on a mobile-optimized site (iPhone, Android, and other Webkit-based devices) that I built with Sencha Touch. I’m really loving Sencha Touch and I’m glad that I went with it. I only wish that rendering and animation were a little faster in the browser and that the iPhone in particular had more Javascript-based access to features such as the photo library and camera. I hope that we’ll get there someday.

Saving money by putting a web cache in front of Amazon S3

We use Amazon S3 to store all of our files (images, PDFs, etc.). We do this because it is cost effective – building out our own redundant, distributed storage system would cost roughly the same as 1 year of S3 storage service in equipment purchases alone.

Paying Amazon to serve those files to end users? Not so cost effective.

For the last 5 months, we’ve been serving up files from a caching proxy server that only hits Amazon when necessary.

To illustrate the savings: in the last 30 days we’ve served up 2.7 billion requests for files that are stored on S3 and we’ve transferred a total of 24 terabytes of data.

Recurring monthly cost to pay Amazon to do this

  • GET requests: $270
  • Data transfer: $3100

Total: $3370

Recurring cost to handle it ourselves:

Note that this bandwidth is from a single ISP – we’re not paying for any redundancy and that’s okay because we can always temporarily serve from S3 if there is a problem.

  • Bandwidth (120 Mbps): $1200
  • Cache misses (S3 costs): roughly $400

Total: $1600

One-time costs to handle it ourselves:

  • Old 1U server with 8 GB RAM: $0 but let’s say $1000
  • 2 x SSD drives: $600
  • 1U SuperMicro machine acting as pfSense gigabit router: $1000
  • GigE fiber installation fees: $600
  • Fiber to Cat 5e converter: $200

So after 2 months we had saved enough to cover the up-front costs.
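
The arithmetic behind that payback claim can be checked with a few lines of Ruby. This is only a sketch that uses the figures quoted above; the constant names are mine, and the average-throughput line assumes decimal terabytes and a 30-day month.

```ruby
# Figures quoted in the post (US dollars).
S3_MONTHLY   = 270 + 3100                      # GET requests + data transfer
SELF_MONTHLY = 1200 + 400                      # bandwidth + cache misses served by S3
ONE_TIME     = 1000 + 600 + 1000 + 600 + 200   # server, SSDs, router, fiber install, converter

MONTHLY_SAVINGS  = S3_MONTHLY - SELF_MONTHLY
BREAKEVEN_MONTHS = (ONE_TIME.to_f / MONTHLY_SAVINGS).ceil

# Average throughput implied by 24 TB over 30 days (assumes 1 TB = 10**12 bytes).
AVG_MBPS = (24 * 10**12 * 8) / (30 * 24 * 3600).to_f / 10**6

puts "Monthly savings: $#{MONTHLY_SAVINGS}"
puts "Months to break even: #{BREAKEVEN_MONTHS}"
puts "Average throughput: #{AVG_MBPS.round(1)} Mbps"
```

The average comes out below the 120 Mbps figure listed above, though peak traffic will of course be higher than the average.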

SSDs and pfSense helped us keep our up-front costs down. SSDs are perfect for this sort of workload and super fast drives made it possible to repurpose a single crusty old server for this task. pfSense is fantastic open source firewall software that enabled us to build a gigabit firewall/router for $1000 instead of getting totally ripped off by Cisco or someone else on gigabit speed hardware.

There are few disadvantages to this – if anything ever goes wrong with our hardware or connection, we just fail over to S3 and stop saving money. Our app currently requires a restart to switch between Amazon and our own hosting (or some other CDN) but it’s a fairly fast and seamless process.
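
One simple way to make the source of static files switchable is to read the base URL from configuration, which fits the restart-to-switch behavior described above. The sketch below is hypothetical: the host names, the FILE_SOURCE variable, and the FileUrls module are placeholders, not Ravelry’s actual code.

```ruby
# Hypothetical sketch: choose the static file host from the process environment.
# The environment is fixed when the process starts, so switching means a restart.
module FileUrls
  # Placeholder hosts; substitute your own cache host and S3 bucket.
  HOSTS = {
    "cache" => "http://files.example.com",
    "s3"    => "http://example-bucket.s3.amazonaws.com"
  }.freeze

  # Resolve the base URL from an environment-like hash, defaulting to the cache.
  def self.base_url(env = ENV)
    HOSTS.fetch(env.fetch("FILE_SOURCE", "cache"))
  end

  # Build the public URL for a stored file key.
  def self.url_for(key, env = ENV)
    "#{base_url(env)}/#{key}"
  end
end

puts FileUrls.url_for("photos/1234.jpg", { "FILE_SOURCE" => "s3" })
```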

We use nginx to do the proxy caching. It’s pretty much zero-maintenance and the configuration was really simple and straightforward. How do you actually configure it? I posted an answer on this Stack Overflow question: How to set up Nginx as a caching reverse proxy?

So you want to run MySQL on SSDs?

Here’s why I do: it’s time for me to build a new master database server. Our current main slave is too underpowered to handle our entire load in an emergency, which means that our failover situation isn’t that great. I’ll replace the master with something new and shiny, make some performance improvements while I’m at it, and the old master will work just fine in an emergency.

For IO intensive servers, I conserve space and electricity by using 1U machines with 6 or 8 2.5″ drives.

I’d normally buy 8 Seagate Savvio 15K SAS drives and set them up as a RAID 10 array. This would run me about $1850.

We’re pretty frugal when it comes to our technology budget and I can’t really stomach spending that kind of money to effectively get 550 GB of redundant, fast magnetic disk storage. SATA MLC SSDs that blow traditional drives out of the water are currently under $2 / GB.

Disclaimer

This is a collection of information that I’ve used to inform my decisions. I don’t know what I’m doing, so I don’t want you to take my word for it (seriously) – I’m just hoping that this collection of links will be useful to some people.

Also, this plan might make no sense to you depending on your situation. We buy 1 or 2 servers a year and saving a thousand dollars is a big deal to us.

One more thing: Today is May 9th, 2011. The SSD universe is expanding quickly and this post will likely be obsolete in a matter of months.

Should you buy RAM instead?

Yes. Increasing the size of your InnoDB buffer pool is the best way to speed up MySQL. If you can add more RAM, do it.

It costs $1000 to buy 48 GB of RAM. If your working set (your hot data) can fit into RAM, you probably don’t need to bother with SSDs at all.

Which SSD to choose?

The Intel 320 Series.

It’s an MLC based, Serial ATA SSD in the 2.5″ form factor.

Intel 320

Why?

  • Same price as magnetic disks: the 300 GB version is $540. This is the same price (!!) as the soon to be available Savvio 15K.3 300 GB SAS hard drive.
  • Same level of reliability as magnetic disks: This drive is more reliable than the X25-M which I have had great success with. Check out this marketing slide with failure rates from over 1 million deployed X25-M units.
  • This device includes power-loss data protection. Many SSDs can lose the data that is in the process of being written in the event of a power loss. This is very bad. (Intel PDF link)
  • This is the 3rd generation of a proven piece of hardware. I feel very comfortable choosing this Intel device and feeling comfortable is good.
  • Intel has published spec information for server applications that addresses my write endurance concerns. The biggest problem with using MLC based SSDs is that the number of writes that they can handle over their lifetime is drastically smaller than what an SLC can do. This specification PDF gives you some information as well as documentation on the SMART attributes that you can use to predict the life span of the drive given your load.
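
As a rough illustration of predicting drive life from SMART data, the sketch below linearly extrapolates a wear indicator. It assumes a normalized wear value that starts at 100 and counts down toward 1 as the flash wears (Intel’s Media Wearout Indicator behaves this way, as far as I know), and it assumes the write load stays constant. The sample readings are made up.

```ruby
# Estimate remaining days from two readings of a normalized wear indicator
# (100 = new, lower = more worn). Linear extrapolation; constant load assumed.
def days_remaining(start_value, current_value, days_elapsed, floor = 1)
  consumed = start_value - current_value
  return Float::INFINITY if consumed <= 0   # no measurable wear yet
  per_day = consumed.to_f / days_elapsed
  (current_value - floor) / per_day
end

# Made-up example: the indicator went from 100 to 98 over 90 days.
puts "Estimated days remaining: #{days_remaining(100, 98, 90).round}"
```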

You might say…

Q: MLCs are bad, shouldn’t you be using an SLC drive like the Intel X25-E?
A: The X25-E is $10 a gigabyte. Even if I weren’t trying to save money, I can’t see spending that much on a 3 Gbps Serial ATA drive. This MLC would be a bad choice for me (vs SLC) if it couldn’t meet my write needs… but it will.

Check out the PDF mentioned in the last item above. As an aside, Intel’s strategy with this drive is a little strange – they clearly intend for it to be used in server environments but it doesn’t appear to be marketed that way.

Q: Why not choose a PCI solution like Fusion-io ioDrive or Virident TachIOn?
A: The Virident TachIOn looks like a fantastic piece of hardware – check out this benchmark/post on MySQL Performance Blog.

Unfortunately, the 400 GB TachIOn is over $13,000 and I’d probably need the 600 GB model. These are for people who have a serious need for hundreds of gigabytes of persistent flash storage. If this were my only good option, I’d stick with 15K SAS disks.

Q: Why Intel? There are much faster SSDs on the market. The 320 isn’t even a 6 Gbps drive.
A: I’m not convinced that it matters much for currently available versions and forks of MySQL, but even if it did – these other vendors can’t really touch Intel’s reliability. I researched the market before this Intel drive was released (> 5 weeks ago) and I had tentatively decided that I wouldn’t be able to use SSDs at all.

Your RAID controller might matter

Check out this slide from Yoshinori Matsunobu’s (highly recommended!) “Linux and H/W Optimizations for MySQL”:

SSD raid

Scary, huh? I tend to use LSI’s low profile battery-backed MegaRAID controllers. I did a little looking to make sure my usual controller of choice (LSI 9261) would still be a good choice and I found something interesting. For about $100, I can get a software upgrade called “FastPath” that improves performance when RAIDing SSDs.

Fastpath

Neat. I haven’t really heard about this product much and it could be snake oil for all I know ;) but I figure that it’s worth a try.

LSI has a fancy new controller that claims it doubles the IOPS of the 9260s when used with SSDs. It’s about $700 so by the time you add battery backup and the FastPath software, you’ve paid $1000 for a RAID controller. Too rich for my blood, I’ll stick with the 9261 for now.

The Hardware Configuration

Here’s what I’ve purchased:

  • Supermicro 1016T – my favorite server building block. 1U, eight 2.5″ drives, Xeon 5600, 12 DIMM sockets, redundant power, built in management
  • 6 x Intel 320 SSDs configured in RAID 10. SSDs aren’t mechanical, but they can still fail.
  • 2 x Seagate Savvio 15K SAS drives. Yup! Trusty Savvio SAS drives – these guys are going to handle the bulk of the sequential writes (doublewrite buffer, transaction logs) since that’s what they do best.
  • The LSI 9261 with FastPath software upgrade

Linux Tuning

I’ll keep this short:

  • Use the latest kernel
  • Do not use the default IO scheduler. CFQ is a little slower with MySQL on regular disks as it is! Change the scheduler to deadline or noop.
  • Use the best filesystem that you can (XFS or EXT4). Someday we’ll have ZFS or SSD optimized filesystems on Linux, but we don’t have either today.
  • You probably want to disable your filesystem’s access time and write barrier/write-through options (noatime, and nobarrier or barrier=0, in fstab)
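
To see which scheduler a disk is currently using, Linux exposes /sys/block/<device>/queue/scheduler, where the active choice is shown in square brackets. A minimal Ruby sketch for reading it follows; the device name sda in the comment is just an example, and the helper names are mine.

```ruby
# Parse the contents of /sys/block/<dev>/queue/scheduler.
# The active scheduler is the one wrapped in square brackets.
def active_scheduler(contents)
  match = contents.match(/\[([^\]]+)\]/)
  match && match[1]
end

# Read the scheduler for a device (e.g. "sda"); returns nil if the file is absent.
def scheduler_for(device)
  path = "/sys/block/#{device}/queue/scheduler"
  File.exist?(path) ? active_scheduler(File.read(path)) : nil
end

puts active_scheduler("noop deadline [cfq]")
```

Changing the scheduler at runtime is done by writing the desired name (for example deadline) to that same file as root.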

MySQL Tuning

If you are going to run MySQL on SSD drives you MUST use Percona’s XtraDB

Really. Stock MySQL’s performance is not even close to what Percona Server can do with SSDs. You can get Percona’s improvements in two ways: in Percona Server or MariaDB.

I usually recommend MariaDB but they do not yet have a release that builds on MySQL 5.5. I may use Percona Server myself.

Put your logs and your doublewrite buffer on traditional hard disks

These files are sequentially (and heavily) written – they are not a good fit for SSD and they are a good fit for traditional disks. The doublewrite buffer location is configurable in Percona Server. The setting is: innodb_doublewrite_file

Here’s a longer explanation with benchmarks: http://yoshinorimatsunobu.blogspot.com/2009/05/tables-on-ssd-redobinlogsystem.html

…and here is the TL;DR

yoshi

Tune InnoDB internals

Set innodb_adaptive_checkpoint = keep_average
Set innodb_flush_neighbor_pages = 0

See http://www.percona.com/docs/wiki/percona-server:features:innodb_io_51

You’ll probably want to tune innodb_io_capacity as well: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_io_capacity

Um… Yeah, I’m just going to buy more RAM instead

Good choice ;)

I’ll leave you with a recommendation: consider trying out SSDs for slave databases. They are especially awesome for backup slaves because any old box and a single SSD may be fast enough to replicate your MySQL database. It’s really nice to have a slave that is dedicated to only backups and with SSDs, doing this is so cheap that there are few reasons not to.





Quick tips for improving email deliverability

If you run a site that needs to deliver email to users it can be surprisingly hard to make sure that your mail makes it to people’s inboxes.

We don’t ever send useless email to users but we still manage to deliver about 500,000 emails each month – account confirmations, CCs of in-app messages (a feature that is off by default), purchase receipts, replies from customer service… There are hosted services like SendGrid that will take care of your email for you but we certainly can’t spend $400 a month on email delivery.


Here are some things that will help you build up a good reputation and lose fewer messages to spam filters:

  • First, the obvious one: Don’t send crappy unwanted mail to users.
  • Make sure that your mail server has a dedicated, permanent IP that is not shared with any other senders
  • Configure DNS correctly with a hostname and PTR for your mail server’s IP
  • Set up SPF records in DNS. Here is a wizard that makes it easy: SPF Setup Wizard.
  • Implement DomainKeys Identified Mail. Here is a tutorial that will walk you through setting this up by pairing DKIMProxy with Postfix: Setting Up DKIM and DomainKeys using DKIMProxy with Postfix in Ubuntu Hardy. If you aren’t using Postfix, I certainly hope that you aren’t using Sendmail. Unless you are a Sendmail expert, it’s a misconfiguration waiting to happen.
  • Apply for AOL’s whitelist and Yahoo’s whitelist.
  • You might also want to sign up for feedback loops with one or more of the large mail providers (Comcast’s is here). This way, you’ll be made aware of messages that are classified as spam.
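
A quick way to sanity-check the DNS items above is Ruby’s standard Resolv library. This sketch looks up a domain’s SPF TXT record and the PTR name for an IP. The helper names are mine, the sample record uses a documentation IP address, and the two lookup methods need network access (they are defined here but not called).

```ruby
require "resolv"

# Pick the SPF policy out of a list of TXT record strings.
def spf_from(txt_values)
  txt_values.find { |value| value.start_with?("v=spf1") }
end

# Fetch TXT records for a domain and return its SPF policy, if any.
def spf_record(domain, dns = Resolv::DNS.new)
  records = dns.getresources(domain, Resolv::DNS::Resource::IN::TXT)
  spf_from(records.map { |record| record.strings.join })
end

# Reverse (PTR) lookup; returns nil when the IP has no PTR record.
def ptr_name(ip)
  Resolv.getname(ip)
rescue Resolv::ResolvError
  nil
end

# Offline example with made-up TXT values.
puts spf_from(["some-verification=abc", "v=spf1 ip4:192.0.2.10 -all"])
```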

Hopefully the above suggestions will help you make sure that fewer emails are canned as spam. You might also be interested in SenderScore’s reputation tool. You can use it to get a reputation score for a particular mail server.

Here’s ours – I have no idea what factors went into this or if it really means much but the SenderScore is a 1-100 scale, so YAY.

Ravelry in Bullet Points

Inspired by Stack Exchange’s Architecture in Bullet Points. I have to admit that I was happy to see that we stack up pretty well, being a Ruby on Rails site. There’s no denying that Ruby is slower than C#. …and we could talk about Microsoft SQL Server vs MySQL as well.

It’s a little bit of an apples to oranges comparison. We’re a members only site that is mostly not indexed in Google and we’ve got a low unique visitor count and a high average time on the site. Also, we’re very, very budget conscious.

Traffic

  • 173 Million Page Views per Month (not including guests + sign in page views)
  • 350 Rails requests per second (at peak times, requests for static files not included). Correction: I originally wrote 1600 here, which was a huge error; 1600 reqs/sec just hitting Rails would be impressive. There were some static files included in that.
  • ? DNS requests per second. My TTLs are set wicked high for most hostnames anyway.
  • 180 Megabits per second at peak times (including static files)

Data Centers

  • 1/2 rack with Hosted Solutions (Windstream), 3 internet connections
  • 1U with RippleWeb for backup
  • Amazon S3 for backup

Production Servers

All are SuperMicro boxes running Gentoo Linux. Most are dual CPU Xeons and they range from 4 years old to new. SSDs have helped me make the oldest servers new again.

  • 4 application servers (Apache / Passenger / Ruby on Rails)
  • 3 database servers (MariaDB, 1 is a low-powered slave for backups)
  • 1 front end web server (nginx)
  • 1 misc utility server / front end web server
  • 1 caching proxy server for static files (nginx)
  • 3 router / firewalls (2 x Cisco, 1 x pfSense)

The most powerful server is the newest database server – a Dual Xeon 5530 with 8 x SAS drives and 48 GB of RAM (all in 1U!). Obviously, we can’t scale up forever but our database + indices is about 220 GB and there are some easy opportunities for splitting big tables out into other servers.

Software and Technologies Used

I usually do a yearly “Ravelry Runs On” post but here are the bullets.

  • Ruby on Rails
  • MariaDB (a better MySQL)
  • Phusion Passenger and REE
  • Gentoo Linux
  • nginx
  • haproxy for load balancing
  • Sphinx for search
  • memcached for caching
  • Redis where it helps
  • beanstalkd for a work queue
  • also using Java for a couple number crunching background jobs
  • Nagios for monitoring
  • MRTG and Munin for graphing
  • Git for source control

Developers and System Administrators

Here’s the part where I sound like an ass. It’s just me. I know that it will be healthier for the site (and for me, too!) to have a copilot. Maybe that’ll be in the cards for 2011.

Ravelry API Early Access

Update December 2010: if you are a developer who is interested in working with the Ravelry API, please see http://www.ravelry.com/groups/ravelry-api


We’ve been getting a lot of interest in the Ravelry API lately.

We don’t have a release date for the API and we don’t know exactly what calls will be in the first release, but we are able to support a small number of early access / beta users while we work toward that release. We prefer to work with developers who have an immediate need – those that have a working application that is just missing Ravelry functionality and those who want to integrate Ravelry into an existing app.

The API is JSON over HTTP and should be easy to work with in every language. You will need access to an encryption library that supports AES encryption (such as openssl), which shouldn’t be a problem. If you are interested, please email api@ravelry.com with answers to the following questions.

  1. What platform does your application run on?
  2. What is the nature of your application? (Provide a link to an existing app if you can)
  3. What API calls / Ravelry features will you need? Be as detailed as you can about what your app will need to do.
  4. (optional) What non-Ravelry data are you hoping to associate with Ravelry data? (if any, for example – you may want to attach row counters to projects)

The more details the better :) We won’t be able to take a large number of people because I expect that there will be a lot of working together as we add missing features and work through questions or problems.

I expect to contact some of you in early October. I still have to put together sample code for authenticating/request signing in Java and Objective C (we will be offering OAuth2 but for now there is a similar but custom scheme) and the Terms of Use aren’t ready yet.

Faceted search

Have I mentioned how much I love the Sphinx search engine?

I think that I have.

I want to talk about how I used Sphinx to create a new faceted search for Ravelry but I don’t really know where to begin. I was happy while working on it and I’m happy with the result. Sphinx itself performs amazingly and the amount of code needed to make it all work is pretty small and very sensible. It’s really surprising how many ways you can take advantage of what it has to offer – someone should really write a “Sphinx Recipes” book.

I’m pretty sure that I spent more time on Javascript micro-optimization than I did on Sphinx performance optimization… but that’s a topic for another day. (Like.. the facet counts that you see are inside of transparent, disabled form fields because updating their values was waaaay faster in IE)

For now, here is a video. It was made to introduce the features to Ravelers so it doesn’t point out any of the technical things that I’d like to show. It’s just a little peek that gives you some idea of what can be done with Sphinx beyond plain old full text search. Sphinx is handling the full text search, the filtering by facet, the computing of counts in each facet, and the selection of results. Pretty much everything other than pulling the data to display the photos and names. All of the work is done in a single call to Sphinx – one query “batch”.

Ravelry Search Help from Ravelry on Vimeo.

One more neat thing: Here is one of the facets in the people search. Sphinx knows how to calculate geodistance so given a search, it can create a facet by grouping results by distance from some point…and it does it fast.
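
To make the distance facet concrete, here is the same idea in plain Ruby: compute the great-circle distance from a reference point and group results into distance buckets. Sphinx does this work itself inside the query; the bucket boundaries, sample coordinates, and method names below are illustrative assumptions, not Ravelry’s.

```ruby
EARTH_RADIUS_M = 6_371_000.0

# Haversine great-circle distance in meters between two lat/long pairs (degrees).
def geodistance(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Upper bounds (in km) for each facet bucket; anything beyond goes to :farther.
BUCKETS_KM = [10, 50, 250].freeze

def distance_bucket(meters)
  BUCKETS_KM.find { |km| meters <= km * 1000 } || :farther
end

# Count people per distance bucket, measured from a [lat, lon] origin.
def distance_facet(origin, people)
  people.group_by { |p| distance_bucket(geodistance(*origin, p[:lat], p[:lon])) }
        .transform_values(&:size)
end

# Made-up sample data: two points near each other and one far away.
SAMPLE_PEOPLE = [
  { name: "a", lat: 42.36, lon: -71.06 },
  { name: "b", lat: 42.40, lon: -71.12 },
  { name: "c", lat: 40.71, lon: -74.01 }
].freeze

p distance_facet([42.36, -71.06], SAMPLE_PEOPLE)
```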

These Xeon 5600s are fast

I purchased a new single cpu Xeon X5670 (6 core!) application server and diverted a bunch of our load to it just to see how it performs.

The new machine is currently handling 50% of our load without breaking a sweat, and our average response time has improved by 20 milliseconds. This traffic was previously handled by 2 x Xeon 5300 machines (8 cores each) that are over 3 years old. I don’t have enough memory in the new box to figure out how much it can handle before performance starts to degrade – I’ve only got 6 GB in it at the moment due to a build mistake.

Two of these $3000 half-depth single CPU servers could easily run the entire Rails part of the site (three for redundancy and breathing room).

Here are the specs. It’s an application server so the only disk is a smallish SSD:

  • 1 x Xeon X5670 Six-Core 2.93GHz, 12MB Cache, 6.4GT/s QPI
  • 12GB RAM (3 x 4GB) @ 1333MHz Max
  • 80 GB Intel X25-M SSD
  • Power consumption: 1.0 amps (208V)

Normally I load up with as much memory as is reasonable, but this CPU offers the most performance with 3 DIMMs per CPU. At least I’ve got plenty of memory to spare now that those older servers won’t be running Rails anymore.

Ravelry Runs On – 2010

I guess that it’s time for the 3rd annual “Ravelry Runs On” roundup. The last two were in March 2008 and March 2009.

This year, our traffic increased by 50% to 5,000,000 page views and 15 million Rails requests per day. We made very few changes to our architecture in 2009 but we did add a new master database server after our working set of data outgrew our memory and IO capacity.

This summary is more detailed than the last two and I’ve broken it up into rough sections.

Physical Network

We own our own servers and colocate them in a datacenter here in Boston. The datacenter provides us with a cooled half cabinet, redundant power, and a blend of premium (Internap, Savvis) bandwidth. We do the rest.

I use servers from Silicon Mechanics because of the high level of configurability and the price. For databases, I use 1U machines with up to 8 2.5″ SAS disks. For app servers, I use cheaper 1U machines with SATA raid but I think I’ll be moving to half-depth single-drive machines that use SSD disks. (These links are for the new Westmere Xeons so they are a little pricier than what we have now)

Our network is pretty simple. We have two Internet Service Providers because I shopped around for a better deal once we reached the limit of our bundled bandwidth. Nothing fancy when it comes to routing – some hosts use one, some use the other. We use Cisco ASA firewalls and Dell managed gigabit switches. Every server has integrated KVM over IP that is attached to a management network so that I can have console access, power toggle, and other things in the event of an emergency. I only need to go to the datacenter to install new hardware and replace faulty hardware.

We use Amazon S3 to store and serve all of our non-database data. S3 is our biggest expense but without it, we’d have to handle 10 terabytes of redundant storage and 60 additional Mbps of traffic. On the flip side, we host our own servers instead of using Amazon EC2 or another cloud service because we get more performance for less money. We’re just not in the sweet spot – we need more than 1-2 servers but less than “a lot” and we have traffic/growth that isn’t spiky. It might be a different story if we needed to hire a sysadmin but it’s a very small part of my job and I enjoy doing it.

Application

Ravelry is a Rails 2.2.x application running on Ruby 1.8.7.

I went from following Rails Edge, to upgrading to the released versions shortly after they came out, to sticking with 2.2. I never got around to upgrading to 2.3 because it didn’t include any compelling improvements and now Rails 3.0 is looming. I honestly don’t know if I’m going to try to migrate to Rails 3 or if I’ll just move to Rails 2.3 and stay there. At some point you have to stop fighting the framework upgrade battle and as I look at all of the changes and incompatibilities in Rails 3, I think that it might be time. It might depend on Arel – if I can’t plug it in to Rails 2, I may be tempted to upgrade.

It’s a strange time for people writing brand new Rails applications. If I were writing something new, I’d start with Ruby 1.9 and Rails 3 and deal with the downsides of being an early adopter.

I run Phusion’s Ruby Enterprise Edition. REE is a drop-in replacement for regular Ruby (MRI) that is faster and more memory efficient and there is no reason not to use it. REE brought us impressive performance gains and memory use reduction and if it didn’t exist, I’m certain that I would have moved from MRI to JRuby. I’m happy with REE so now I’m waiting and seeing – there is a lot going on in the world of Ruby implementations. Ruby 1.9 looks great from a performance perspective but I don’t know of any large sites that are running it in production with Rails.

Each application server runs Ruby on Rails under Apache and Phusion Passenger. Passenger is stable, zero maintenance, and I don’t have any complaints about it. I prefer nginx to Apache and I’ll probably use it on future app servers – it just wasn’t supported when I first moved to Passenger. For the last year, we’ve run 6 Passengers (fewer servers because of virtualization) that spawn up to 20 application instances each. I allocate roughly 7 GB of memory and 1 x 4-core CPU for each of these Passengers.

Web Server / Load Balancing

We use nginx for our front-end web server and for additional static file servers. Compared to Apache, nginx is simpler to install and upgrade, faster, more memory efficient, and easier to configure. It’s a really nice piece of software.

For application requests, nginx proxies to haproxy and haproxy load balances across all of our application servers. haproxy’s performance is excellent, its monitoring tools are good, and it has all of the configurability that we need in a load balancer. Failing instances are removed from the pool and it was easy for me to set up rolling application upgrades/restarts. (This is described more in last year’s post)

To sum up, when you hit www.ravelry.com:

  • Your request is handled by nginx, where URL rewrites are performed, etc etc
  • If your request is an application request, it is proxied to haproxy
  • haproxy chooses an Apache/Passenger instance to handle the request using round-robin load balancing and weighting (more weight for more powerful app servers)
  • Apache/Passenger hands the request off to Rails. Passenger’s own queuing should never come into play because haproxy knows that each application server can only handle X simultaneous requests.
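
The weighted round-robin step can be sketched as follows. This is a deliberately simplified model (it expands each server by its weight and cycles through the list); haproxy’s real scheduler is more sophisticated and also enforces per-server connection limits, and the server names and weights here are invented.

```ruby
# Simplified weighted round-robin: a server with weight 2 appears twice per cycle.
class WeightedRoundRobin
  def initialize(weights)
    @slots = weights.flat_map { |server, weight| [server] * weight }
    @index = 0
  end

  # Return the next server in the cycle.
  def next_server
    server = @slots[@index % @slots.size]
    @index += 1
    server
  end
end

balancer = WeightedRoundRobin.new("app1" => 2, "app2" => 1)
puts 6.times.map { balancer.next_server }.join(" ")
```

Over any full cycle, the heavier server receives requests in proportion to its weight.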

Databases and Search

MySQL / MariaDB

This year, I moved from MySQL 5.0 to MariaDB 5.1.

In my opinion, MariaDB is the best way to run MySQL. Maria is MySQL 5.1 with significant improvements. The most notable improvements (for us) are the inclusion of the InnoDB plugin and InnoDB performance patches via Percona’s awesome XtraDB work, and more releases / faster bugfixes / more active development. We’ve been running Maria for 4 months peaking at about 2500 queries per second and it has proven to be very stable. It’s hard to quantify the performance improvements because we changed hardware at the same time, but Percona’s patches (benchmarked and documented in detail on MySQL Performance Blog) and InnoDB table compression have been huge for us.

We have one master database and two slaves. One slave is used only for backup and it dumps all of its tables, compresses them, and sends them to Amazon S3 nightly.

The main slave is mostly for the search engine and failover but I do send some queries its way by using DB Charmer. For now, all operations on the slave are explicit and fall back to the master if the slave is lagging (I’ve written a tiny Model.with_active_slave { } extension that accomplishes this). Slave lag is detected by using mk-heartbeat from the excellent Maatkit utility collection.
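
The fallback behavior described above might look something like this. It is a hypothetical sketch, not the author’s extension or DB Charmer’s API: ReplicaRouter, the 30-second threshold, and the lag-reading lambda (standing in for a value reported by mk-heartbeat) are all assumptions.

```ruby
# Hypothetical sketch: run a block against the slave unless it is lagging.
class ReplicaRouter
  def initialize(max_lag_seconds:, lag_reader:)
    @max_lag_seconds = max_lag_seconds
    # Callable returning the current slave lag in seconds, or nil if unknown.
    @lag_reader = lag_reader
  end

  # Yields :slave when replication is healthy, otherwise :master.
  def with_active_slave
    lag = @lag_reader.call
    target = lag && lag <= @max_lag_seconds ? :slave : :master
    yield target
  end
end

router = ReplicaRouter.new(max_lag_seconds: 30, lag_reader: -> { 5 })
router.with_active_slave { |db| puts "query runs on #{db}" }
```

Treating an unknown lag as unhealthy keeps reads on the master whenever the heartbeat itself is unavailable.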

Caching

I continue to use memcached for caching. I try not to make a mess out of things and I prefer caching when the set and expire logic can be encapsulated inside a single model. I almost never use caching for partial pages.

I also use Tokyo Cabinet and Tokyo Tyrant as an alternative to memcached. Tyrant supports the memcached protocol so it was easy for me to drop in to the application. Tokyo is a persistent, disk-based key/value store that is very very fast – memcached speeds without being in memory. I choose Tokyo over memcached for large objects that seldom expire. The majority of these items are blobs of markdown that have been rendered into HTML – we use markdown heavily on Ravelry and almost every user-editable page that allows for blocks of text supports markdown. These things are prime for caching (slow generation time, seldom expiring) but they fill up memcached quickly – Tokyo was a perfect solution.
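
The markdown caching pattern is a read-through cache: look up the rendered HTML by key, and render and store it only on a miss. In this sketch a plain Hash stands in for the Tokyo Tyrant store, the renderer is a stub, and keying by a digest of the source text is my assumption rather than something described here.

```ruby
require "digest/sha1"

# Read-through cache for rendered markdown. `store` is anything Hash-like.
class RenderedCache
  attr_reader :renders

  def initialize(store = {}, &renderer)
    @store = store
    @renderer = renderer
    @renders = 0   # counts cache misses, for demonstration
  end

  # Return cached HTML, rendering and storing it only on a miss.
  def html_for(markdown)
    key = "md:#{Digest::SHA1.hexdigest(markdown)}"
    @store[key] ||= begin
      @renders += 1
      @renderer.call(markdown)
    end
  end
end

# Stub renderer standing in for a real (slow) markdown library.
cache = RenderedCache.new { |text| "<p>#{text}</p>" }
2.times { cache.html_for("hello world") }
puts "renders performed: #{cache.renders}"
```

Keying by content digest means an edited block of text simply produces a new key, so nothing needs to be expired explicitly.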

Search

I’m still very happy with Sphinx.

It’s stable, it’s very fast, and it’s very easy to index documents that come from a relational database. We’re going to need more faceted search in the future and Sphinx doesn’t do this natively, but I think that I’d prefer to make it work with Sphinx over adding Solr or Lucene into the mix. Sphinx always feels flexible and makes it easy to solve search problems but one area where it has really made my life easier is indexing speed – reindexing huge data sets is extremely fast. Sphinx has native MySQL support, connects to a local MySQL over a domain socket, and blazes along.

Redis

Redis is one of the more exciting things (to me, at least) that has come out of the whole NoSQL movement. We’re currently using it as part of our advertising delivery system but after working with it, I can think of many tasks that it would do better and more simply than a relational database. In particular, I think that it will be very helpful in dealing with and scaling our “social activity stream” things where each user has a customized view based on their friends and their preferences.

Front End

I’m so glad that Haml and Sass are catching on. People deserve better ways of writing templates and CSS, and Haml and Sass deliver. I started doing more with Sass expressions after reading Peepcode’s “about this blog”.

For Javascript, I’m still using Prototype and Scriptaculous for now. I’m waiting to see what happens with FuseJS. I’ve replaced Prototype’s selector engine with the (much faster) NWMatcher so that it performs on par with jQuery. I prefer to write my own JS – the only 3rd party code that I’m really using apart from Prototype/Scripty is Emile for CSS animations. The Javascript module pattern keeps things clean and I have lots of “packages” for different Javascript. I don’t use the Rails helpers that stuff Javascript in the page.

I still use asset_packager to bundle, compress, and version the Javascript and CSS into files when the site is deployed.

Other things

  • I’m running Gentoo Linux everywhere. Gentoo thinks that Portage is the best software management tool for Linux and I agree.
  • Most of our machines are virtualized with Xen but I’ve been starting to run unvirtualized machines. Virtualization overhead on network and disk IO is not insignificant, even with the latest and greatest kernel and processors. Neither database is virtualized. I’m finding that LVM is the part that I really love, and I can have that without virtualizing.
  • beanstalkd and plain Ruby daemons for background jobs (no delayed_job plugin). Resque may be the new hotness but I am totally happy with what I already have.
  • Nagios and Pingdom for monitoring (alerts)
  • Munin and New Relic RPM for monitoring (graphing). Munin is great, New Relic is worth the money.
  • Postfix for mail.

Any questions?

As you can probably tell, I started to run out of steam as I got further down the page. If I glossed over anything that you were interested in, leave a comment. I can follow up with more information in the comments or even in a future post.

Ruby GC Tuning – try some

I upgraded my versions of Phusion Passenger and Ruby Enterprise Edition last night.

This morning, I noticed that Nagios was alerting me that the load average on one of my machines was in the warning range.

Turns out that on one of my virtual machines, I forgot to configure Apache to use my ruby-wrapper script that sets environment variables for garbage collector tuning. Current versions of Ruby Enterprise Edition include the patches that support these tuning options. I knew that the tuning made a big difference but wow – check this out.

These are identical virtual machines on the same physical box that are receiving the same amount of traffic via round-robin load balancing. You can clearly see the ledge on Wednesday night when I performed the upgrade.


I haven’t experimented with the settings that we are using at all. I’m just using Twitter’s, because they were the first examples that were published and they made sense to me. An explanation of the options is here: http://railsbench.rubyforge.org/svn/tags/railsbench-0.8.3/GCPATCH

Here is my ruby-wrapper script:

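#!/bin/sh
# Wrapper that sets REE garbage collector tuning variables before starting Ruby.

# Initial number of heap slots
export RUBY_HEAP_MIN_SLOTS=500000
# Slots added the first time the heap needs to grow
export RUBY_HEAP_SLOTS_INCREMENT=250000
# Multiplier applied to the increment on later growths (1 = constant increment)
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
# Bytes of malloc'd memory allowed before a GC run is triggered
export RUBY_GC_MALLOC_LIMIT=50000000

exec "/opt/ruby-enterprise-1.8.6-20090201/bin/ruby" "$@"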
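
To get a feel for what these numbers mean, the sketch below models heap growth under the semantics described in the GCPATCH notes as I understand them: the heap starts at RUBY_HEAP_MIN_SLOTS, the first expansion adds RUBY_HEAP_SLOTS_INCREMENT slots, and each later increment is multiplied by RUBY_HEAP_SLOTS_GROWTH_FACTOR. Treat it as an approximation of the interpreter’s behavior, not a description of its internals; the method name is mine.

```ruby
# Model total heap slots after a number of heap expansions.
def heap_slots_after(expansions, min_slots:, increment:, growth_factor:)
  total = min_slots
  step = increment
  expansions.times do
    total += step
    step = (step * growth_factor).to_i   # later increments scale by the factor
  end
  total
end

# The values from the wrapper script above.
WRAPPER_SETTINGS = { min_slots: 500_000, increment: 250_000, growth_factor: 1 }.freeze

(0..3).each do |n|
  puts "after #{n} expansions: #{heap_slots_after(n, **WRAPPER_SETTINGS)} slots"
end
```

With a growth factor of 1, the modeled heap grows linearly in 250,000-slot steps from a large starting size, which means fewer expansions for a Rails process that allocates many objects at boot.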