Most of the stuff that happens during a week at work is too long for Twitter and too short for a blog post. There are still things that I’d like to share, so I thought I’d try batching them up.
New Database Server
We made our first hardware purchase of 2009 - a new database server from Silicon Mechanics. I chose a 1U Intel Nehalem machine and filled it with 2.5″ 15K SAS disks.
I’m really excited to see how this will stack up against our current hardware (which is 18 months old, on average). I’ll be sure to run some benchmarks and post them. I also want to compare Xen to KVM to no virtualization. We started using Xen over two years ago and it looks like KVM may be pulling ahead performance-wise.
We’ve grown a lot during the last year and faster disks and more memory in the master database will give us a lot more breathing room. It will also allow us to turn our current master DB (which is still a nice machine) into a slave that can actually be used to take on part of the load. Our current slave is great for backup and feeding the Sphinx search engine but its SATA disks are just too slow to handle any of the site’s database traffic.
While I was looking at servers, I noticed that nice Intel SSDs are pretty affordable for machines that don’t have big storage needs (app servers, etc.) or database/file server IO needs, especially when you compare them to RAIDing 2 hard drives for redundancy. I didn’t really understand why SSDs varied so much in price and performance until I read this LWN article: “Log Structured File systems: There’s one in every SSD”. Check it out.
Redis
We have our own ad-serving system at Ravelry - it allows advertisers to reserve their spots, target certain locations or groups, and change ad images on the fly. We like to think that our ads are interesting and cool enough that people actually want to return to one if they catch it as they are leaving a page.
A long time ago, we had a feature that allowed people to “rewind” ads in order to bring back an ad that was previously displayed. We had to throw it out when it became too database intensive to make the feature work.
Thanks to Redis, I was able to bring this feature back with very little code. In one operation, I can refer to a list of viewed ads for a particular user/targeting and say “remove this ad from the list if it exists, push it to the tail, and trim the list to X items”. It’s really cool. The Redis page calls it “a data structures server” and I think that this is a very fitting description. Check out all of the operations: http://code.google.com/p/redis/wiki/CommandReference
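The “remove, push to the tail, trim” idea maps onto three Redis list commands: LREM, RPUSH and LTRIM. Here is a rough sketch of that sequence in Python. It is not the actual Ravelry code: `FakeRedis` is a made-up in-memory stand-in so the example runs without a server, and `record_ad_view`, the key name and the list size are all assumptions for illustration. It also issues the commands as three separate calls rather than as one operation.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for the three list commands used below."""

    def __init__(self):
        self.lists = {}

    def lrem(self, key, count, value):
        # Only count == 0 is modelled here: remove every occurrence of value.
        items = self.lists.get(key, [])
        kept = [v for v in items if v != value]
        self.lists[key] = kept
        return len(items) - len(kept)

    def rpush(self, key, value):
        # Append value to the tail of the list, creating the list if needed.
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])

    def ltrim(self, key, start, end):
        # Keep the inclusive range start..end; negative indexes count from the tail.
        items = self.lists.get(key, [])
        n = len(items)
        if start < 0:
            start = max(n + start, 0)
        if end < 0:
            end = n + end
        self.lists[key] = items[start:end + 1]


def record_ad_view(client, key, ad_id, max_items):
    """Move ad_id to the tail of the viewed list and cap the list length."""
    client.lrem(key, 0, ad_id)          # remove this ad from the list if it exists
    client.rpush(key, ad_id)            # push it to the tail
    client.ltrim(key, -max_items, -1)   # keep only the newest max_items entries


client = FakeRedis()
key = "viewed_ads:user42:homepage"  # hypothetical key: one list per user/targeting
for ad_id in [1, 2, 3, 2, 4]:
    record_ad_view(client, key, ad_id, max_items=3)

print(client.lists[key])
```

“Rewinding” is then just a matter of walking the list backwards from the tail, since the most recently viewed ad is always last.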
If you are running Redis, here is a munin plugin that will graph connections and memory usage. (direct link to the script)
Amazon S3 sticker shock
Our Amazon S3 bill is getting really high. Too high - we spend a small fortune on serving up images.
I’ve begun fronting our S3 buckets with a caching nginx proxy connected to the cheaper of our two ISPs. That way, Amazon will still handle the storage but we can serve the most-accessed files from our own bandwidth.
Swapping back and forth is easy because we just have to alter the behavior of AssetTagHelper.
It’s too early to tell how much this will save us because I can’t really handle the traffic right now. Turns out that Amazon is taking care of somewhere around 100 Mbps of file serving during peak times.
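To put the 100 Mbps figure in billing terms, here is a quick back-of-the-envelope conversion in Python. It assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and a rate held steady for a full hour, which real peak traffic won’t quite do. The function name is made up for this example.

```python
def gb_per_hour(mbps):
    """Convert a sustained rate in megabits per second to gigabytes per hour."""
    bytes_per_second = mbps * 1_000_000 / 8   # bits to bytes
    return bytes_per_second * 3600 / 1e9      # seconds in an hour, bytes to GB


print(gb_per_hour(100))  # 45.0
```

So every hour spent at that peak is roughly 45 GB of transfer.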
Once the new database server is installed, we’ll be able to do some shuffling and repurpose an existing virtual machine as a cache.
I posted an answer over on StackOverflow that explains how to do this: “How to set up Nginx as a caching reverse proxy?”
I looked at doing this with Varnish but we are already running nginx and it was dead easy to set this up.
Google Analytics API
We started using the Google Analytics API to create statistics tables for some of our data so that search results and other features can use the pageview information to better rank things, detect “hot” pages, stuff like that.
The API is great. It’s a little slow and a little strange, but it gives you access to all of the data that you can get through the web interface. Thanks Google, for processing gigantic amounts of traffic data for me so that I can query it later.



Comments (6)
Hey Casey, thanks for sharing all this! Very interesting stuff. Are you fronting your S3 buckets with CloudFront? Just curious about advantages of proximity for image serving, vs. serving them right out of central Rav…
Redis seems to excel at the “I’m having a problem with this one aspect of my database and I need a surgical fix” type problem.
I’m struggling with it as a general DB solution, because of the everything-in-memory limitations.
S3 is a real blessing when it comes to disk space, but the bandwidth sure gets expensive.
I’ve been peeking at GA more lately with interest in rigging up some “dashboard” kind of apps. I need to dig deeper to let it know which of my users are paid subscribers, so I can do stats against that. I believe it can be done, but haven’t had the time to explore it.
I’ve been thinking of S3 as a place to put things I want safe - so thinking of it as a canonical data store.
Then you can purchase cheap boxes on Serverbeach/Softlayer for the bandwidth, with the install being nginx or varnish pointing to S3. Adding more boxes becomes child’s play.
Have you played with http://www.trendly.com/ - it is by Avi Bryant (of DabbleDB)? It is interesting.
I’d be really interested to hear about the results of your DB experiments. We are using KVM on the NASA Nebula project but I’ve not had time to compare MySQL or Postgres on bare metal vs. not. We have seen KVM with Virt-IO being MUCH faster than XEN (on our 10Gbe network).
Keep posting!
@Tim White : Our internal tests show that CloudFront is about 14 times faster than Amazon S3 itself. If you want to serve your content fast you should certainly consider CloudFront. It is also cost efficient - only $0.170 per GB
Hey Casey, the article about the munin plugin for Redis was translated to English. Please update the link to http://stanly.net.ua/en/monitoring-dlya-redis/