Worklog - New server, SSD, Redis, Amazon S3 bills, Google Analytics API

Most of the stuff that happens during a week at work is too long for Twitter and too short for a blog post. There are still things that I’d like to share, so I thought I’d try batching them up.

New Database Server

We made our first hardware purchase of 2009 - a new database server from Silicon Mechanics. I chose a 1U Intel Nehalem machine and filled it with 2.5″ 15K SAS disks.

I’m really excited to see how this will stack up against our current hardware (which is 18 months old, on average). I’ll be sure to run some benchmarks and post them. I also want to compare Xen to KVM to no virtualization. We started using Xen over two years ago and it looks like KVM may be pulling ahead performance-wise.

We’ve grown a lot during the last year and faster disks and more memory in the master database will give us a lot more breathing room. It will also allow us to turn our current master DB (which is still a nice machine) into a slave that can actually be used to take on part of the load. Our current slave is great for backup and feeding the Sphinx search engine but its SATA disks are just too slow to handle any of the site’s database traffic.

While I was looking at servers, I noticed that nice Intel SSDs are pretty affordable for machines that don’t have big storage needs (app servers, etc.) or database/file server IO needs, especially when you compare them to RAIDing two hard drives for redundancy. I didn’t really understand why SSDs varied so much in price and performance until I read this LWN article: “Log Structured File systems: There’s one in every SSD”. Check it out.

Redis

We have our own ad-serving system at Ravelry - it allows advertisers to reserve their spots, target certain locations or groups, and change ad images on the fly. We like to think that our ads are interesting and cool enough that people actually want to return to one if they catch it as they are leaving a page.

A long time ago, we had a feature that allowed people to “rewind” ads in order to bring back an ad that was previously displayed. We had to throw it out when it became too database intensive to make the feature work.

Thanks to Redis, I was able to bring this feature back with very little code. In one operation, I can refer to a list of viewed ads for a particular user/targeting and say “remove this ad from the list if it exists, push it to the tail, and trim the list to X items”. It’s really cool. The Redis page calls it “a data structures server” and I think that this is a very fitting description. Check out all of the operations: http://code.google.com/p/redis/wiki/CommandReference
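The "remove, push to the tail, trim" behavior can be modeled in a few lines of plain Python. This is only an in-memory sketch of the list semantics, not the site's code: the function name `touch_ad` and the ad names are made up for illustration, and the comments note the Redis list commands (LREM, RPUSH, LTRIM) that each step resembles. How the original code groups those steps into one operation isn't shown in the post.

```python
def touch_ad(viewed, ad_id, max_items):
    """Return a new list with ad_id moved to the tail, capped at max_items.

    Hypothetical helper: models the list behavior only, no Redis involved.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Like LREM with count 0: drop every existing occurrence of the ad.
    remaining = [ad for ad in viewed if ad != ad_id]
    # Like RPUSH: the ad becomes the most recent entry at the tail.
    remaining.append(ad_id)
    # Like LTRIM with negative indexes: keep only the newest max_items entries.
    return remaining[-max_items:]


history = []
for ad in ["yarn-sale", "needles", "yarn-sale", "pattern-book", "spindles"]:
    history = touch_ad(history, ad, max_items=3)

print(history)  # ['yarn-sale', 'pattern-book', 'spindles']
```

"Rewinding" then amounts to reading the list from the tail backwards: the last element is the ad just shown, and the one before it is the ad to bring back.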

If you are running Redis, here is a munin plugin that will graph connections and memory usage. (direct link to the script)

Amazon S3 sticker shock

Our Amazon S3 bill is getting really high. Too high - we spend a small fortune on serving up images.

I’ve started fronting our S3 buckets with a caching nginx proxy connected to the cheaper of our two ISPs. That way, Amazon will still handle the storage but we can serve the most-accessed files from our own bandwidth.

Swapping back and forth is easy because we just have to alter the behavior of AssetTagHelper.
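The idea behind that swap is a single switch that decides which host every image URL points at. Here is a minimal sketch in Python rather than Rails, so it is not the AssetTagHelper change itself; the host names and the `asset_url` helper are hypothetical.

```python
S3_HOST = "example-bucket.s3.amazonaws.com"  # hypothetical bucket host
CACHE_HOST = "img-cache.example.com"         # hypothetical nginx proxy host


def asset_url(path, use_cache):
    """Build an image URL against either the caching proxy or S3 directly."""
    host = CACHE_HOST if use_cache else S3_HOST
    return "http://" + host + "/" + path.lstrip("/")


# Flipping one flag moves every generated URL between the two hosts.
print(asset_url("/photos/1234/medium.jpg", use_cache=True))
print(asset_url("/photos/1234/medium.jpg", use_cache=False))
```

Because the object paths are identical on both hosts, switching back to S3 is just a matter of flipping the flag again.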

It’s too early to tell how much this will save us because I can’t really handle the traffic right now. Turns out that Amazon is taking care of somewhere around 100 Mbps of file serving during peak times :) Once the new database server is installed, we’ll be able to do some shuffling and repurpose an existing virtual machine as a cache.

I posted an answer over on StackOverflow that explains how to do this: “How to set up Nginx as a caching reverse proxy?”

I looked into doing this with Varnish, but we are already running nginx and it was dead easy to set this up.

Google Analytics API

We started using the Google Analytics API to create statistics tables for some of our data so that search results and other features can use the pageview information to better rank things, detect “hot” pages, stuff like that.

The API is great. It’s a little slow and a little strange, but it gives you access to all of the data that you can get through the web interface. Thanks Google, for processing gigantic amounts of traffic data for me so that I can query it later.
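As an illustration of how pageview counts could feed "hot" page detection, here is one possible approach sketched in Python. Everything in it is an assumption: the `hot_pages` function, the thresholds, and the sample numbers are invented, and it takes plain dictionaries as input instead of calling the Google Analytics API.

```python
def hot_pages(recent, baseline, min_views=50, ratio=2.0):
    """Return paths whose recent daily views are well above their usual level.

    recent, baseline: dicts mapping page path -> pageviews per day.
    min_views and ratio are arbitrary illustrative thresholds.
    """
    scored = []
    for path, views in recent.items():
        usual = max(baseline.get(path, 0), 1)  # avoid dividing by zero
        if views >= min_views and views >= ratio * usual:
            scored.append((views / usual, path))
    # Largest jump over the usual level first.
    return [path for _, path in sorted(scored, reverse=True)]


recent = {"/patterns/a": 900, "/patterns/b": 120, "/patterns/c": 40, "/yarns/d": 200}
baseline = {"/patterns/a": 300, "/patterns/b": 110, "/yarns/d": 20}

print(hot_pages(recent, baseline))  # ['/yarns/d', '/patterns/a']
```

Comparing against each page's own baseline keeps permanently popular pages from always counting as "hot", while the minimum view count filters out noise from rarely visited pages.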

Comments (9)

  1. Tim White wrote:

    Hey Casey, thanks for sharing all this! Very interesting stuff. Are you fronting your S3 buckets with CloudFront? Just curious about advantages of proximity for image serving, vs. serving them right out of central Rav…

    Friday, November 6, 2009 at 4:17 pm #
  2. Randito wrote:

    Redis seems to excel at the “I’m having a problem with this one aspect of my database and I need a surgical fix” type problem.

    I’m struggling with it as a general DB solution, because of the everything-in-memory limitations.

    Friday, November 6, 2009 at 6:23 pm #
  3. Jeff Putz wrote:

    S3 is a real blessing when it comes to disk space, but the bandwidth sure gets expensive.

    I’ve been peeking at GA more lately with interest in rigging up some “dashboard” kind of apps. I need to dig deeper to let it know which of my users are paid subscribers, so I can do stats against that. I believe it can be done, but haven’t had the time to explore it.

    Friday, November 6, 2009 at 10:42 pm #
  4. I’ve been thinking of S3 as place to put things I want safe - so thinking of it as a canonical data store.

    Then you can purchase cheap boxes on Serverbeach/Softlayer for the bandwidth, with the install being nginx or varnish pointing to S3. Adding more boxes becomes child’s play.

    Have you played with http://www.trendly.com/ - it is by Avi Bryant (of DabbleDB)? It is interesting.

    I’d be really interested to hear about the results of your DB experiments. We are using KVM on the NASA Nebula project but I’ve not had time to compare MySQL or Postgres on bare metal vs. not. We have seen KVM with Virt-IO being MUCH faster than Xen (on our 10GbE network).

    Keep posting!

    Saturday, November 7, 2009 at 2:05 am #
  5. Andy wrote:

    @Tim White: Our internal tests show that CloudFront is about 14 times faster than Amazon S3 itself. If you want to serve your content fast you should certainly consider CloudFront. It is also cost efficient - only $0.170 per GB

    Saturday, November 14, 2009 at 2:16 am #
  6. Stanly wrote:

    Hey Casey, the article about the munin plugin for Redis was translated to English. Please update the link to http://stanly.net.ua/en/monitoring-dlya-redis/

    Thursday, November 19, 2009 at 11:50 am #
  7. Janet wrote:

    Wow, so interesting. I feel my comments on groups take up lots of room. I have a copy of each on my Rav home page. Is there a way to delete these? Do monitors of these groups delete older data to free up space? I am sure the photos take up the most space… photos are so helpful with knitting!

    Thanks again, Janet -Central Sq. Camb

    Thursday, December 10, 2009 at 4:55 pm #
  8. EP wrote:

    Re: S3 Costs…

    Why are you spending anything on image serving? Are you in the image serving business? If you aren’t then you are wasting your money.

    Flickr is free. Picasa is free. They provide the services, edge servers, bandwidth and storage for nothing. They both offer APIs which allow your userbase to store and manage image libraries at zero cost to you and with zero maintenance on your part.

    I recommend you rethink whether the wheel needs reinvention and whether you should be the one paying for it.

    Cheers and good luck

    Thursday, January 7, 2010 at 9:43 am #
  9. casey wrote:

    Hi EP,

    We do integrate with both Flickr and Photobucket’s APIs and many (more than half) of our users choose to use those services. In those cases, we don’t serve the images.

    I’d argue that we should be in the image serving business. One of the most important (if not the most important) feature of our site is viewing images. We do serve our own images for pages that don’t belong to users as well as user-created images for people who are unable or unwilling to use external services. If it became necessary, I suppose we could charge people who wanted the upload option, but for now we provide it for free.

    Tuesday, January 19, 2010 at 3:15 pm #