We use Amazon S3 to store all of our files (images, PDFs, etc.). We do this because it is cost effective – building out our own redundant, distributed storage system would cost roughly the same as one year of S3 storage service in equipment purchases alone.
Paying Amazon to serve those files to end users? Not so cost effective.
For the last 5 months, we’ve been serving up files from a caching proxy server that only hits Amazon when necessary.
To illustrate the savings: in the last 30 days we’ve served up 2.7 billion requests for files that are stored on S3 and we’ve transferred a total of 24 terabytes of data.
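For context, those 30-day totals translate into a fairly modest average load (this sketch just restates the figures above in different units):

```python
# Average load implied by the last 30 days of traffic:
# 2.7 billion requests and 24 TB transferred.
TB = 10**12                          # decimal terabytes
seconds = 30 * 24 * 3600             # seconds in 30 days

avg_req_per_sec = 2_700_000_000 / seconds
avg_mbps = 24 * TB * 8 / seconds / 10**6

print(round(avg_req_per_sec))        # ~1042 requests/sec on average
print(round(avg_mbps))               # ~74 Mbps on average; peaks run higher
```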
Recurring monthly cost to pay Amazon to do this:
- GET requests: $270
- Data transfer: $3100
Recurring cost to handle it ourselves:
Note that this bandwidth is from a single ISP – we’re not paying for any redundancy and that’s okay because we can always temporarily serve from S3 if there is a problem.
- Bandwidth (120 Mbps): $1200
- Cache misses (S3 costs): roughly $400
One-time costs to handle it ourselves:
- Old 1U server with 8 GB RAM: $0 but let’s say $1000
- 2 x SSD drives: $600
- 1U SuperMicro machine acting as pfSense gigabit router: $1000
- GigE fiber installation fees: $600
- Fiber to Cat 5e converter: $200
...so after about two months we had saved enough to cover the up-front costs.
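The payback math, using only the numbers above:

```python
# Monthly costs from the lists above
monthly_amazon = 270 + 3100           # GET requests + data transfer = $3,370
monthly_self = 1200 + 400             # bandwidth + S3 cache misses = $1,600
monthly_savings = monthly_amazon - monthly_self   # $1,770/month

# One-time costs: server, SSDs, pfSense box, fiber install, media converter
one_time = 1000 + 600 + 1000 + 600 + 200          # $3,400

months_to_payback = one_time / monthly_savings
print(round(months_to_payback, 1))    # 1.9 – roughly two months
```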
SSDs and pfSense helped us keep our up-front costs down. SSDs are perfect for this sort of workload, and super fast drives made it possible to repurpose a single crusty old server for this task. pfSense is fantastic open source firewall software that enabled us to build a gigabit firewall/router for $1000 instead of getting totally ripped off by Cisco or someone else on gigabit-speed hardware.
There are few disadvantages to this approach – if anything ever goes wrong with our hardware or connection, we just fail over to S3 and stop saving money. Our app currently requires a restart to switch between Amazon and our own hosting (or some other CDN), but it's a fairly fast and seamless process.
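The switch itself can be as simple as reading the asset host from a setting, so a config change plus a restart flips serving modes. A minimal sketch (the host names and setting names here are hypothetical, not our actual code):

```python
# Hedged sketch: choose the asset base URL from configuration so that
# failing over to S3 is a one-line config change plus an app restart.
ASSET_HOSTS = {
    "proxy": "https://files.example.com",             # our nginx cache (placeholder)
    "s3": "https://your-bucket.s3.amazonaws.com",     # direct-to-S3 failover (placeholder)
}

def asset_url(path, mode="proxy"):
    """Build a file URL for the currently configured serving mode."""
    return ASSET_HOSTS[mode] + "/" + path.lstrip("/")

print(asset_url("images/logo.png"))
print(asset_url("images/logo.png", mode="s3"))
```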
We use nginx to do the proxy caching. It's pretty much zero-maintenance, and the configuration was simple and straightforward. For the details of how to configure it, I posted an answer on this Stack Overflow question: How to set up Nginx as a caching reverse proxy?
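As a rough illustration (the bucket name, cache path, sizes, and TTLs below are placeholders, not our production values), an nginx caching reverse proxy in front of S3 looks something like:

```nginx
http {
    # On-disk cache: location, size limit, and how long unused entries live
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=s3cache:10m
                     max_size=100g inactive=30d;

    server {
        listen 80;

        location / {
            proxy_pass http://your-bucket.s3.amazonaws.com;
            proxy_set_header Host your-bucket.s3.amazonaws.com;

            proxy_cache s3cache;
            proxy_cache_valid 200 30d;              # cache successful responses
            proxy_cache_use_stale error timeout;    # serve stale copies if S3 errors out
        }
    }
}
```

The `inactive` and `proxy_cache_valid` values control how aggressively you cache; tuning them trades S3 cache-miss costs against disk usage on the SSDs.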