An interesting article on the High Scalability blog this week highlights the scale-up architecture of Stackoverflow.com. If you're not familiar with the site, it's a question-and-answer site for programmers that takes a newer, more social approach to technical Q&A than anything previously available on the internet. Thanks to its usability and relevance, it has become one of the top 200 websites in the world, with over 95 million page views a month.
While the High Scalability post focuses on the scale-up approach, it's also worth noting the places where Stackoverflow has scaled out. Here are a few highlights pulled directly from the post:
- Web Tier – Web farm with 3 dedicated web servers for stackoverflow.com
- Web Statistics – Web logs stored in a separate SQL Server database (inferred) – 20GB of logs generated per day
  - One table created per day to store statistics
- Caching Layer (Redis) – Not every request hits the database. Sticky switches keep a user's requests on the same server, so state can be served directly out of the RAM cache
- Data – Stackoverflow.com is the largest site in the Stack Exchange network and is hosted in a single database on its own database cluster, while other sites such as serverfault.com and the rest of the Stack Exchange network sites are hosted on a separate cluster
- The chat server is hosted in the Oregon data center, while the main Stack Overflow site is hosted in New York
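The caching bullet combines two scale-out ideas: sticky routing, so a given user keeps landing on the same web server, and a per-server RAM cache that is checked before the database (the cache-aside pattern). Below is a minimal Python sketch of that pairing, not Stack Overflow's actual implementation. It assumes hash-based stickiness (the post doesn't say how the switches pin users), a plain dict standing in for the RAM cache (this is not the Redis API), and a hypothetical `load_from_db` callable standing in for SQL Server.

```python
import hashlib
from typing import Callable, Dict, List


class WebServer:
    """One server in the web farm, with its own in-memory cache.

    The dict is a hypothetical stand-in for the RAM cache layer, not Redis.
    """

    def __init__(self, name: str, load_from_db: Callable[[str], str]) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.load_from_db = load_from_db  # hypothetical database accessor
        self.db_hits = 0

    def get(self, key: str) -> str:
        # Cache-aside: serve from RAM when possible, fall back to the database on a miss.
        if key in self.cache:
            return self.cache[key]
        self.db_hits += 1
        value = self.load_from_db(key)
        self.cache[key] = value
        return value


def pick_server(user_id: str, servers: List[WebServer]) -> WebServer:
    # Assumed sticky routing: hashing the user id sends the same user to the
    # same server every time, so that server's cache stays warm for that user.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def fake_db(key: str) -> str:
    # Hypothetical stand-in for a SQL Server query.
    return f"row for {key}"


if __name__ == "__main__":
    farm = [WebServer(f"web{i}", fake_db) for i in range(1, 4)]  # 3 web servers
    for _ in range(5):
        pick_server("user-42", farm).get("profile:user-42")
    print(sum(s.db_hits for s in farm))  # 1
```

Because every request for `user-42` lands on the same server, only the first one reaches the database. Had the five requests been spread randomly across the three servers, each server could have paid its own cache miss, which is why stickiness pairs well with server-local caches.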
George Beech, sysadmin for the Stackoverflow.com network, notes in the original article that the data tier is only about 20% utilized, leaving headroom for growth and unexpected spikes in volume.
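As a rough back-of-the-envelope check on what 20% utilization buys, the sketch below assumes load scales linearly with utilization (real databases rarely behave that neatly). The 80% ceiling in the second example is an illustrative assumption for a "comfortable" limit, not a figure from the article.

```python
def growth_headroom(current_utilization: float, target_utilization: float = 1.0) -> float:
    """Return how many times current load can grow before hitting target_utilization.

    Assumes utilization scales linearly with load, which is a simplification.
    """
    if not 0 < current_utilization <= target_utilization <= 1:
        raise ValueError("expected 0 < current_utilization <= target_utilization <= 1")
    return target_utilization / current_utilization


print(growth_headroom(0.20))        # 5.0, i.e. 5x the load before saturation
print(growth_headroom(0.20, 0.80))  # 4.0, i.e. 4x the load under an assumed 80% ceiling
```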
Achieving high scalability usually takes a healthy mix of scaling up and scaling out. The key is to avoid painting yourself into a corner, which the Stack Exchange guys have done a nice job of.