A look behind the numbers on Digg and Stack Overflow

Joel Spolsky kicked off an old internet argument this week, proclaiming on Twitter: “Digg: 200MM page views, 500 servers. Stack Overflow: 60MM page views, 5 servers. What am I missing?”

Digg, of course, runs on an open-source technology stack. This includes PHP, Cassandra for persistent storage, Solr for search, Redis, MySQL and more. According to Arin Sarkissian, a former engineer at Digg, the 500-server figure is "probably something Kevin [Rose, of Digg] said that people ran with".

Stack Overflow is based on .NET and, according to the Stack Overflow blog, runs on 6 web servers, 2 proxy and caching servers, and 2 database servers.
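To put the quoted figures side by side, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above; the period the page views cover is not stated, and the third entry assumes the blog's ten servers handle the same 60MM page views as in the tweet. The function and dictionary names are purely illustrative.

```python
# Figures as quoted in the article. The server counts are disputed,
# so treat the results as rough ratios rather than measurements.
claims = {
    "Digg (tweet)": (200_000_000, 500),
    "Stack Overflow (tweet)": (60_000_000, 5),
    # Assumption: the blog's 6 web + 2 proxy/cache + 2 database servers
    # serve the same 60MM page views quoted in the tweet.
    "Stack Overflow (blog)": (60_000_000, 10),
}


def views_per_server(page_views: int, servers: int) -> float:
    """Return the average number of page views handled per server."""
    return page_views / servers


ratios = {name: views_per_server(*figures) for name, figures in claims.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} page views per server")
```

On the tweet's numbers the gap is 30x per server; on the blog's server count it is 15x. Either way, the ratio only means something if both sites are counting the same kinds of servers.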

Of course, the argument is moot: there is a massive range of specific details behind why the hardware requirements for each of these sites are completely different. For example, the Digg number is said to include enough servers to run both the recently launched Digg Version 4 and the old Version 3 simultaneously; meanwhile, the Stack Overflow numbers probably don't include the CDN and static content servers. However, the debate is interesting because it has brought to the surface details of how the developers have made their sites so scalable and reliable on relatively low hardware budgets. It also goes to show that, in the right hands, technology isn't the limiting factor when building a massively scalable site.

Digg uses a wide and extremely interesting range of technology to meet the data processing and serving requirements of its site. We have already mentioned Cassandra, the open-source database first developed at Facebook; Digg is said to run 50 nodes in its cluster, which is over-provisioned "just in case". The new Digg's architecture is service-oriented (SOA, or Service Oriented Architecture), using Python and Java for the services. Also in the stack are Solr, the Apache Software Foundation's open-source search server, and finally a Hadoop cluster, which performs data processing using the map-reduce paradigm.
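For readers unfamiliar with the map-reduce paradigm, the following is a minimal single-process sketch of the idea. It is not Digg's code and does not use the Hadoop API; the vote records, story IDs and function names are all hypothetical, chosen only to show the map, shuffle and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit one (story_id, 1) pair for each vote record."""
    _user, story_id = record
    yield (story_id, 1)


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical (user, story) vote records.
votes = [("alice", "story-1"), ("bob", "story-2"), ("carol", "story-1")]
counts = map_reduce(votes)
print(counts)
```

In a real cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle between them; the sketch only shows the shape of the computation.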

There is further discussion of this on Reddit, along with a Quora question that has answers from people loosely involved with the companies.
