A new version of Cassandra, Apache's open-source distributed database, is now out. Version 0.7 brings a number of significant features and an even longer list of bug fixes to the software, which has been publicly available for only two years.
Cassandra was originally written by two Facebook engineers to let users search their inboxes. It has since been open-sourced, and it became a top-level Apache project in February 2010. The software is written in Java and designed to scale across many machines (the largest installation runs on more than 150), while providing robustness and reliability. It is used by Digg, Facebook, Twitter, Reddit, Rackspace, Cisco and many other prominent web companies.
Alongside nearly 300 major and countless minor bug fixes since the first alpha of version 0.7, the headline feature is support for up to 2 billion columns per row. The limit matters because of Cassandra’s approach to data storage: you can’t execute queries as you would in a traditional SQL-based database, so any value you want to search on efficiently has to be calculated in advance and stored in additional columns against the row. Rows also don’t need to share the same set of columns, and Cassandra makes it extremely easy to add and remove columns at will and on the fly. Together, these traits make the large column limit a powerful feature.
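To make the storage model more concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Cassandra client API: the names `column_family`, `insert` and `remove`, and the sample rows, are all invented for illustration. It shows rows holding different sets of columns, a column being dropped on the fly, and a derived value being stored as an extra column at write time.

```python
# Toy in-memory model of a column family: row key -> {column name -> value}.
# This is plain Python for illustration only, not the Cassandra API.
column_family = {}


def insert(row_key, column, value):
    """Add or overwrite one column on a row, creating the row if needed."""
    column_family.setdefault(row_key, {})[column] = value


def remove(row_key, column):
    """Drop one column from a row; other rows are unaffected."""
    column_family.get(row_key, {}).pop(column, None)


# Rows do not have to share the same set of columns.
insert("alice", "email", "alice@example.com")
insert("alice", "city", "Leeds")
insert("bob", "email", "bob@example.com")

# Precompute a derived value at write time and store it as an extra column,
# because there is no SQL-style query to compute it at read time.
insert("alice", "email_domain", "example.com")

# Columns can be removed at any time without touching a schema.
remove("alice", "city")

print(sorted(column_family["alice"]))  # ['email', 'email_domain']
print(sorted(column_family["bob"]))    # ['email']
```

In a real deployment the same idea is applied at much larger scale, which is where a limit of 2 billion columns per row becomes useful.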