Programming Collective Intelligence: Building Smart Web 2.0 Applications

Author: Toby Segaran
ISBN: 0596529325
Published: 16 Aug 2007

Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.


Customer Reviews

Nicholas Sardo said
I picked this book up at a local Barnes and Noble. Although I am not trained in some of the areas the book covers, I found them completely accessible. It should be obvious from the title that someone new to programming would find this book an incredibly tough read, but I'll state it for the record. If you are learning how to program, this book is worth buying and holding on to until you're ready.

The whole idea of "Collective Intelligence" is an interesting one. Given how quickly things change, how fast technology is growing, and how the web keeps expanding, it only makes sense that the ideas in this book, and elsewhere, should be explored.

The author chose Python as the language for implementing the different topics in the book. That is not to say they can only be coded in Python, but I tend to agree with his choice. Python is a clear language that can be written in a procedural or an object-oriented style. Even if you don't "speak" Python, in many cases you can understand what is going on in the code.

For me, though, this book wasn't so much about the code as about the ideas. Data, data everywhere... now, how can we explore, extrapolate, quantify, and qualify that data? That is what I took as the essence of the book. It covers many different techniques for doing this, and I found it all fascinating.

In my opinion, if you are into this kind of thing, this book is well worth it.

Giuseppe Turitto said
Back in school, a few years ago (too many to remember), I had to study most of these concepts, but at the time they were too abstract for me, and the examples and exercises were so simple that they made no sense in real life. After that I worked on other kinds of systems and projects, and never had the chance to play around with these concepts and see how to apply them. Only now, after reading this book, can I see how to apply these ideas and concepts in real life and take advantage of this knowledge.

Digital Puer said
This book provides very good breadth on a number of subjects related to machine learning. The author covers supervised classification and prediction systems (e.g. Bayesian classification, neural networks, and support vector machines), unsupervised clustering (e.g. K-Means), and stochastic optimisation (e.g. simulated annealing, genetic algorithms, and genetic programming).

Although I already had some knowledge of genetic algorithms, I knew next to nothing about machine learning in general (my dissertation was nowhere near this area), and my previous attempts at reading the machine learning tomes by Bishop and Alpaydin were futile. This book was nearly perfect for me.

The book is well written and well organised. A typical chapter comprises a high-level description of the topic, a discussion of a Python implementation with a few small examples and a small dataset, and finally a 3-5 page hands-on example in which the implementation is run against data accessed from a commercial website. I personally found the introductory material in each chapter the most interesting, and thankfully the author provides nice illustrations for all the topics.

The author saves the best for last: Chapter 12 provides a summary of all the topics he covered, with the relative strengths and weaknesses of each algorithm. The author gives an excellent recurring example of email spam filtering that he carries through this chapter's discussion of Bayesian classification, decision tree classification, and neural networks, allowing the reader to see how each of these techniques handles the problem differently. The illustrated example of the neural network was in itself worth its weight in gold. When I finished the book, I realised this high-level overview in Chapter 12 was invaluable and well positioned at the end, as it neatly covers all the topics and places them in context (assuming the reader has indeed read through the previous chapters). If the author is bored enough to read this review, I would recommend that he place a similar high-level overview in Chapter 1 to guide the reader.

Now, here are things I did not like about this book:

1. The author does not provide many references to related work, particularly on how the problems presented in the book could be solved without the techniques he presents. For example, in Chapter 8 on building price models, he states that small problems "can more easily be solved with traditional statistical techniques," but he does not say what those are or give any references. Furthermore, from my own work on genetic algorithms, I know that stochastic optimisation is not the be-all and end-all of optimisation problems; sometimes the same problem can be solved quickly and efficiently with, say, linear programming, dynamic programming, or an approximation algorithm. The author does not discuss such alternatives in his chapter on the topic.

2. There is little depth in this book, but I will not hold that against him, since the book was intended for a general audience. I wanted to know HOW a neural network does what it does, and WHY a support vector machine produces its result. Again, the author provides no references. I guess Wikipedia will have to be my next step, but assuming the author himself has read the relevant material, it would have been nice if he had told us which papers or books are important to read.

3. The terse Python examples were confusing, due to a combination of that language's horrid type system and the author's lack of comments. Each example is difficult to follow. What do I need to pass into the function, and what do I get back? Is it a scalar, an array, a map/multimap, a set/multiset, or what? At the very least, the author should have provided better comments. Occasionally I wondered whether the reader would have been better served by examples presented in pseudocode (à la the algorithms shown in CLRS), but in the end I decided that having working code in Python outweighed the issues of clarity. My recommendation for advanced readers: read the Python as pseudocode and implement each algorithm in your favourite respectable programming language (which had better be C++, Java, or C#). I learned much more that way, since I had to understand carefully what each line was doing.

In summary: this is an excellent book on machine learning with much more interesting and advanced topics than other O'Reilly works. I hope O'Reilly will continue to produce similar books.

Samuel Moñux said
The book is interesting and easy to read. It shows how to apply AI concepts to the kinds of applications that most programmers produce, and for those who, like me, studied AI years ago but haven't used it much since, it's a good refresher.

But the quality of the Python code leaves a lot to be desired. I'm sure it works, and for strictly personal use it could be OK, but it lacks elegance for a textbook: it overuses list comprehensions and long expressions (to keep the code compact, I guess), which makes the examples hard to follow in detail.

I don't regret having bought it, though.

Brian Cochran said
One of the best books I have bought in a while. It strikes a perfect balance between introducing the algorithms and applying them in practice. The book is organized around different problem areas such as "search", "optimization", "categorizing", etc., and the algorithms for tackling them. Each section starts with a naive implementation of a problem and gradually works toward more intelligent solutions. I really enjoyed the evolution of the search implementation: it starts with a trivial implementation, then adds features such as a simplified PageRank and other optimizations.