Text Processing in Python

Authors: David Mertz
ISBN: 0321112547
Published: 12 Jun 2003

Text Processing in Python describes techniques for manipulation of text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing.

Customer Reviews

Sean Fritz said
This book is interesting. The field it covers is not one with many texts, so it's hard to do a comparative analysis.

On its strengths, this book is probably best suited for programmers who aren't afraid to learn advanced material. It covers in great detail everything you ever wanted to know about Python string processing (and honestly probably a bit more). It has a very readable style, and overall is exceptionally informative. Examples are clear, pointed, and useful.

On its weaknesses, some material (e.g., parsers) might be extremely dense and hard to understand if you don't have a CS or Linguistics degree. On the other hand, if you do understand it (and the explanation is pretty good), you will end up a much better programmer for it.

Overall, I'd recommend this book for professionals with a theory background who need to do advanced Python work. I'd also recommend it to people without a theory background, but only if they're not afraid of getting their feet wet. People who are afraid of learning should probably avoid this book.

4 stars mostly because I'm not really sure how to evaluate this book.

James Stroud said
TPIP is an instant classic in that all you need to do is add a solid understanding of Python and you can instantly appreciate its classic nature. Text processing is more fundamental to programming than programming itself. For instance, most of the programs a programmer will write will be written with text. So gaining proficiency in dealing with text is key to not only programming but probably every facet of one's experience with a computer.

In TPIP, David Mertz provides the reader with a set of tools for manipulating text in Python. The book is organized by type of text processing activity. For example, filters are presented from a functional perspective, searching text is presented in terms of regular expressions, etc. Relevant modules are presented with each type of processing task in a reference format.

The greatest value in the book is that it approaches a fundamental and important programming topic that most books would treat sparingly or dismiss outright. TPIP might be in league with Friedl's Mastering Regular Expressions in that it takes outwardly uninspiring topics, makes them interesting, and teaches them with pedagogical finesse. Somehow, Mertz inspires the reader to feel intelligent while presenting the topics in an accessible way. Even mxTextTools becomes comprehensible in TPIP.

TPIP, though, is not without its shortcomings, especially in organization. The reviews of Python and functional programming are put in appendices and the reference material is interleaved with the text, giving the reader a somewhat disjointed feeling as he makes his way through the book. Better would have been to build the book up from a solid review of the Python language, proceeding to a thorough treatment of functional programming in Python, to then present the meat of the book, text processing, as a well-organized whole with sensible segues between the chapters. The reference material should be moved to the appendices for easy access.

Even if these organization problems are never fixed, one would be well served to study this fine volume.

Dale Wilson said
There is a lot of good stuff in this book, but the presentation is lousy.

The first chapter dives into functional programming using obscure and terse higher-order functions, including nested lambda expressions. He never does provide a "mere mortal" explanation for how these functions work. I was able to figure it out, but then I've been programming for 35 years in 20+ languages.

As a learning experience it was a valuable debugging exercise for me, but as something for a programmer who was just getting to know Python, I can't think of a greater turn-off.

Python as a rule is easy to read and easy to write. This book manages to make it unnecessarily hard.

Start with another Python book (or two, or three), then come back to this one when you have a lot of time and patience to spend. As I said, there *is* some worthwhile information in there.

R. Dlugy-Hegwer said
I'd second most of the positive statements given by other reviewers. To boot, the author's voice is clear and pleasant. He shares his knowledge as it is, without dumbing it down or condescending. The index is very useful when you want to get in, get the information, and get back to work. This book is a great read for anyone learning or using Python seriously.

Elizabeth H. Papageorge said
This book is not for everyone, but for "text processing", I know of nothing else that comes close; this book merits careful study. Note that "text processing" would include many web applications, since HTTP is a text-driven protocol. Do not be put off by the first chapter! It is the most abstract of any book I have read in decades. As the book says, you can skip it if it is a problem for you. As an illustration of how good this book is, I am now using regular expressions (selectively), and this was only possible with the help of this book! (If you do not even know what regular expressions are, you have not completed Text Processing 1.01.)
