Machine Learning: An Algorithmic Perspective (Chapman & Hall/CRC Machine Learning & Pattern Recognition)
Author: Stephen Marsland
ISBN: 1420067184
Published: 01 Apr 2009
Purchase online: amazon.com

Traditional books on machine learning can be divided into two groups — those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.

Customer Reviews

M. Henri De Feraudy said
I've only had this book for a few days and my judgement is not definitive, but it looks very promising indeed. The author has an unassuming, casual, and helpful style.
It's certainly a lot more detailed than Tom Mitchell's book, covers a lot of material you won't find in Norvig and Russell, and is much more appealing than Bishop's latest book.
I thought you might be interested in the table of contents, since there is no "see inside" for this book. Here it is, via OCR:

Prologue xv

1 Introduction 1
1.1 If Data Had Mass, the Earth Would Be a Black Hole 2
1.2 Learning 4
1.2.1 Machine Learning 5
1.3 Types of Machine Learning 6
1.4 Supervised Learning 7
1.4.1 Regression 8
1.4.2 Classification 9
1.5 The Brain and the Neuron 11
1.5.1 Hebb's Rule 12
1.5.2 McCulloch and Pitts Neurons 13
1.5.3 Limitations of the McCulloch and Pitts Neuronal Model 15
Further Reading 16

2 Linear Discriminants 17
2.1 Preliminaries 18
2.2 The Perceptron 19
2.2.1 The Learning Rate η 21
2.2.2 The Bias Input 22
2.2.3 The Perceptron Learning Algorithm 23
2.2.4 An Example of Perceptron Learning 24
2.2.5 Implementation 26
2.2.6 Testing the Network 31
2.3 Linear Separability 32
2.3.1 The Exclusive Or (XOR) Function 34
2.3.2 A Useful Insight 36
2.3.3 Another Example: The Pima Indian Dataset 37
2.4 Linear Regression 41
2.4.1 Linear Regression Examples 43
Further Reading 44
Practice Questions 45

3 The Multi-Layer Perceptron 47
3.1 Going Forwards 49
3.1.1 Biases 50
3.2 Going Backwards: Back-Propagation of Error 50
3.2.1 The Multi-Layer Perceptron Algorithm 54
3.2.2 Initialising the Weights 57
3.2.3 Different Output Activation Functions 58
3.2.4 Sequential and Batch Training 59
3.2.5 Local Minima 60
3.2.6 Picking Up Momentum 61
3.2.7 Other Improvements 62
3.3 The Multi-Layer Perceptron in Practice 63
3.3.1 Data Preparation 63
3.3.2 Amount of Training Data 63
3.3.3 Number of Hidden Layers 64
3.3.4 Generalisation and Overfitting 66
3.3.5 Training, Testing, and Validation 66
3.3.6 When to Stop Learning 68
3.3.7 Computing and Evaluating the Results 69
3.4 Examples of Using the MLP 70
3.4.1 A Regression Problem 70
3.4.2 Classification with the MLP 74
3.4.3 A Classification Example 75
3.4.4 Time-Series Prediction 77
3.4.5 Data Compression: The Auto-Associative Network 80
3.5 Overview 83
3.6 Deriving Back-Propagation 84
3.6.1 The Network Output and the Error 84
3.6.2 The Error of the Network 85
3.6.3 A Suitable Activation Function 87
3.6.4 Back-Propagation of Error 88
Further Reading 90
Practice Questions 91
4 Radial Basis Functions and Splines 95
4.1 Concepts 95
4.1.1 Weight Space 95
4.1.2 Receptive Fields 97
4.2 The Radial Basis Function (RBF) Network 100
4.2.1 Training the RBF Network 103
4.3 The Curse of Dimensionality 106
4.4 Interpolation and Basis Functions 108
4.4.1 Bases and Basis Expansion 108
4.4.2 The Cubic Spline 112
4.4.3 Fitting the Spline to the Data 112
4.4.4 Smoothing Splines 113
4.4.5 Higher Dimensions 114
4.4.6 Beyond the Bounds 116
Further Reading 116
Practice Questions 117
5 Support Vector Machines 119
5.1 Optimal Separation 120
5.2 Kernels 125
5.2.1 Example: XOR 128
5.2.2 Extensions to the Support Vector Machine 128
Further Reading 130
Practice Questions 131
6 Learning with Trees 133
6.1 Using Decision Trees 133
6.2 Constructing Decision Trees 134
6.2.1 Quick Aside: Entropy in Information Theory 135
6.2.2 ID3 136
6.2.3 Implementing Trees and Graphs in Python 139
6.2.4 Implementation of the Decision Tree 140
6.2.5 Dealing with Continuous Variables 143
6.2.6 Computational Complexity 143
6.3 Classification and Regression Trees (CART) 145
6.3.1 Gini Impurity 146
6.3.2 Regression in Trees 147
6.4 Classification Example 147
Further Reading 150
Practice Questions 151
7 Decision by Committee: Ensemble Learning 153
7.1 Boosting 154
7.1.1 AdaBoost 155
7.1.2 Stumping 160
7.2 Bagging 160
7.2.1 Subagging 162
7.3 Different Ways to Combine Classifiers 162
Further Reading 164
Practice Questions 165
8 Probability and Learning 167
8.1 Turning Data into Probabilities 167
8.1.1 Minimising Risk 171
8.1.2 The Naïve Bayes' Classifier 171
8.2 Some Basic Statistics 173
8.2.1 Averages 173
8.2.2 Variance and Covariance 174
8.2.3 The Gaussian 176
8.2.4 The Bias-Variance Tradeoff 177
8.3 Gaussian Mixture Models 178
8.3.1 The Expectation-Maximisation (EM) Algorithm 179
8.4 Nearest Neighbour Methods 183
8.4.1 Nearest Neighbour Smoothing 185
8.4.2 Efficient Distance Computations: the KD-Tree 186
8.4.3 Distance Measures 190
Further Reading 192
Practice Questions 193
9 Unsupervised Learning 195
9.1 The k-Means Algorithm 196
9.1.1 Dealing with Noise 200
9.1.2 The k-Means Neural Network 200
9.1.3 Normalisation 202
9.1.4 A Better Weight Update Rule 203
9.1.5 Example: The Iris Dataset Again 204
9.1.6 Using Competitive Learning for Clustering 205
9.2 Vector Quantisation 206
9.3 The Self-Organising Feature Map 207
9.3.1 The SOM Algorithm 210
9.3.2 Neighbourhood Connections 211
9.3.3 Self-Organisation 214
9.3.4 Network Dimensionality and Boundary Conditions 214
9.3.5 Examples of Using the SOM 215
Further Reading 218
Practice Questions 220
10 Dimensionality Reduction 221
10.1 Linear Discriminant Analysis (LDA) 223
10.2 Principal Components Analysis (PCA) 226
10.2.1 Relation with the Multi-Layer Perceptron 231
10.2.2 Kernel PCA 232
10.3 Factor Analysis 234
10.4 Independent Components Analysis (ICA) 237
10.5 Locally Linear Embedding 239
10.6 Isomap 242
10.6.1 Multi-Dimensional Scaling (MDS) 242
Further Reading 245
Practice Questions 246

11 Optimisation and Search 247
11.1 Going Downhill 248
11.2 Least-Squares Optimisation 251
11.2.1 Taylor Expansion 251
11.2.2 The Levenberg-Marquardt Algorithm 252
11.3 Conjugate Gradients 257
11.3.1 Conjugate Gradients Example 260
11.4 Search: Three Basic Approaches 261
11.4.1 Exhaustive Search 261
11.4.2 Greedy Search 262
11.4.3 Hill Climbing 262
11.5 Exploitation and Exploration 264
11.6 Simulated Annealing 265
11.6.1 Comparison 266
Further Reading 267
Practice Questions 267
12 Evolutionary Learning 269
12.1 The Genetic Algorithm (GA) 270
12.1.1 String Representation 271
12.1.2 Evaluating Fitness 272
12.1.3 Population 273
12.1.4 Generating Offspring: Parent Selection 273
12.2 Generating Offspring: Genetic Operators 275
12.2.1 Crossover 275
12.2.2 Mutation 277
12.2.3 Elitism, Tournaments, and Niching 277
12.3 Using Genetic Algorithms 279
12.3.1 Map Colouring 279
12.3.2 Punctuated Equilibrium 281
12.3.3 Example: The Knapsack Problem 281
12.3.4 Example: The Four Peaks Problem 282
12.3.5 Limitations of the GA 284
12.3.6 Training Neural Networks with Genetic Algorithms 285
12.4 Genetic Programming 285
12.5 Combining Sampling with Evolutionary Learning 286
Further Reading 289
Practice Questions 290
13 Reinforcement Learning 293
13.1 Overview 294
13.2 Example: Getting Lost 296
13.2.1 State and Action Spaces 298
13.2.2 Carrots and Sticks: the Reward Function 299
13.2.3 Discounting 300
13.2.4 Action Selection 301
13.2.5 Policy 302
13.3 Markov Decision Processes 302
13.3.1 The Markov Property 302
13.3.2 Probabilities in Markov Decision Processes 303
13.4 Values 305
13.5 Back on Holiday: Using Reinforcement Learning 309
13.6 The Difference between Sarsa and Q-Learning 310
13.7 Uses of Reinforcement Learning 311
Further Reading 312
Practice Questions 312
14 Markov Chain Monte Carlo (MCMC) Methods 315
14.1 Sampling 315
14.1.1 Random Numbers 316
14.1.2 Gaussian Random Numbers 317
14.2 Monte Carlo or Bust 319
14.3 The Proposal Distribution 320
14.4 Markov Chain Monte Carlo 325
14.4.1 Markov Chains 325
14.4.2 The Metropolis-Hastings Algorithm 326
14.4.3 Simulated Annealing (Again) 327
14.4.4 Gibbs Sampling 328
Further Reading 331
Practice Questions 332
15 Graphical Models 333
15.1 Bayesian Networks 335
15.1.1 Example: Exam Panic 335
15.1.2 Approximate Inference 339
15.1.3 Making Bayesian Networks 342
15.2 Markov Random Fields 344
15.3 Hidden Markov Models (HMMs) 347
15.3.1 The Forward Algorithm 349
15.3.2 The Viterbi Algorithm 352
15.3.3 The Baum-Welch or Forward-Backward Algorithm 353
15.4 Tracking Methods 356
15.4.1 The Kalman Filter 357
15.4.2 The Particle Filter 360
Further Reading 361
Practice Questions 362

16 Python 365
16.1 Installing Python and Other Packages 365
16.2 Getting Started 365
16.2.1 Python for MATLAB and R users 370
16.3 Code Basics 370
16.3.1 Writing and Importing Code 370
16.3.2 Control Flow 371
16.3.3 Functions 372
16.3.4 The doc String 373
16.3.5 map and lambda 373
16.3.6 Exceptions 374
16.3.7 Classes 374
16.4 Using NumPy and Matplotlib 375
16.4.1 Arrays 375
16.4.2 Random Numbers 379
16.4.3 Linear Algebra 379
16.4.4 Plotting 380
Further Reading 381
Practice Questions 382

John Matlock said
There are two or three things that I really like about this book.

First, so many books of this type seem to leave off the first 20 or so pages that should tell you what it is that they are trying to do. Instead of assuming that you know what Machine Learning is all about, this book has an initial chapter that explains in simple terms what we are trying to do here.

Second, instead of using some kind of pseudocode, the examples are written in a standard language, Python. Python is a free language from the open source community, so students can get and use it without incurring the costs associated with some other languages. It is also intended to be very readable, which makes the demonstration programs easier to understand. There is also a chapter on programming in Python.
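
To give a flavour of that readability, here is a minimal perceptron-training sketch of the kind Chapter 2 works through. This is my own illustration, not code from the book, and the function name and default parameters are just assumptions:

    # Minimal perceptron training sketch (illustrative only, not the book's code).
    import numpy as np

    def train_perceptron(inputs, targets, eta=0.25, n_iterations=20):
        # inputs: (n_samples, n_features) array; targets: (n_samples, 1) array of 0/1 labels
        n_samples, n_features = inputs.shape
        # add the bias input as an extra column of -1s
        inputs = np.concatenate((inputs, -np.ones((n_samples, 1))), axis=1)
        # small random initial weights
        weights = np.random.rand(n_features + 1, 1) * 0.1 - 0.05
        for _ in range(n_iterations):
            # threshold activation, then move the weights towards the targets
            activations = np.where(np.dot(inputs, weights) > 0, 1, 0)
            weights -= eta * np.dot(inputs.T, activations - targets)
        return weights

    # Example: learning the OR function
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [1]])
    w = train_perceptron(X, y)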

Machine learning is usually placed in the computer science department at universities, and as a result is usually taught to computer science students. In fact, machine learning requires more mathematical and engineering background than most computer science students have. The approach used in this book is to discuss the algorithms used in machine learning, but to do so by stressing how and why they work.

The author says that the book is suitable for undergraduate use. Yes, it is, but for the rather advanced undergraduate or even early graduate-level student.

Mohamad Ali said
I received the book in perfect condition in two days. I was extremely happy with my purchase.

John said
This is a good book on machine learning for students at the advanced undergraduate or Masters level, or for self-study, particularly if some of the background math (eigenvectors, probability theory, etc.) is not already second nature.

Although I am now familiar with much of the math in this area and consider
myself to have intermediate knowledge of machine learning, I can still recall
my first attempts to learn some mathematical topics. At that time my approach
was to implement the ideas as computer programs and plot the results. This
book takes exactly that approach, with each topic being presented both
mathematically and in Python code using the new Numpy and Scipy libraries.
Numpy resembles Matlab and is sufficiently high level that the book code
examples read like pseudocode.
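
As a rough illustration of what "reads like pseudocode" means here (my own sketch, not code from the book), the least-squares solution for linear regression is essentially the normal equations written out directly:

    import numpy as np

    def linear_regression(X, y):
        # append a bias column of ones, then solve the normal equations (X'X) beta = X'y
        X = np.concatenate((X, np.ones((X.shape[0], 1))), axis=1)
        beta = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))
        return beta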

(Another thing I recall when I was first learning was the mistaken
belief that books are free from mistakes. I've since learned to
expect that every first edition is going to have some, and doubly so
for books with math and code examples. However the fact that many of the examples
in this book produce plots is reassuring.)

As mentioned I have only intermediate knowledge of machine learning, and
have no experience with some techniques. I learned regression trees
and ensemble learning from this book -- and then implemented an ensemble
tree classifier that has been quite successful at our company.
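
For readers who have not met the idea, the kind of ensemble mentioned above boils down to bagging: train many trees on bootstrap samples and take a majority vote. The sketch below is my own, using scikit-learn's DecisionTreeClassifier for the base trees rather than anything from the book or the reviewer's implementation, and it assumes integer class labels:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagged_trees(X, y, n_trees=25, seed=0):
        # fit each tree on a bootstrap sample (drawn with replacement)
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)
            trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return trees

    def predict_bagged(trees, X):
        # majority vote over the trees for each sample
        votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)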

Some other strong books are the two Bishop books (Neural Networks for Pattern
Recognition; Pattern Recognition and Machine Learning),
Friedman/Hastie/Tibshirani (Elements of Statistical Learning) and
Duda/Hart/Stork (Pattern Classification). Of these, I think the first Bishop
book is the only other text suitable for a beginner, but it doesn't have the
explanation-by-programming approach and is also now a bit dated (Marsland
includes modern topics such as manifold learning, ensemble learning, and a bit
of graphical models). Friedman et al. is a good collection of algorithms,
including ones that are not presented in Marsland; it is a bit dry however.
The new Bishop is probably the deepest and best current text, but it is
probably most suited for PhD students. Duda et al. would be a good book at a
Masters level, though its coverage of modern techniques is more limited. Of
course these are just my impressions. Machine learning is a broad subject and
anyone using these algorithms will eventually want to refer to several of these books.
For example, the first Bishop covers the normalized flavor of radial basis
functions (a favorite technique for me), and each of the mentioned books has
their own strengths.
