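   Tom M. Mitchell is a professor at Carnegie Mellon University, where he currently serves as director of the Center for Automated Learning and Discovery. He is also president of the American Association for Artificial Intelligence (AAAI) and a founder of the journal Machine Learning and of the International Conference on Machine Learning (ICML).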
1 introduction
1.1 well-posed learning problems
1.2 designing a learning system
1.2.1 choosing the training experience
1.2.2 choosing the target function
1.2.3 choosing a representation for the target function
1.2.4 choosing a function approximation algorithm
1.2.5 the final design
1.3 perspectives and issues in machine learning
1.3.1 issues in machine learning
1.4 how to read this book
1.5 summary and further reading
2 concept learning and the general-to-specific ordering
2.1 introduction
2.2 a concept learning task
2.2.1 notation
2.2.2 the inductive learning hypothesis
2.3 concept learning as search
2.3.1 general-to-specific ordering of hypotheses
2.4 find-s: finding a maximally specific hypothesis
2.5 version spaces and the candidate-elimination
2.5.1 representation
2.5.2 the list-then-eliminate algorithm
2.5.3 a more compact representation for version spaces
2.5.4 candidate-elimination learning algorithm
2.5.5 an illustrative example
2.6 remarks on version spaces and candidate-elimination
2.6.1 will the candidate-elimination algorithm converge to the correct hypothesis?
2.6.2 what training example should the learner request
2.6.3 how can partially learned concepts be used?
2.7 inductive bias
2.7.1 a biased hypothesis space
2.7.2 an unbiased learner
2.7.3 the futility of bias-free learning
2.8 summary and further reading
3 decision tree learning
3.1 introduction
3.2 decision tree representation
3.3 appropriate problems for decision tree learning
3.4 the basic decision tree learning algorithm
3.4.1 which attribute is the best classifier?
3.4.2 an illustrative example
3.5 hypothesis space search in decision tree learning
3.6 inductive bias in decision tree learning
3.6.1 restriction biases and preference biases
3.6.2 why prefer short hypotheses?
3.7 issues in decision tree learning
3.7.1 avoiding overfitting the data
3.7.2 incorporating continuous-valued attributes
3.7.3 alternative measures for selecting attributes
3.7.4 handling training examples with missing attribute
3.7.5 handling attributes with differing costs
3.8 summary and further reading
4 artificial neural networks
4.1 introduction
4.1.1 biological motivation
4.2 neural network representations
4.3 appropriate problems for neural network learning
4.4 perceptrons
4.4.1 representational power of perceptrons
4.4.2 the perceptron training rule
4.4.3 gradient descent and the delta rule
4.4.4 remarks
4.5 multilayer networks and the backpropagation algorithm
4.5.1 a differentiable threshold unit
4.5.2 the backpropagation algorithm
4.5.3 derivation of the backpropagation rule
4.6 remarks on the backpropagation algorithm
4.6.1 convergence and local minima
4.6.2 representational power of feedforward networks
4.6.3 hypothesis space search and inductive bias
4.6.4 hidden layer representations
4.6.5 generalization, overfitting, and stopping criterion
4.7 an illustrative example: face recognition
4.7.1 the task
4.7.2 design choices
4.7.3 learned hidden representations
4.8 advanced topics in artificial neural networks
4.8.1 alternative error functions
4.8.2 alternative error minimization procedures
4.8.3 recurrent networks
4.8.4 dynamically modifying network structure
4.9 summary and further reading
5 evaluating hypotheses
5.1 motivation
5.2 estimating hypothesis accuracy
5.2.1 sample error and true error
5.2.2 confidence intervals for discrete-valued hypotheses
5.3 basics of sampling theory
5.3.1 error estimation and estimating binomial proportions
5.3.2 the binomial distribution
5.3.3 mean and variance
5.3.4 estimators, bias, and variance
5.3.5 confidence intervals
5.3.6 two-sided and one-sided bounds
5.4 a general approach for deriving confidence intervals
5.4.1 central limit theorem
5.5 difference in error of two hypotheses
5.5.1 hypothesis testing
5.6 comparing learning algorithms
5.6.1 paired t tests
5.6.2 practical considerations
5.7 summary and further reading
6 bayesian learning
6.1 introduction
6.2 bayes theorem
6.2.1 an example
6.3 bayes theorem and concept learning
6.3.1 brute-force bayes concept learning
6.3.2 map hypotheses and consistent learners
6.4 maximum likelihood and least-squared error hypotheses
6.5 maximum likelihood hypotheses for predicting probabilities
6.5.1 gradient search to maximize likelihood in a neural net
6.6 minimum description length principle
6.7 bayes optimal classifier
6.8 gibbs algorithm
6.9 naive bayes classifier
6.9.1 an illustrative example
6.10 an example: learning to classify text
6.10.1 experimental results
6.11 bayesian belief networks
6.11.1 conditional independence
6.11.2 representation
6.11.3 inference
6.11.4 learning bayesian belief networks
6.11.5 gradient ascent training of bayesian networks
6.11.6 learning the structure of bayesian networks
6.12 the em algorithm
6.12.1 estimating means of k gaussians
6.12.2 general statement of em algorithm
6.12.3 derivation of the k means algorithm
6.13 summary and further reading
7 computational learning theory
7.1 introduction
7.2 probably learning an approximately correct hypothesis
7.2.1 the problem setting
7.2.2 error of a hypothesis
7.2.3 pac learnability
7.3 sample complexity for finite hypothesis spaces
7.3.1 agnostic learning and inconsistent hypotheses
7.3.2 conjunctions of boolean literals are pac-learnable
7.3.3 pac-learnability of other concept classes
7.4 sample complexity for infinite hypothesis spaces
7.4.1 shattering a set of instances
7.4.2 the vapnik-chervonenkis dimension
7.4.3 sample complexity and the vc dimension
7.4.4 vc dimension for neural networks
7.5 the mistake bound model of learning
7.5.1 mistake bound for the find-s algorithm
7.5.2 mistake bound for the halving algorithm
7.5.3 optimal mistake bounds
7.5.4 weighted-majority algorithm
7.6 summary and further reading
8 instance-based learning
8.1 introduction
8.2 k-nearest neighbor learning
8.2.1 distance-weighted nearest neighbor algorithm
8.2.2 remarks on k-nearest neighbor algorithm
8.2.3 a note on terminology
8.3 locally weighted regression
8.3.1 locally weighted linear regression
8.3.2 remarks on locally weighted regression
8.4 radial basis functions
8.5 case-based reasoning
8.6 remarks on lazy and eager learning
8.7 summary an
9 genetic algorithms
9.1 motivation
9.2 genetic algorithms
9.2.1 representing hypotheses
9.2.2 genetic operators
9.2.3 fitness function and selection
9.3 an illustrative example
9.3.1 extensions
9.4 hypothesis space search
9.4.1 population evolution and the schema theorem
9.5 genetic programming
9.5.1 representing programs
9.5.2 illustrative example
9.5.3 remarks on genetic programming
9.6 models of evolution and learning
9.6.1 lamarckian evolution
9.6.2 baldwin effect
9.7 parallelizing genetic algorithms
9.8 summary and further reading
10 learning sets of rules
10.1 introduction
10.2 sequential covering algorithms
10.2.1 general to specific beam search
10.2.2 variations
10.3 learning rule sets: summary
10.4 learning first-order rules
10.4.1 first-order horn clauses
10.4.2 terminology
10.5 learning sets of first-order rules: foil
10.5.1 generating candidate specializations in foil
10.5.2 guiding the search in foil
10.5.3 learning recursive rule sets
10.5.4 summary of foil
10.6 induction as inverted deduction
10.7 inverting resolution
10.7.1 first-order resolution
10.7.2 inverting resolution: first-order case
10.7.3 summary of inverse resolution
10.7.4 generalization, θ-subsumption, and entailment
10.7.5 progol
10.8 summary and further reading
11 analytical learning
11.1 introduction
11.1.1 inductive and analytical learning problems
11.2 learning with perfect domain theories: prolog-ebg
11.2.1 an illustrative trace
11.3 remarks on explanation-based learning
11.3.1 discovering new features
11.3.2 deductive learning
11.3.3 inductive bias in explanation-based learning
11.3.4 knowledge level learning
11.4 explanation-based learning of search control knowledge
11.5 summary and further reading
12 combining inductive and analytical learning
12.1 motivation
12.2 inductive-analytical approaches to learning
12.2.1 the learning problem
12.2.2 hypothesis space search
12.3 using prior knowledge to initialize the hypothesis
12.3.1 the kbann algorithm
12.3.2 an illustrative example
12.3.3 remarks
12.4 using prior knowledge to alter the search objective
12.4.1 the tangentprop algorithm
12.4.2 an illustrative example
12.4.3 remarks
12.4.4 the ebnn algorithm
12.4.5 remarks
12.5 using prior knowledge to augment search
12.5.1 the focl algorithm
12.5.2 remarks
12.6 state of the art
12.7 summary and further reading
13 reinforcement learning
13.1 introduction
13.2 the learning task
13.3 q learning
13.3.1 the q function
13.3.2 an algorithm for learning q
13.3.3 an illustrative example
13.3.4 convergence
13.3.5 experimentation strategies
13.3.6 updating sequence
13.4 nondeterministic rewards and actions
13.5 temporal difference learning
13.6 generalizing from examples
13.7 relationship to dynamic pro
13.8 summary and further reading
appendix notation
author index
subject index


   The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In recent years many successful machine learning applications have been developed, ranging from data-mining programs that learn to detect fraudulent credit card transactions, to information-filtering systems that learn users’ reading preferences, to autonomous vehicles that learn to drive on public highways. At the same time, there have been important advances in the theory and algorithms that form the foundations of this field.
   The goal of this textbook is to present the key algorithms and theory that form the core of machine learning. Machine learning draws on concepts and results from many fields, including statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity, and control theory. My belief is that the best way to learn about machine learning is to view it from all of these perspectives and to understand the problem settings, algorithms, and assumptions that underlie each. In the past, this has been difficult due to the absence of a broad-based single source introduction to the field. The primary goal of this book is to provide such an introduction.
   Because of the interdisciplinary nature of the material, this book makes few assumptions about the background of the reader. Instead, it introduces basic concepts from statistics, artificial intelligence, information theory, and other disciplines as the need arises, focusing on just those concepts most relevant to machine learning. The book is intended for both undergraduate and graduate students in fields such as computer science, engineering, statistics, and the social sciences, and as a reference for software professionals and practitioners. Two principles that guided the writing of the book were that it should be accessible to undergraduate students and that it should contain the material I would want my own Ph.D. students to learn before beginning their doctoral research in machine learning.
   A third principle that guided the writing of this book was that it should present a balance of theory and practice. Machine learning theory attempts to answer questions such as “How does learning performance vary with the number of training examples presented?” and “Which learning algorithms are most appropriate for various types of learning tasks?” This book includes discussions of these and other theoretical issues, drawing on theoretical constructs from statistics, computational complexity, and Bayesian analysis. The practice of machine learning is covered by presenting the major algorithms in the field, along with illustrative traces of their operation. Online data sets and implementations of several algorithms are available via the World Wide Web at http://www.cs.cmu.edu/~tom/mlbook.html. These include neural network code and data for face recognition, decision tree learning code and data for financial loan analysis, and Bayes classifier code and data for analyzing text documents. I am grateful to a number of colleagues who have helped to create these online resources, including Jason Rennie, Paul Hsiung, Jeff Shufelt, Matt Glickman, Scott Davies, Joseph O’Sullivan, Ken Lang, Andrew McCallum, and Thorsten Joachims.
   In writing this book, I have been fortunate to be assisted by technical experts in many of the subdisciplines that make up the field of machine learning. This book could not have been written without their help. I am deeply indebted to the following scientists who took the time to review chapter drafts and, in many cases, to tutor me and help organize chapters in their individual areas of expertise:
   Avrim Blum, Jaime Carbonell, William Cohen, Greg Cooper, Mark Craven, Ken DeJong, Jerry DeJong, Tom Dietterich, Susan Epstein, Oren Etzioni, Scott Fahlman, Stephanie Forrest, David Haussler, Haym Hirsh, Rob Holte, Leslie Pack Kaelbling, Dennis Kibler, Moshe Koppel, John Koza, Miroslav Kubat, John Lafferty, Ramon Lopez de Mantaras, Sridhar Mahadevan, Stan Matwin, Andrew McCallum, Raymond Mooney, Andrew Moore, Katharina Morik, Steve Muggleton, Michael Pazzani, David Poole, Armand Prieditis, Jim Reggia, Stuart Russell, Lorenza Saitta, Claude Sammut, Jeff Schneider, Jude Shavlik, Devika Subramanian, Michael Swain, Gheorghe Tecuci, Sebastian Thrun, Peter Turney, Paul Utgoff, Manuela Veloso, Alex Waibel, Stefan Wrobel, and Yiming Yang.
   I am also grateful to the many instructors and students at various universities who have field tested various drafts of this book and who have contributed their suggestions. Although there is no space to thank the hundreds of students, instructors, and others who tested earlier drafts of this book, I would like to thank the following for particularly helpful comments and discussions:
   Shumeet Baluja, Andrew Banas, Andy Barto, Jim Blackson, Justin Boyan, Rich Caruana, Philip Chan, Jonathan Cheyer, Lonnie Chrisman, Dayne Freitag, Geoff Gordon, Warren Greiff, Alexander Harm, Tom Ioerger, Thorsten Joachims, Atsushi Kawamura, Martina Klose, Sven Koenig, Jay Modi, Andrew Ng, Joseph O’Sullivan, Patrawadee Prasangsit, Doina Precup, Bob Price, Choon Quek, Sean Slattery, Belinda Thom, Astro Teller, Will Tracz.
   I would like to thank Joan Mitchell for creating the index for the book. I also would like to thank Jean Harpley for help in editing many of the figures. Jane Loftus from ETP Harrison improved the presentation significantly through her copyediting of the manuscript and generally helped usher the manuscript through the intricacies of final production. Eric Munson, my editor at McGraw Hill, provided encouragement and expertise in all phases of this project.
   As always, the greatest debt one owes is to one’s colleagues, friends, and family. In my case, this debt is especially large. I can hardly imagine a more intellectually stimulating environment and supportive set of friends than those I have at Carnegie Mellon. Among the many here who helped, I would especially like to thank Sebastian Thrun, who throughout this project was a constant source of encouragement, technical expertise, and support of all kinds. My parents, as always, encouraged and asked “Is it done yet?” at just the right times. Finally, I must thank my family: Meghan, Shannon, and Joan. They are responsible for this book in more ways than even they know. This book is dedicated to them.
   Tom M. Mitchell


