Caltech's Machine Learning – CS 156 by Prof Yaser Abu-Mostafa


Overview – Machine Learning Course

Overview Lecture of Caltech’s Machine Learning Course – CS 156 by Professor Yaser Abu-Mostafa.

Lecture 01 – The Learning Problem

The Learning Problem – Introduction; supervised, unsupervised, and reinforcement learning. Components of the learning problem.

Lecture 02 – Is Learning Feasible?

Is Learning Feasible? – Can we generalize from a limited sample to the entire space? Relationship between in-sample and out-of-sample.

Lecture 03 – The Linear Model I

The Linear Model I – Linear classification and linear regression. Extending linear models through nonlinear transforms.

Lecture 04 – Error and Noise

Error and Noise – The principled choice of error measures. What happens when the target we want to learn is noisy.

Lecture 05 – Training Versus Testing

Training versus Testing – The difference between training and testing in mathematical terms. What makes a learning model able to generalize?

Lecture 06 – Theory of Generalization

Theory of Generalization – How an infinite model can learn from a finite sample. The most important theoretical result in machine learning.

Lecture 07 – The VC Dimension

The VC Dimension – A measure of what it takes a model to learn. Relationship to the number of parameters and degrees of freedom.

Lecture 08 – Bias-Variance Tradeoff

Bias-Variance Tradeoff – Breaking down the learning performance into competing quantities. The learning curves.

Lecture 09 – The Linear Model II

The Linear Model II – More about linear models. Logistic regression, maximum likelihood, and gradient descent.
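The gradient-descent approach named in this blurb can be sketched in a few lines. This is an illustrative example, not course material; the function names and toy data are invented for the sketch, and the model is logistic regression trained by minimizing the cross-entropy (negative log-likelihood) error.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the cross-entropy error.
    X: (N, d) inputs (first column can be a constant 1 for the bias);
    y: (N,) labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Gradient of the average cross-entropy with respect to w
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

# Toy separable data: label is 1 exactly when the second feature is positive
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 3.0], [1.0, -2.0]])
y = np.array([1, 0, 1, 0])
w = logistic_gd(X, y)
```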

Lecture 10 – Neural Networks

Neural Networks – A biologically inspired model. The efficient backpropagation learning algorithm. Hidden layers.

Lecture 11 – Overfitting

Overfitting – Fitting the data too well; fitting the noise. Deterministic noise versus stochastic noise.

Lecture 12 – Regularization

Regularization – Putting the brakes on fitting the noise. Hard and soft constraints. Augmented error and weight decay.
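Weight decay, mentioned here as a form of augmented error, has a simple closed form for linear regression. A minimal sketch (names and data are illustrative, not from the lecture): minimizing ||Xw − y||² + λ||w||² gives w = (XᵀX + λI)⁻¹Xᵀy, and larger λ shrinks the weights toward zero.

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """Least-squares fit with weight decay: minimizes
    ||Xw - y||^2 + lam * ||w||^2 via w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Noisy linear data: increasing lam puts the brakes on the weights
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(20)
w_small = ridge_fit(X, y, lam=0.01)   # near the unregularized solution
w_large = ridge_fit(X, y, lam=100.0)  # heavily shrunk weights
```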

Lecture 13 – Validation

Validation – Taking a peek out of sample. Model selection and data contamination. Cross validation.
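The cross-validation idea in this blurb can be sketched as follows (an illustrative helper, not from the course): split the data into k folds, let each fold take a turn as the held-out validation set, and average the k validation errors as an estimate of out-of-sample error.

```python
import numpy as np

def cross_val_error(X, y, fit, err, k=5, seed=0):
    """k-fold cross validation: average validation error over k held-out folds.
    fit(X, y) -> model; err(model, X, y) -> scalar error."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errs.append(err(model, X[val], y[val]))
    return float(np.mean(errs))

# Example: plain least squares on noiseless linear data
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
err = lambda w, X, y: float(np.mean((X @ w - y) ** 2))
X = np.random.default_rng(1).standard_normal((50, 2))
y = X @ np.array([2.0, -1.0])
cv = cross_val_error(X, y, fit, err)
```

Because the model is fit only on the training folds, the averaged error is a peek out of sample rather than an in-sample measure.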

Lecture 14 – Support Vector Machines

Support Vector Machines – One of the most successful learning algorithms; getting a complex model at the price of a simple one.

Lecture 15 – Kernel Methods

Kernel Methods – Extending SVM to infinite-dimensional spaces using the kernel trick, and to non-separable data using soft margins.
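The kernel trick mentioned here can be illustrated with the Gaussian (RBF) kernel, which computes inner products in an infinite-dimensional feature space without ever constructing that space. A sketch (the helper name and parameter are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel: K(x, x') = exp(-gamma * ||x - x'||^2),
    computed for all pairs of rows of X1 and X2 at once."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2 * X1 @ X2.T)  # pairwise squared distances
    return np.exp(-gamma * sq)

# A kernel matrix for 5 random points: symmetric, with 1s on the diagonal
X = np.random.default_rng(2).standard_normal((5, 3))
K = rbf_kernel(X, X)
```

An SVM trained with this kernel only ever touches the data through K, which is what lets a linear-margin algorithm produce highly nonlinear boundaries.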

Lecture 16 – Radial Basis Functions

Radial Basis Functions – An important learning model that connects several machine learning models and techniques.

Lecture 17 – Three Learning Principles

Three Learning Principles – Major pitfalls for machine learning practitioners; Occam’s razor, sampling bias, and data snooping.

Lecture 18 – Epilogue

Epilogue – The map of machine learning. Brief views of Bayesian learning and aggregation methods.


Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license,…

This lecture was recorded on April 3, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

View course materials in iTunes U Course App –… and on the course website –

