Introduction to Machine Learning With Student Presentations (PhD level) – Carnegie Mellon University [Playlist]



Machine learning studies the question “how can we build computer programs that automatically improve their performance through experience?” This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to better navigate based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.

This course is designed to give PhD students a thorough grounding in the methods, theory, mathematics and algorithms needed to do research and applications in machine learning. The topics of the course draw from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a pre-existing working knowledge of probability, statistics and algorithms will be at an advantage, but the class has been designed so that anyone with a strong numerate background can catch up and fully participate.


Prerequisites

  • Basic probability and statistics are a plus.
  • Basic linear algebra (matrices, vectors, eigenvalues) is a plus. Knowing functional analysis would be great but not required.
  • Ability to write code that exceeds ‘Hello World’. Preferably beyond Matlab or R.
  • Basic knowledge of optimization. Having attended a convex optimization class would be great but the recitations will cover this.
  • You should have no trouble answering the questions on the self-evaluation handed out for the 10-601 course.


For specific videos of the class go to the individual lectures in the schedule below. This is also where you’ll find pointers to further reading material etc.


Instructors

Geoffrey J. Gordon and Alex Smola

Teaching Assistants

Carlton Downey, Ahmed Hefny, Dougal Sutherland, Leila Wehbe, and Jing Xiang


Unit 1: Introduction

  • Machine Learning Problems
    • Classification, Regression, Annotation
    • Forecasting
    • Novelty detection
  • Data
    • Labeled, unlabeled
    • Semi-supervised, transductive, responsive environment, covariate shift
  • Applications
    • Optical character recognition
    • Bioinformatics
    • Computational advertising
    • Self-driving cars
    • Network security

Unit 2: Basic Tools

  • Linear regression
    • Optimization problem
    • Examples
    • Overfitting
  • Parzen windows
    • Basic idea (smoothing over empirical average)
    • Kernels
  • Model selection
    • Overfitting and underfitting
    • Cross-validation and leave-one-out estimation
    • Bias-variance tradeoff
    • Curse of dimensionality
  • Watson-Nadaraya estimators
    • Regression
    • Classification
  • Nearest neighbor estimator
    • Limit case via Parzen
    • Fast lookup
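
As a rough illustration of the Watson-Nadaraya regression estimator listed above, here is a minimal sketch in plain Python. It is not taken from the course slides: the function names, the Gaussian kernel, the `bandwidth` parameter, and the toy data are all assumptions chosen for illustration. The estimate at a query point is a kernel-weighted average of the observed labels.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizer cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def watson_nadaraya(x_query, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of ys, weighted by closeness of xs to x_query."""
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed to zero: fall back to the global mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical toy data: y = x^2 sampled at five points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

narrow = watson_nadaraya(2.0, xs, ys, bandwidth=0.1)   # close to the label at x = 2
wide = watson_nadaraya(2.0, xs, ys, bandwidth=1000.0)  # close to the mean of ys
```

A small bandwidth makes the estimator behave like a nearest neighbor lookup, while a very large one collapses it to the global average, which is one way to see the overfitting and underfitting trade-off from the model selection topics.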

Slides available in PDF.

Unit 3: Naive Bayes

  • Bayes Rule
  • Multiple testing
  • Discrete attributes
  • Continuous random variables

Slides available in 3a and 3b. Annotated versions are 3a and 3b.

Unit 4: Perceptron

  • Application – Hebbian learning
  • Perceptron
    • Algorithm
    • Convergence proof
    • Properties
  • Kernel trick
    • Basic idea
    • Kernel Perceptron
    • Kernel expansion
  • Kernel examples
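
For readers who want a concrete picture of the Perceptron algorithm named in this unit, the following is a minimal sketch of the standard mistake-driven update for labels in {-1, +1}. The function names, the `max_epochs` cap, and the toy data are assumptions made for illustration and are not drawn from the lecture material.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def train_perceptron(points, labels, max_epochs=100):
    """Run perceptron updates until an epoch passes with no mistakes."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            # A point is misclassified (or on the boundary) if y * f(x) <= 0.
            if y * (dot(w, x) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if dot(w, x) + b > 0 else -1


# Hypothetical linearly separable toy data.
points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
predictions = [predict(w, b, x) for x in points]
```

Because each update adds a multiple of a training point to `w`, the weight vector is always a linear combination of the inputs, which is the observation behind the kernel expansion and the Kernel Perceptron.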

Slides available in PDF and Keynote. If you want to extract the equations from the slides you can do so by using LaTeXit, simply by dragging the equation images into it.

Unit 5: Optimization

  • Unconstrained problems
    • Gradient descent
    • Newton’s method
  • Convexity
    • Properties
    • Lagrange function
    • Wolfe dual
  • Batch methods
    • Distributed subgradient
    • Bundle methods
  • Online methods
    • Unconstrained subgradient
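
As a small illustration of gradient descent on an unconstrained problem, here is a minimal sketch with a fixed step size. The objective, the `step_size` and `num_steps` values, and the function names are assumptions picked for the example, not the course's own code.

```python
def gradient_descent(grad, x0, step_size=0.1, num_steps=200):
    """Repeatedly step against the gradient, starting from x0."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def grad_f(x):
    """Gradient of the convex quadratic f(x) = (x0 - 1)^2 + 2 * (x1 + 2)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


# The unique minimizer of f is (1, -2).
minimizer = gradient_descent(grad_f, [0.0, 0.0])
```

On this quadratic the error in each coordinate shrinks by a constant factor per step (0.8 and 0.6 here), so the iterates converge linearly to the minimizer. A step size that is too large for the curvature would make the iterates diverge instead.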

Slides in Keynote and PDF are here. If you want to extract the equations from the slides you can do so by using LaTeXit, simply by dragging the equation images into it.

Unit 6: Duality (Support Vector Classification)

Slides available: Lectures 9 (annotated) and 10 (annotated).

Unit 7: SVM (Support Vector Classification)

  • Application – Optical Character Recognition
  • Support Vector Machines
    • Large Margin Classification
    • Optimization Problem
    • Dual Problem
  • Properties
    • Support Vectors
    • Support Vector expansion
  • Soft Margin Classifier
    • Noise tolerance
    • Optimization problem
    • Dual problem
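
For orientation, the soft margin optimization problem and its dual are usually written in the following textbook form. The notation here (training pairs (x_i, y_i) with y_i in {-1, +1}, slack variables \xi_i, regularization constant C, dual variables \alpha_i) is an assumption and may differ from the notation used in the slides.

```latex
\begin{aligned}
\text{Primal:}\quad
  & \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \\
  & \text{s.t. } y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1 - \xi_i,
    \quad \xi_i \ge 0 \\[4pt]
\text{Dual:}\quad
  & \max_{\alpha}\ \sum_{i=1}^{m} \alpha_i
    - \tfrac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle \\
  & \text{s.t. } \sum_{i=1}^{m} \alpha_i y_i = 0,
    \quad 0 \le \alpha_i \le C \\[4pt]
\text{Expansion:}\quad
  & w = \sum_{i=1}^{m} \alpha_i y_i x_i
\end{aligned}
```

Training points with \alpha_i > 0 are the support vectors, and the dual depends on the data only through inner products, which is where a kernel can be substituted.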

Slides available in PDF.

Unit 8: Kernel Methods and Regularization

Slides available in PDF.

Unit 9: Tail Bounds & Averages

Slides available in PDF.


Student Presentations [Playlist]

(Video Source: Alex Smola. Note Source:
