This course introduces the basic concepts and mathematical tools that constitute the foundations of the theory of machine learning. Learning theory concerns questions such as: What kinds of guarantees can we prove about practical machine learning methods, and can we design algorithms achieving desired guarantees? Is Occam’s razor a good idea and what does that even mean? What can we say about the fundamental inherent difficulty of different types of learning problems?

To answer these questions, the course discusses fundamental concepts such as probably approximately correct (PAC) learnability, empirical risk minimization and VC theory, and then covers the main ML subfields, including supervised learning (linear classification and regression, SVM, and deep learning), unsupervised learning (clustering), and reinforcement learning.

The examination consists of two homework assignments and a take-home exam.

Course type:

  • AS elective
  • AI mandatory/elective. (Students on the AI track who have not taken the mandatory course “Learning Theory and Reinforcement Learning” must select at least one of the courses “Learning Theory” or “Reinforcement Learning”.)
  • Joint curriculum: elective
    • Prioritized* (AI)
    • Elective (JC/AS)

Time: Spring, even years from 2022

Teachers: Cristian Rojas, Alexandre Proutiere (KTH)

Examiner: Benny Avelin (UU)

Basic eligibility. Recommended background: multivariable analysis, probability theory and statistics, and a basic course in numerical methods, or equivalent knowledge. Participants are assumed to have a mathematical background corresponding to the contents of the WASP course “Mathematics for Machine Learning”. Students who have not taken that course are advised to review its textbook in advance.

After passing the course, the student should be able to

  • Derive and apply the basic theoretical tools used in modern machine learning
  • Describe known performance guarantees for important machine learning algorithms

Topic 1. Introduction

Main types of learning: supervised, unsupervised and reinforcement learning, and their mathematical formalization (input and label spaces, hypothesis classes, loss function).
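
For orientation, the risk-minimization setup can be sketched as follows (the notation here is illustrative rather than the course's own): a hypothesis h ∈ 𝓗 maps the input space 𝓧 to the label space 𝓨, and its quality is measured by the expected loss

\[
R(h) = \mathbb{E}_{(X,Y)\sim\mathcal{D}}\big[\ell(h(X), Y)\big],
\]

where \( \ell \) is the loss function and \( \mathcal{D} \) the unknown data-generating distribution.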

Topic 2. PAC framework and empirical risk minimization

Concept of Probably Approximately Correct (PAC) learnability. Oracle inequalities and bias-variance trade-off. Empirical Risk Minimization Principle. Overfitting and No-Free-Lunch Theorem. Uniform convergence.
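
As a rough sketch (notation illustrative, not the course's own), the ERM rule picks

\[
\hat h_n \in \arg\min_{h \in \mathcal{H}} \ \frac{1}{n}\sum_{i=1}^{n} \ell\big(h(x_i), y_i\big),
\]

and \( \mathcal{H} \) is (agnostically) PAC learnable if for every \( \varepsilon, \delta \in (0,1) \) there is a sample size \( n(\varepsilon,\delta) \) such that, with probability at least \( 1-\delta \) over \( n \ge n(\varepsilon,\delta) \) i.i.d. samples, the learned hypothesis satisfies \( R(\hat h_n) \le \min_{h \in \mathcal{H}} R(h) + \varepsilon \).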

Topic 3. Concentration inequalities

Asymptotic versus finite sample probability bounds. Markov, Chebyshev and Chernoff bounds. Sub-Gaussian random variables. Hoeffding’s Lemma and Inequality. Bounded difference (McDiarmid) inequality.
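
A representative statement of the kind studied here is Hoeffding's inequality: for independent \( X_1,\dots,X_n \) taking values in \( [a,b] \) and \( \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i \),

\[
\mathbb{P}\big(\bar X_n - \mathbb{E}\bar X_n \ge t\big) \le \exp\!\left(-\frac{2nt^2}{(b-a)^2}\right), \qquad t > 0.
\]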

Topic 4. Vapnik-Chervonenkis (VC) Theory

PAC learnability of finite hypothesis classes. Shattering and VC dimension. Sauer-Shelah’s lemma. Rademacher complexity. Fundamental Theorem of PAC learning.
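
For orientation, the Sauer-Shelah lemma bounds the growth function of a class with VC dimension \( d \) by

\[
\tau_{\mathcal{H}}(n) \le \sum_{i=0}^{d}\binom{n}{i} \le \left(\frac{en}{d}\right)^{d} \quad (n \ge d),
\]

i.e. the number of distinct labelings that \( \mathcal{H} \) can realize on \( n \) points grows only polynomially in \( n \) once \( n \) exceeds \( d \).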

Topic 5. Linear classification and regression

Linear predictors. Linear classification. Perceptron algorithm. Application of VC theory to multilayer neural networks. Logistic and linear regression.
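
A minimal sketch of the perceptron update (illustrative Python, not course code; labels are assumed to be in {-1, +1}):

```python
import numpy as np

def perceptron(X, y, max_passes=100):
    """Classical perceptron; X is (n, d), y has entries in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_passes):
        mistakes = 0
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:   # misclassified (or on the boundary)
                w += y[i] * X[i]             # move the separating hyperplane toward x_i
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # all points correctly classified
            break
    return w, b
```

For linearly separable data with a positive margin, the number of updates is bounded independently of the number of passes, which is the classical perceptron mistake bound.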

Topic 6. Regularization, stability and optimization

Regularized risk minimization. Algorithmic stability and its application to generalization bounds for regularized risk minimization. Algorithms for convex learning: gradient descent, sub-gradient descent and stochastic gradient descent.
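
A minimal sketch of stochastic gradient descent for a regularized convex objective, here l2-regularized logistic regression (illustrative Python, not course code):

```python
import numpy as np

def sgd_logistic(X, y, lam=0.1, lr=0.1, epochs=50, seed=0):
    """SGD for l2-regularized logistic regression; y has entries in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            # gradient of log(1 + exp(-margin)) + (lam/2) * ||w||^2
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w -= lr * grad
    return w
```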

Topic 7. Support vector machines and kernel methods

Introduction to SVM with hard and soft margins. Performance bounds of hard and soft-margin SVM. Learning algorithms for SVM. Kernel methods; linear separability using embeddings. Kernel trick and the representer theorem; admissible kernels.
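
For reference, the soft-margin SVM primal can be written (in standard notation, not necessarily the course's) as

\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\big(w^\top \phi(x_i) + b\big) \ge 1 - \xi_i,\ \ \xi_i \ge 0,
\]

where the kernel trick replaces inner products \( \langle \phi(x_i), \phi(x_j)\rangle \) by kernel evaluations \( K(x_i, x_j) \), so the embedding \( \phi \) never needs to be computed explicitly.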

Topic 8. Deep neural networks

Neural networks and representation theorems. Training neural networks using backpropagation. Dropout as a regularization technique. Recent results on the loss surface and local minima of neural networks. Recent theoretical developments justifying deep learning.
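
A minimal sketch of backpropagation for a one-hidden-layer network with squared loss on a toy task (illustrative Python, not course code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                              # toy inputs
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))     # toy targets

W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.05

for epoch in range(500):
    # forward pass
    a1 = np.tanh(X @ W1 + b1)
    out = a1 @ W2 + b2
    loss = np.mean((out - y) ** 2)           # squared loss being minimized

    # backward pass: chain rule, averaged over the batch
    d_out = 2 * (out - y) / len(X)
    dW2, db2 = a1.T @ d_out, d_out.sum(axis=0)
    d_a1 = d_out @ W2.T
    d_z1 = d_a1 * (1 - a1 ** 2)              # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)

    # gradient step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```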

Topic 9. Clustering: cluster validation and algorithms

Performance metrics for clustering. State-of-the-art clustering algorithms. Cluster evaluation. K-means and its performance guarantees. The EM algorithm and its performance for Gaussian mixtures. Spectral clustering, random matrix theory and concentration.
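
A minimal sketch of Lloyd's algorithm for k-means (illustrative Python, not course code):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]      # random init
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each center becomes the mean of its cluster
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):                    # converged
            break
        centers = new_centers
    return centers, labels
```

Each iteration never increases the within-cluster sum of squares, but the algorithm may converge to a local optimum rather than the global one.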

Topic 10. Active learning, online optimization and sequential decision making

Introduction to bandit problems and reinforcement learning. Exploration-exploitation trade-off. Fundamental limits via change-of-measure arguments. Examples of algorithms and their guarantees. Best policy identification vs. regret minimization.
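
A minimal sketch of an index policy of the kind analyzed here, UCB1 for Bernoulli bandits (illustrative Python, not course code):

```python
import numpy as np

def ucb1(arm_means, horizon=10_000, seed=0):
    """UCB1 on Bernoulli arms; returns the cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    counts, sums = np.zeros(k), np.zeros(k)
    best, regret = max(arm_means), 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1                                      # play each arm once
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)        # exploration bonus
            a = int(np.argmax(sums / counts + bonus))      # optimism under uncertainty
        reward = float(rng.random() < arm_means[a])        # Bernoulli reward
        counts[a] += 1
        sums[a] += reward
        regret += best - arm_means[a]
    return regret
```

The exploration bonus shrinks for frequently played arms, which is how the policy trades off exploration against exploitation; for fixed reward gaps its regret grows logarithmically in the horizon.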

Course literature: S. Shalev-Shwartz and S. Ben-David, “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press, 2014.

The examination consists of two peer-graded individual assignments and a take-home exam.

Syllabus (Kursplan)

If you are not a student at KTH, you must log in via https://canvas.kth.se/login/canvas