Reinforcement Learning (RL) is one of the main branches of Machine Learning, allowing a system to learn through a trial-and-error process. The emerging field of RL has led to impressive results in varied domains such as strategy games and robotics. This course aims to give a deep understanding of the wide area of RL, build the basic theoretical foundation, and summarize state-of-the-art RL algorithms. The course covers topics from sequential decision making, probability theory, optimization, and control theory. At the end of this course, students will be able to formalize a task as an RL problem, will have the practical skills to implement recent advances in RL, and will be ready to contribute to this field.

Course type:

  • AS track: elective
  • AI track*: prioritized (elective)
  • Joint curriculum: advanced

Time: Given in the autumn of even years

Teachers: Farnaz Adib Yaghmaie (LiU), Fredrik Heintz (LiU), Johannes Andreas Stork (ORU)

Examiner: Johannes Andreas Stork (ORU)

*Those in the AI track who have not taken the mandatory course “Learning Theory and Reinforcement Learning” must select at least one of the courses “Learning Theory” or “Reinforcement learning”. Those in the AI track who have taken the course “Learning Theory and Reinforcement Learning” may also take “Reinforcement learning” as one of their elective courses.

The participants are assumed to have a background in mathematics corresponding to the contents of the WASP course “Mathematics and Machine Learning”. Other entry requirements for the course:

  • Probability theory (estimation, Monte Carlo, etc.)
  • Optimization (mean squared error, categorical cross-entropy loss, dynamic programming)
  • Deep learning basics (back-propagation, fully connected, convolutional layers)
  • Programming in Python (NumPy, plotting, deep learning with Python, etc.)
  • Basic knowledge about RL (MDP, tabular RL, value iteration, policy search, Q-learning, SARSA, etc.) corresponding to module 1 of this course, in case module combination 2+3+4 is selected instead of module combination 1+2+3.

Knowledge and Understanding

After completing the course, the student shall be able to

  • explain and characterize the concept of RL and categorize RL agents,
  • describe, explain, compare, and characterize different types of basic and advanced reinforcement learning methods,
  • derive from first principles and explain what the underlying mathematical principles of these reinforcement learning methods are, and
  • restate a control problem as an RL problem.

Competences and Skills

After completing the course, the student shall be able to

  • analyze and compare results of reinforcement learning methods,
  • implement (relevant parts of) advanced reinforcement learning algorithms,
  • apply advanced reinforcement learning algorithms,
  • read and critically review scientific publications about reinforcement learning,
  • use established software, frameworks, and libraries to implement (relevant parts of) RL algorithms and environments.

Judgment and Approach

After completing the course, the student shall be able to

  • discuss and reflect on important and advanced concepts in reinforcement learning,
  • discuss and reflect on what influences the performance of these methods,
  • discuss and reflect on which of these methods applies to a given scenario or problem, and when,
  • discuss and reflect on scientific publications about reinforcement learning,
  • propose extensions and modifications to improve the performance of an RL algorithm for a specific problem.

The course is organized into four sequential modules that build on each other. Students take three of these modules in sequence, i.e., either module combination 1+2+3 or module combination 2+3+4. Combination 1+2+3 is for students without prior knowledge of RL, while combination 2+3+4 is for students with prior knowledge corresponding to module 1.

Module 1 – Introduction to Basic RL and Control

  • RL foundations
  • Dynamic Programming
  • Monte Carlo Methods
  • Tabular temporal-difference learning
  • Planning with a Model and Learning
  • Public Perception of RL and RL in Media
  • Control and Reinforcement Learning Basics
  • Basic RL with function approximation
  • Basic policy gradient methods
  • Lab and exercises

Module 2 – Deep RL and control-based methods part 1

  • Deep temporal-difference learning with discrete actions
  • Deep temporal-difference learning with continuous actions
  • Temporal-difference learning for the Linear Quadratic (LQ) problem
  • Deep policy gradient methods
  • Maximum entropy RL
  • Lab and exercises

Module 3 – Deep RL and control-based methods part 2

  • Deep actor-critic methods
  • Model-based policy search
  • Monte Carlo tree search
  • RL with constraints
  • Critical reflection about RL research
  • Lab and exercises

Module 4 – Advanced topics in RL

  • Selected advanced methods, e.g., multiple objectives, hierarchical RL, multiple agents, uncertainty, transfer.
  • Outlook on RL research
  • Lab and exercises

The course includes four 2-day on-campus meetings, which are aligned with the four modules. Students attend the three meetings aligned with their selected module combination.

A list of references recommended for reading is provided by the teachers.

Reflective learning journal hand-in: The students hand in their reflective learning journal, which is graded according to a grading rubric at the end of the course.

Presentation: The students work in groups, present reading material or results, and are graded according to a grading rubric.

Lab: The students do practical work in computer-based sessions, show or present their results, and are graded according to a grading rubric. Depending on the number of students, this might be done in the form of a report.

The course allows a single retry of the assessment tasks, at a date 6 months after the course has concluded. The assessment tasks might be altered for the retry.

If you are not a student at KTH, you must log in via https://canvas.kth.se/login/canvas