Reinforcement Learning (6hp)

Reinforcement Learning (RL) is one of the main branches of Machine Learning, allowing a system to learn through a trial-and-error process. The emerging field of RL has led to impressive results in varied domains such as strategy games and robotics. This course aims to give a deep understanding of the wide area of RL, build the basic theoretical foundation, and summarize state-of-the-art RL algorithms. The course covers topics from sequential decision making, probability theory, optimization, and control theory. At the end of this course, students will be able to formalize a task as an RL problem, will have practical skills to implement recent advances in RL, and will be ready to contribute to this field.

Course information

Course type:

AS track: elective
AI track*: prioritized (elective)
Joint curriculum: advanced

Time: Given even years, Autumn

Teachers: Farnaz Adib Yaghmaie (LiU), Fredrik Heintz (LiU), Johannes Andreas Stork (ORU)

Examiner: Johannes Andreas Stork (ORU)

*Those in the AI track who have not taken the mandatory course “Learning Theory and Reinforcement Learning” must select at least one of the courses “Learning Theory” or “Reinforcement Learning”. Those in the AI track who have taken the course “Learning Theory and Reinforcement Learning” may also take “Reinforcement Learning” as one of their elective courses.

Entry requirements

The participants are assumed to have a background in mathematics corresponding to the contents of the WASP-course “Mathematics and Machine Learning”. Other entry requirements for the course:

Probability theory (estimation, Monte Carlo, etc.)
Optimization (mean squared error, categorical cross-entropy loss, dynamic programming)
Deep learning basics (back-propagation, fully connected, convolutional layers)
Programming in Python (NumPy, plotting, deep learning with Python, etc.)
Basic knowledge about RL (MDP, tabular RL, value iteration, policy search, Q-learning, SARSA, etc.) corresponding to module 1 of this course, in case module combination 2+3+4 is selected instead of module combination 1+2+3.

Intended learning outcomes

Knowledge and Understanding

After completed studies, the student shall be able to

explain and characterize the concept of RL and categorize RL agents,
describe, explain, compare, and characterize different types of basic and advanced reinforcement learning methods,
derive from first principles and explain the underlying mathematical principles of these reinforcement learning methods, and
restate a control problem as an RL problem.

Competences and Skills

After completed studies, the student shall be able to

analyze and compare results of reinforcement learning methods,
implement (relevant parts of) advanced reinforcement learning algorithms,
apply advanced reinforcement learning algorithms,
read and critically review scientific publications about reinforcement learning,
use established software, frameworks, and libraries to implement (relevant parts of) RL algorithms and environments.

Judgment and Approach

After completed studies, the student shall be able to

discuss and reflect on important and advanced concepts in reinforcement learning,
discuss and reflect on what influences the performance of these methods,
discuss and reflect on which of these methods applies to a given scenario or problem,
discuss and reflect on scientific publications about reinforcement learning,
propose extensions and modifications to improve the performance of an RL algorithm for a specific problem.

Course content

The course is organized into 4 sequential modules that build on each other. Students take 3 of these modules in sequence, i.e. either module combination 1+2+3 or module combination 2+3+4. Combination 1+2+3 is for students without prior knowledge of RL while combination 2+3+4 is for students with prior knowledge corresponding to module 1.

Module 1 – Introduction to Basic RL and Control

RL foundations
Dynamic Programming
Monte Carlo Methods
Tabular temporal-difference learning
Planning with a Model and Learning
Public Perception of RL and RL in Media
Control and Reinforcement Learning Basics
Basic RL with function approximation
Basic policy gradient methods
Lab and exercises
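
As an illustration of the kind of tabular method listed under Dynamic Programming in Module 1, the following is a minimal sketch of value iteration. It is not taken from the course material: the three-state chain MDP, the discount factor, and all names (`step`, `value_iteration`, `greedy_policy`) are assumptions chosen only to keep the example small.

```python
# Value iteration on a hypothetical 3-state chain MDP (illustrative only).
# States 0, 1, 2; state 2 is terminal. Actions: 0 = left, 1 = right.
# Transitions are deterministic; entering state 2 yields reward 1.0.
GAMMA = 0.9          # assumed discount factor
N_STATES = 3
TERMINAL = 2
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward) for the toy chain."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, TERMINAL)
    reward = 1.0 if (next_state == TERMINAL and state != TERMINAL) else 0.0
    return next_state, reward


def action_value(values, state, action):
    """One-step lookahead: r + gamma * V(s')."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # terminal state keeps value 0
            best = max(action_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick the action with the highest one-step lookahead in each non-terminal state."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(values, s, a))
        for s in range(N_STATES)
        if s != TERMINAL
    }


values = value_iteration()
policy = greedy_policy(values)
```

In this toy problem the greedy policy moves right in both non-terminal states, since that is the only way to reach the rewarding terminal state.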

Module 2 – Deep RL and control-based methods part 1

Deep temporal-difference learning with discrete actions
Deep temporal-difference learning with continuous actions
Temporal-difference learning for the Linear Quadratic (LQ) problem
Deep policy gradient methods
Maximum entropy RL
Lab and exercises

Module 3 – Deep RL and control-based methods part 2

Deep actor-critic methods
Model-based policy search
Monte Carlo tree search
RL with constraints
Critical reflection about RL research
Lab and exercises

Module 4 – Advanced topics in RL

Selected advanced methods, e.g., multiple objectives, hierarchical RL, multiple agents, uncertainty, transfer.
Outlook on RL research
Lab and exercises

The course includes four 2-day on-campus meetings which are aligned to the four modules. Students attend the three meetings aligned with their selected module combination.

Literature

A list of references recommended for reading is provided by the teachers.

Examination

Reflective learning journal hand-in: The students hand in their reflective learning journal, which is graded according to a grading rubric at the end of the course.

Presentation: The students work in groups, present reading material or results, and are graded according to a grading rubric.

Lab: The students do practical work in computer-based sessions, show or present their results, and are graded according to a grading rubric. Depending on the number of students, this might be done in the form of a report.

The course allows a single retry of the assessment tasks at a date 6 months after the course has concluded. The assessment tasks might be altered for the retry.

Syllabus (Kursplan)

Course Syllabus - Reinforcement Learning

Course page

Course page 2024 (CANVAS KTH)

Course page 2022 (CANVAS KTH)

If you are not a student at KTH, you must log in via https://canvas.kth.se/login/canvas

Course reports

Course Report - Reinforcement Learning 2022