Probabilistic graphical models play a central role in modern statistics, machine learning, and artificial intelligence. Within these contexts, researchers and practitioners alike are often interested in modeling the (conditional) independencies among variables within some complex system. Graphical models address this problem combinatorially: A probabilistic graphical model associates jointly distributed random variables in the system with the nodes of a graph and then encodes conditional independence relations entailed by the joint distribution in the edge structure of the graph. Consequently, the combinatorial properties of the graph play a role in our understanding of the model and the associated inference problems. We will see in this course how the graph structure provides the scientist with (1) a transparent and tractable representation of relational information encoded by the model, (2) effective approaches to exact and approximate probabilistic inference, and (3) data-driven techniques for learning graphical representatives of relational information, such as conditional independence structure or cause-effect relationships.

Course type:

  • AS track: elective
  • AI track: mandatory
  • Joint curriculum: advanced

Time: Given in the autumn of odd years

Teachers: Svante Linusson, Liam Solus (KTH), Fredrik Lindsten, Johan Alenlöv (LiU), Vera Koponen (UU)

Examiner: Benny Avelin, UU

The participants are assumed to have a background in mathematics corresponding to the contents of the WASP-courses “Mathematics for Machine Learning” and “Introduction to logic for AI”.

The course will require a basic understanding of probability theory and graph theory. It is recommended to read and understand the material in Sections 2.1 and 2.2 of Koller and Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009.

Module 2 requires familiarity with Bayesian statistics (as well as graphical models and basic message-passing algorithms, covered in Module 1). For an introduction to Bayesian statistics, we recommend reading

  • Sections 1.1–1.2 of Chris Bishop’s book Pattern Recognition and Machine Learning, Springer 2006 [PRML].
  • Chapter 2 of PRML introduces several probability distributions that will be used in the module, as well as the notion of conjugate priors.

Module 3 requires basic knowledge of first-order logic. The necessary knowledge and skills can be learned from, for example, the book “Logic as a Tool: A Guide to Formal Logical Reasoning” by Valentin Goranko, John Wiley & Sons (2016). Sections 1.1–1.3, 2.5.1, and 3.1–3.4 of this book contain essential prerequisites for Module 3. The other chapters are not necessary for the module but contain material relevant to AI in general.

After completing the course, students should be able to
  • Describe how undirected and directed acyclic graphs encode (probabilistic) independence information.
  • Describe and use methods for learning directed acyclic graph representations of conditional independence relations satisfied by a data-generating distribution.
  • Describe the Bayesian inference problem for graphical models, and explain when it can be solved exactly and when approximate inference algorithms are needed.
  • Describe and use methods for exact and approximate probabilistic inference using graph representations of a data-generating distribution.
  • Describe the key differences between deterministic and stochastic (i.e., Monte Carlo-based) approximate inference algorithms.
  • Use basic methods of Inductive Logic Programming to learn a logic program and use it for inference.
  • Use the concept of “probabilities on domains” to learn a parametrized graphical model from a relational structure.
  • Use the concepts of “probabilities on possible worlds” and a parametrized graphical model to make inferences on arbitrary domains.

In Module 1 we will introduce undirected graphical and directed acyclic graphical models as parametric statistical models whose parameterizations are defined according to a given undirected or directed acyclic graph. We then see how such models are equivalently defined as those distributions which satisfy sets of conditional independence relations encoded in the edge structure of their associated graph. We will apply these two different interpretations of our model to address points (2) and (3) from the abstract: The parametric interpretation of graphical models will provide effective algorithms for exact probabilistic inference with computational complexity bounds related to the graph structure, and the conditional independence interpretation will provide some first algorithms for learning representations of cause-effect systems from data. Topics to be discussed include undirected and directed acyclic graphical models; pairwise, local, and global Markov properties for graphs; the Hammersley-Clifford theorem; the variable elimination and clique-tree algorithms for exact probabilistic inference; Markov equivalence; and basic algorithms for causal discovery.
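
To give a concrete flavour of the exact-inference topic, the short sketch below works through variable elimination on the chain A → B → C with binary variables; the probability tables are illustrative choices and are not taken from the course material.

    # A minimal sketch of variable elimination on the chain A -> B -> C
    # (binary variables; the probability tables below are illustrative).
    import numpy as np

    p_a = np.array([0.6, 0.4])                    # P(A)
    p_b_given_a = np.array([[0.9, 0.1],           # P(B | A=0)
                            [0.3, 0.7]])          # P(B | A=1)
    p_c_given_b = np.array([[0.8, 0.2],           # P(C | B=0)
                            [0.5, 0.5]])          # P(C | B=1)

    # Eliminate A: sum_a P(A=a) P(B | A=a) gives a factor over B.
    phi_b = p_a @ p_b_given_a

    # Eliminate B: sum_b phi(B=b) P(C | B=b) gives the marginal P(C).
    p_c = phi_b @ p_c_given_b
    print(p_c)                                    # approximately [0.698 0.302]

Eliminating A before B keeps every intermediate factor over a single variable; the module makes precise how such elimination orderings tie the cost of exact inference to the structure of the graph.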

In Module 2 we consider Bayesian inference in graphical models where the involved distributions and/or the structure of the graph prevent exact solutions. We introduce several approximate inference algorithms to tackle this issue, specifically: expectation propagation, variational inference, and Markov chain Monte Carlo. We highlight some key differences, but also similarities, between these types of algorithms.
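
As a rough illustration of the Monte Carlo side of the module, the sketch below runs a random-walk Metropolis sampler on a toy Bayesian model; the model, step size, and iteration counts are illustrative assumptions and not course code.

    # A minimal sketch of random-walk Metropolis for a toy model:
    # observations x_i ~ N(mu, 1) with prior mu ~ N(0, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(1.5, 1.0, size=20)          # synthetic observations

    def log_post(mu):
        log_prior = -0.5 * mu**2                  # N(0, 1) prior, up to a constant
        log_lik = -0.5 * np.sum((data - mu)**2)   # unit-variance Gaussian likelihood
        return log_prior + log_lik

    mu, samples = 0.0, []
    for _ in range(5000):
        prop = mu + 0.3 * rng.normal()            # symmetric random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
            mu = prop                             # accept; otherwise keep the current state
        samples.append(mu)

    print(np.mean(samples[1000:]))                # posterior-mean estimate after burn-in

In contrast to the deterministic methods (expectation propagation and variational inference), this sampler is stochastic: rather than committing to a fixed-form approximation, its estimates improve as more samples are drawn.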

Module 3 introduces the basic ideas in Inductive Logic Programming. It explains the two main approaches to unifying logic and probability that are used in Statistical Relational Learning: the “probabilities on domains” approach and the “probabilities on possible worlds” approach. It describes how the first approach can be used for learning a graphical model from a set of data consisting of objects and relations between them, and how the second approach, together with a parametrized graphical model, can be used for making inferences on arbitrary finite domains. The issue of dealing with large domains without exploding computational costs is also discussed.
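
As a minimal sketch of the “probabilities on possible worlds” idea, the code below enumerates all possible worlds over a two-element domain, weights each world according to a soft rule, and answers a conditional query by summation; the predicates, rule, and weight are illustrative assumptions rather than examples from the course notes.

    # A minimal sketch of "probabilities on possible worlds" over the domain {a, b}.
    from itertools import product
    from math import exp

    domain = ["a", "b"]
    atoms = [("Smokes", x) for x in domain] + [("Cancer", x) for x in domain]
    w = 1.5                                       # weight of the soft rule Smokes(x) -> Cancer(x)

    def world_weight(world):
        # world maps each ground atom to True/False; count satisfied rule groundings.
        sat = sum(1 for x in domain
                  if (not world[("Smokes", x)]) or world[("Cancer", x)])
        return exp(w * sat)                       # log-linear weight of this world

    worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]
    Z = sum(world_weight(wld) for wld in worlds)  # normalizing constant

    def prob(event):
        return sum(world_weight(wld) for wld in worlds if event(wld)) / Z

    p_cond = (prob(lambda wld: wld[("Cancer", "a")] and wld[("Smokes", "a")])
              / prob(lambda wld: wld[("Smokes", "a")]))
    print(p_cond)                                 # P(Cancer(a) | Smokes(a))

The enumeration over all sixteen worlds already hints at the computational-cost issue mentioned above: the number of possible worlds grows exponentially with the size of the domain.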

  • Diestel, Reinhard. Graph Theory. Graduate Texts in Mathematics, vol. 173. Springer, 2012.
  • Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • Lauritzen, Steffen. Graphical Models. Oxford Statistical Science Series, vol. 17. Clarendon Press, 1996.
  • Koponen, Vera. Notes on Logic, Probability and Statistical Relational Learning. Distributed by the author.
  • (optional) De Raedt, Luc, Kristian Kersting, Sriraam Natarajan, and David Poole. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning #32. Morgan & Claypool Publishers, 2016.
  • (optional) Getoor, Lise, and Ben Taskar (editors). Introduction to Statistical Relational Learning. The MIT Press, 2007.
  • (optional) De Raedt, Luc. Logical and Relational Learning. Springer-Verlag, 2008.

Participants in the course will be assessed on all modules using hand-in assignments. For each module there will be a set of regular hand-in assignments that are common to all participants. In addition, each participant is required to complete an additional hand-in assignment going deeper into the concepts covered by one of the three modules. This specialization is chosen at the start of the course.