Project R-13211


Scalable probabilistic database Markov chains for simulation of infectious disease transmission (Research)


Over the past decade, probabilistic databases (PDBs) have been proposed as a means of integrating relational database query answering and probabilistic reasoning. In this research project, I observe that Markov chains defined over probabilistic databases form a declarative means for simulating the transmission of infectious diseases. The efficient processing of such chains hence opens the door to a flexible declarative specification and efficient simulation of infectious disease transmission models. Unfortunately, PDB query processing is known to be intractable in general, and query processing in PDB Markov chains is harder still. Motivated by these observations, my objective in this research project is to study practical methods for the scalable processing of PDB Markov chains in the general setting, with the simulation of infectious disease transmission as a concrete application use case. My research hypothesis is that such methods can be obtained by exploiting, in a combined and principled manner, structural properties of both queries and data. My study is divided into two axes. In the first axis, a Monte-Carlo approach towards approximate query answering is studied that aims to exploit the interaction between generation of new random variables, database statistics, and random sampling. In the second axis, exact and approximate relational query answering approaches are studied that aim to exploit treewidth of both query and data.

Period of project

01 November 2022 - 31 October 2026