

Master of Statistics and Data Science

DSI supports data science training for various target groups. DSI members contribute to several UHasselt master and bachelor programmes (e.g. Computer Sciences, Mathematics, Medicine, Materiomics and (Biomedical) Sciences), but foremost to our Master of Statistics and Data Science.

Besides the university programmes, many colleagues are actively involved in educational outreach to secondary schools as well as in science communication activities.

Master of Statistics and Data Science

Hasselt University offers a Master of Statistics and Data Science with four specialisations: Biostatistics, Bioinformatics, Quantitative Epidemiology and, since 2020, Data Science. More information about the full programme is available at the Master of Statistics and Data Science website.

A data scientist

With the advent of the big data era, several global challenges that were out of reach can now start to be addressed. In the field of medicine, wearable devices and real-time sensors generate huge amounts of data that can shed light on triggers for disease episodes. Omics and genome sequencing can aid in managing and preventing diseases, especially if they are combined with other data sources such as information from social networks. Integrated analysis of weather data, credit card transactions and air pollution data sheds light on how people change their behaviour due to air pollution. Graph analysis of social network data makes it possible to identify fake accounts and fake news, a growing problem in the current political climate. The list goes on... A data scientist is someone who, apart from having the technical skills to tackle these issues, has a desire to dig deeper and go beneath the surface of a problem.

Data science track

The Data Science specialisation of the Master of Statistics and Data Science provides a comprehensive education in the field, covering the whole data science cycle from data gathering, cleaning and management, to analysis and visualisation, and finally dissemination. Apart from a solid knowledge of statistical principles, the topics in the master therefore include (but are not limited to) data and software carpentry, programming in Python and R, statistics, algorithms, machine learning (including deep learning), natural language processing and data visualisation. In addition to regular courses, students can integrate their knowledge and skills in several data science projects and a hack week.


Statisticians and data scientists need to be able to communicate with researchers from various fields, report results, and give effective presentations. Developing such skills is an integral part of the programme.

Topics that are covered include (but are not limited to):

  • software carpentry and programming in Python and R
  • data carpentry, including data management and processing (SQL, NoSQL, ...)
  • visual analytics
  • statistics
  • machine learning

ICP Program

DSI aims to gradually build and strengthen the South dimension of the ICP Master of Statistics and Data Science programme through cooperation with partners in the South. By paying attention to the South dimension, we not only raise the level of statistical education and research in countries in the South, but also improve the quality of the master programme, and thus of UHasselt in general. This is done by introducing outstanding students to the programme (in the case of the VLIR-UOS ICP scholarships) and through bilateral collaborations with key partners in the South. We strive for sustainable partnerships, which can extend to research collaborations through VLIR-UOS funding schemes. As described in paragraph d, knowledge transfer is also targeted at local stakeholders as well as international organisations.


A prominent challenge in higher education at both undergraduate and master levels in developing countries is the scarcity of high-quality, R-based resources for educational programmes. The eR-Biostat initiative is dedicated to enhancing education programmes in Biostatistics/Statistics, as well as for non-statisticians. Its objective is to create an innovative e-learning system for various courses.

Guest professors

The active interaction between the three main pillars (Research, Education, and Consultancy & Valorisation) is beneficial in many ways. One of its successes is the number of guest professors, experts in their field, who enrich the programme, along with international collaborations in the form of course exchanges.

Job opportunities

Data science is booming, and data scientists and statisticians are in great demand. Both companies (in fields ranging from pharmaceuticals to manufacturing and banking) and the public sector are struggling to find good candidates with a solid background in data science. This shortage is a clear constraint in these sectors. Job descriptions range from covering the full data science cycle in a research setting, to specific analyses to maximise the environmental, societal or financial impact in a company. In addition, data scientists are often employed to streamline data wrangling and analysis pipelines. With the growing availability of public datasets, an exciting new opportunity also lies in data journalism, where you cover everything from defining a question and searching for data to cleaning, analysis and dissemination.



10 Oct. 2023, 15.00-17.00 GitLab & OSL Training DSI

In this two-hour training, you will learn the basics of software version control and apply these concepts to your scripts and software by using the GitLab version management system. This part of the course is based on an existing training at KU Leuven conducted by Dr. Naeem Muhammed, named “Using GitLab to manage scripts and software”.

Additionally, participants learn how to verify, in an automated way, the licenses attached to their scripts. This is important when publishing code on GitLab that may have commercialization potential in the future. This part of the training is provided by Dr. Emiliano Mancini, who is the open software licensing expert at UHasselt and is currently responsible for this course within the university RDM team.
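
An automated license check of this kind could look roughly like the following. This is a minimal sketch, not the tool used in the training (the announcement does not name one). It assumes, hypothetically, that each script declares its license in an `SPDX-License-Identifier:` comment near the top of the file. The function name `find_license_tags` and the list of file suffixes are illustrative choices.

```python
import re
from pathlib import Path

# Matches e.g. "# SPDX-License-Identifier: MIT" and captures the expression.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*$")


def find_license_tags(root, suffixes=(".py", ".R", ".sh"), max_lines=20):
    """Map each script under `root` to its SPDX expression, or None if absent.

    Only the first `max_lines` lines of each file are inspected, since the
    (assumed) convention is to put the tag in the file header.
    """
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if not (path.is_file() and path.suffix in suffixes):
            continue
        tag = None
        with path.open(encoding="utf-8", errors="replace") as fh:
            for _, line in zip(range(max_lines), fh):
                match = SPDX_RE.search(line)
                if match:
                    tag = match.group(1)
                    break
        result[path.relative_to(root).as_posix()] = tag
    return result


if __name__ == "__main__":
    # Report every script in the current directory that lacks a license tag.
    for name, tag in find_license_tags(".").items():
        if tag is None:
            print(f"missing license tag: {name}")
```

Files mapped to `None` are the ones that still need a license header before the repository is published.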

This training is aimed at researchers who develop scripts and software but have little to no Git and IT knowledge.


Location: UHasselt Campus Diepenbeek, room TBD

Date: 10/10/23

Time: 15-17h

Entry: free but registration is mandatory via e-mail. Limited number of seats available.

20 Mar 2020, 12.30-14.00 Joris Van Houtven and Geert-Jan Bex: Good coding practices in R

Not all data scientists have a background in 'software engineering'. However, as you develop your career, you are expected to write code of a certain level of quality. Luckily, a number of very simple things go a long way towards improving your code substantially. For good programmers, they are second nature, and you should strive to make them a habit. Many of these good coding practices are applicable to any programming language, but we tailor this session specifically to R.

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KU Leuven


17 Mar 2020, 12.30-13.30 Gerhard Dikta: Bootstrap approximations to check parametric regression models

Suppose we observe a series of binary data along with explanatory variables and we suspect that these observations follow a parametric regression model. To verify this assumption, we use Kolmogorov-Smirnov and Cramér-von Mises type tests based on a maximum likelihood estimate of the parameter and a marked empirical process introduced by Stute. We determine the critical values for the tests with a special bootstrap procedure in which the resampling scheme is adapted to the parametric setup.
This approach is applied to simulated and real data. In the latter case, we review some parametric model assumptions in the context of right censored data. The method presented is discussed in the light of machine learning (statistical learning) and how it can be applied to generalized linear models, distinguishing between semi-parametric and parametric GLMs.
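
To make the idea concrete, the sketch below checks a logistic regression model with a single covariate. It is a minimal illustration under assumptions that are not taken from the talk: one covariate only, a hand-written Newton-Raphson fit, and hypothetical function names (`fit_logistic`, `process_statistics`, `bootstrap_pvalues`). The two statistics are the supremum (Kolmogorov-Smirnov type) and the mean square (Cramér-von Mises type) of the cumulative residual process ordered by the covariate. Bootstrap responses are drawn from the fitted model, so the resampling respects the parametric setup.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, iters=15):
    """Maximum likelihood fit of P(Y=1|x) = sigmoid(a + b*x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:             # (near-)singular information: stop updating
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def process_statistics(x, y, a, b):
    """KS- and CvM-type statistics of the marked empirical process
    R_n(t) = n^(-1/2) * sum_i (y_i - p_i) * 1{x_i <= t}, evaluated at the x_i."""
    n = len(x)
    cum = ks = cvm = 0.0
    for i in sorted(range(n), key=lambda k: x[k]):
        cum += y[i] - sigmoid(a + b * x[i])
        r = cum / math.sqrt(n)
        ks = max(ks, abs(r))
        cvm += r * r / n
    return ks, cvm


def bootstrap_pvalues(x, y, n_boot=200, seed=0):
    """Model-based bootstrap p-values for the KS- and CvM-type statistics."""
    rng = random.Random(seed)
    a, b = fit_logistic(x, y)
    ks_obs, cvm_obs = process_statistics(x, y, a, b)
    probs = [sigmoid(a + b * xi) for xi in x]
    ks_hits = cvm_hits = 0
    for _ in range(n_boot):
        # Resample responses from the fitted model, keeping the covariates fixed.
        y_star = [1 if rng.random() < p else 0 for p in probs]
        a_s, b_s = fit_logistic(x, y_star)
        ks_s, cvm_s = process_statistics(x, y_star, a_s, b_s)
        ks_hits += ks_s >= ks_obs
        cvm_hits += cvm_s >= cvm_obs
    return ks_hits / n_boot, cvm_hits / n_boot


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(-2, 2) for _ in range(300)]
    # Simulated data that really follow the logistic model being checked.
    y = [1 if rng.random() < sigmoid(0.5 + xi) else 0 for xi in x]
    print(bootstrap_pvalues(x, y))
```

Small p-values indicate that the observed cumulative residuals are larger than the fitted model can explain, which is evidence against the assumed parametric form.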

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KU Leuven


6 Mar. 2020, 12.30-13.30 Alvaro Florez Poveda: Fast two-stage estimator for large clustered non-Gaussian data

Clustered non-Gaussian data are most frequently analyzed using the generalized linear mixed model (GLMM). Commonly, the GLMM is fitted by maximizing the marginal (log-)likelihood, i.e., integrating out the random effects. However, this whole maximization may require a considerable amount of computing resources. Although computationally manageable with medium to large data, it can be too time-consuming or computationally intractable with several large-size clusters. To overcome this, a fast two-stage estimator for correlated non-Gaussian data is presented. It is rooted in the pseudo-likelihood split-sample methodology. Based on simulations, it shows good statistical properties, and it is computationally much faster than the full maximum likelihood estimator. The approach is illustrated using a large dataset belonging to a network of Belgian general practices.

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KU Leuven


21 Feb. 2020 Pietro Coletti: Network epidemiology

Communicable diseases pose a continuous threat to human welfare and development. This threat comes in two forms. Well-known, often treatable infections are still the leading cause of death in resource-constrained countries and put an economic burden on both developed and developing countries. In addition, newly emerging pathogens with pandemic potential may lead to worldwide crises, as 2019-nCoV is currently showing. In recent years, network epidemiology has provided several insights into the diffusion of communicable diseases. Lying at the intersection of statistics, network theory, and data science, network epidemiology has been exploiting the large amount of data available in the information era to relax models’ assumptions and generate quantitative predictions.

In this seminar, I will give an overview of the field, introduce specific models of network epidemiology and discuss illustrative case studies.
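
For readers unfamiliar with the field, one of the simplest models of this kind is a stochastic SIR (susceptible, infected, recovered) epidemic spreading over a contact network. The sketch below is illustrative only and is not taken from the seminar: the random (Erdős-Rényi) network, the discrete time steps, the parameter values and the function names `random_network` and `simulate_sir` are all assumptions made for the example.

```python
import random


def random_network(n, p, rng):
    """Erdős-Rényi contact network as adjacency lists: each pair is linked with probability p."""
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def simulate_sir(nbrs, beta, gamma, seeds, rng, max_steps=1000):
    """Discrete-time stochastic SIR on a network.

    In each step, every infected node infects each susceptible neighbour with
    probability beta and then recovers with probability gamma.
    Returns the list of (S, I, R) counts, one tuple per time step.
    """
    state = ["S"] * len(nbrs)
    for s in seeds:
        state[s] = "I"
    history = []
    for _ in range(max_steps):
        counts = (state.count("S"), state.count("I"), state.count("R"))
        history.append(counts)
        if counts[1] == 0:          # epidemic has died out
            break
        new_state = state[:]        # synchronous update from the old state
        for i, s in enumerate(state):
            if s != "I":
                continue
            for j in nbrs[i]:
                if state[j] == "S" and rng.random() < beta:
                    new_state[j] = "I"
            if rng.random() < gamma:
                new_state[i] = "R"
        state = new_state
    return history


if __name__ == "__main__":
    rng = random.Random(2020)
    network = random_network(200, 0.04, rng)
    for step, (s, i, r) in enumerate(simulate_sir(network, 0.1, 0.2, [0], rng)):
        print(step, s, i, r)
```

Replacing the random network with an empirical contact network is where the data-driven aspect described above comes in: the network structure, rather than a homogeneous-mixing assumption, determines who can infect whom.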

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KU Leuven



Prof. dr. Olivier Thas


Agoralaan Gebouw D, 3590 Diepenbeek, Belgium

Office E118

Chair, OMT Master of Statistics and Data Science

dr. Thomas Neyens


Agoralaan Gebouw D, 3590 Diepenbeek, Belgium

Office E132

Seminar coordinator