Hasselt University provides a comprehensive Data Science track within the Master of Statistics and Data Science.
More information about the full program is available at the Master of Statistics and Data Science website.
With the advent of the big data era, several global challenges that were out of reach can now start to be addressed. In the field of medicine, wearable devices and real-time sensors generate huge amounts of data that can shed light on triggers for disease episodes. Omics and genome sequencing can aid in managing and preventing diseases, especially if they are combined with other data sources such as information from social networks. Integrated analysis of weather data, credit card transactions and air pollution data sheds light on how people change their behaviour due to air pollution. Graph analysis of social network data makes it possible to identify fake accounts and fake news, a growing problem in the current political climate. The list goes on... A data scientist is someone who, apart from having the technical skills to tackle these issues, has a desire to dig deeper and go beneath the surface of a problem.
The Data Science specialization of the Master of Statistics and Data Science provides a comprehensive education in this field, covering the whole data science cycle from data gathering, cleaning and management, to analysis and visualisation, and finally dissemination. Besides a solid knowledge of statistical principles, the topics in the master include (but are not limited to) data and software carpentry, programming in Python and R, statistics, algorithms, machine learning (including deep learning), natural language processing and data visualisation. In addition to regular courses, students can integrate their knowledge and skills in several data science projects and a hack week.
Statisticians and data scientists need to be able to communicate with researchers from various fields, report results, and give effective presentations. Developing such skills is an integral part of the program.
Data science is booming, and data scientists and statisticians are in great demand. Both companies (in fields ranging from pharmaceuticals to manufacturing and banking) and the public sector are struggling to find good candidates with a solid background in data science. This shortage is a clear constraint in these sectors. Job descriptions range from covering the full data science cycle in a research setting to performing specific analyses that maximise the environmental, societal or financial impact in a company. In addition, data scientists are often employed to streamline data wrangling and analysis pipelines. With the growing availability of public datasets, an exciting new opportunity also lies in data journalism, where you cover everything from defining a question and searching for data to cleaning, analysing and disseminating the results.
Communicable diseases pose a continuous threat to human welfare and development. This threat comes in two forms. Well-known, often treatable infections are still the leading cause of death in resource-constrained countries and put an economic burden on both developed and developing countries. In addition, newly emerging pathogens with pandemic potential may lead to world-wide crises, as 2019-nCoV is currently showing. In recent years, network epidemiology has provided several insights into the diffusion of communicable diseases. Lying at the intersection of statistics, network theory, and data science, network epidemiology exploits the large amount of data available in the information era to relax models’ assumptions and generate quantitative predictions.
In this seminar, I will give an overview of the field, introduce specific models of network epidemiology and discuss illustrative case studies.
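As a rough illustration of what a network epidemic model can look like, the sketch below simulates a discrete-time stochastic SIR (susceptible, infected, recovered) process on a contact network. It is not taken from the seminar: the ring-shaped network, the function names `simulate_sir` and `ring_network`, and the parameter values `beta` and `gamma` are all assumptions chosen for illustration.

```python
import random


def ring_network(n, k=2):
    """Hypothetical contact network: each node is linked to its k nearest
    neighbours on each side of a ring."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0]
            for i in range(n)}


def simulate_sir(adjacency, beta, gamma, initial_infected, rng, max_steps=1000):
    """Discrete-time stochastic SIR epidemic on a contact network.

    adjacency: dict mapping each node to a list of its neighbours
    beta:      assumed per-contact infection probability per time step
    gamma:     assumed recovery probability per time step
    Returns a list of (susceptible, infected, recovered) counts, one per step.
    """
    state = {node: "S" for node in adjacency}
    for node in initial_infected:
        state[node] = "I"

    history = []
    for _ in range(max_steps):
        counts = (
            sum(s == "S" for s in state.values()),
            sum(s == "I" for s in state.values()),
            sum(s == "R" for s in state.values()),
        )
        history.append(counts)
        if counts[1] == 0:  # no infected nodes left: the epidemic is over
            break

        # Update synchronously: decisions are based on the old state only.
        new_state = dict(state)
        for node, s in state.items():
            if s != "I":
                continue
            for neighbour in adjacency[node]:
                if state[neighbour] == "S" and rng.random() < beta:
                    new_state[neighbour] = "I"
            if rng.random() < gamma:
                new_state[node] = "R"
        state = new_state
    return history


if __name__ == "__main__":
    rng = random.Random(42)
    network = ring_network(200, k=2)
    history = simulate_sir(network, beta=0.3, gamma=0.2,
                           initial_infected=[0], rng=rng)
    print("steps simulated:", len(history))
    print("final (S, I, R):", history[-1])
```

Replacing the ring by an empirical contact network is where data would enter such a model; the simulation loop itself would stay the same.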
Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven
Contact: thomas.neyens@uhasselt.be
Clustered non-Gaussian data are most frequently analyzed using the generalized linear mixed model (GLMM). Commonly, the GLMM is fitted by maximizing the marginal (log-)likelihood, i.e., integrating out the random effects. However, this whole maximization may require a considerable amount of computing resources. Although computationally manageable with medium to large data, it can be too time-consuming or computationally intractable with several large-size clusters. To overcome this, a fast two-stage estimator for correlated non-Gaussian data is presented. It is rooted in the pseudo-likelihood split-sample methodology. Based on simulations, it shows good statistical properties, and it is computationally much faster than the full maximum likelihood estimator. The approach is illustrated using a large dataset belonging to a network of Belgian general practices.
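The general split-and-combine idea can be sketched in a few lines. The example below is not the estimator presented in the seminar: it assumes an intercept-only logistic model without random effects, so that each split has a closed-form estimate, and it combines the per-split estimates by inverse-variance weighting. The function names and the toy data are hypothetical.

```python
import math


def fit_split(outcomes):
    """Stage 1: estimate the log-odds of success within one split.

    Assumes an intercept-only logistic model (no random effects), for which
    the maximum likelihood estimate is the logit of the sample proportion.
    Returns (estimate, approximate variance of the estimate).
    """
    n = len(outcomes)
    p = sum(outcomes) / n
    if p == 0.0 or p == 1.0:
        raise ValueError("log-odds are not estimable when all outcomes agree")
    theta = math.log(p / (1.0 - p))
    variance = 1.0 / (n * p * (1.0 - p))  # inverse Fisher information
    return theta, variance


def combine_splits(splits):
    """Stage 2: combine per-split estimates by inverse-variance weighting."""
    fits = [fit_split(outcomes) for outcomes in splits]
    weights = [1.0 / variance for _, variance in fits]
    total = sum(weights)
    theta = sum(w * est for w, (est, _) in zip(weights, fits)) / total
    return theta, 1.0 / total


if __name__ == "__main__":
    # Toy binary outcomes, already partitioned into three splits.
    splits = [
        [1, 0, 1, 1, 0, 1, 0, 1],
        [0, 1, 1, 0, 1, 1, 1, 0],
        [1, 1, 0, 0, 1, 0, 1, 1],
    ]
    theta, variance = combine_splits(splits)
    print("combined log-odds:", theta)
    print("approximate standard error:", math.sqrt(variance))
```

Each split is fitted independently, so the first stage can run in parallel and never needs the full dataset in memory. This is the source of the computational gain that split-sample approaches aim for.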
Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven
Contact: thomas.neyens@uhasselt.be
Suppose we observe a series of binary data along with explanatory variables and we suspect that these observations belong to a parametric regression model. To verify this assumption, we use Kolmogorov-Smirnov and Cramér-von Mises type tests based on a maximum likelihood estimate of the parameter and a marked empirical process introduced by Stute. We determine the critical values for the tests with a special bootstrap procedure in which the resampling scheme is adapted to the parametric setup.
This approach is applied to simulated and real data. In the latter case, we review some parametric model assumptions in the context of right censored data. The method presented is discussed in the light of machine learning (statistical learning) and how it can be applied to generalized linear models, distinguishing between semi-parametric and parametric GLMs.
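A minimal sketch of such a test is given below, under assumptions that go beyond the abstract: a logistic model with a single continuous covariate (no ties), a residual-marked cumulative process evaluated at the observed covariate values, and a model-based bootstrap that redraws each response from the fitted model. All function names are hypothetical, and the details may differ from the procedure presented in the seminar.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of P(Y=1|x) = sigmoid(a + b*x) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:            # information matrix (near) singular: stop
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def ks_cvm_statistics(x, y, a, b):
    """Kolmogorov-Smirnov and Cramer-von Mises type statistics of the
    residual-marked process R_n(t) = n^(-1/2) * sum of (y_i - p_i) over x_i <= t,
    evaluated at the observed covariate values (assumes no ties in x)."""
    n = len(x)
    running = ks = cvm = 0.0
    for i in sorted(range(n), key=lambda j: x[j]):
        running += y[i] - sigmoid(a + b * x[i])
        r = running / math.sqrt(n)
        ks = max(ks, abs(r))
        cvm += r * r / n
    return ks, cvm


def bootstrap_p_values(x, y, n_boot, rng):
    """Model-based bootstrap: redraw responses from the fitted model, refit,
    and compare the bootstrap statistics with the observed ones."""
    a, b = fit_logistic(x, y)
    ks_obs, cvm_obs = ks_cvm_statistics(x, y, a, b)
    ks_exceed = cvm_exceed = 0
    for _ in range(n_boot):
        y_star = [1 if rng.random() < sigmoid(a + b * xi) else 0 for xi in x]
        a_star, b_star = fit_logistic(x, y_star)
        ks_star, cvm_star = ks_cvm_statistics(x, y_star, a_star, b_star)
        ks_exceed += ks_star >= ks_obs
        cvm_exceed += cvm_star >= cvm_obs
    return ks_exceed / n_boot, cvm_exceed / n_boot


if __name__ == "__main__":
    rng = random.Random(2020)
    x = [rng.uniform(-2.0, 2.0) for _ in range(150)]
    # Simulated data that do follow the assumed logistic model.
    y = [1 if rng.random() < sigmoid(0.5 + 1.0 * xi) else 0 for xi in x]
    p_ks, p_cvm = bootstrap_p_values(x, y, n_boot=200, rng=rng)
    print("bootstrap p-value (KS):", p_ks)
    print("bootstrap p-value (CvM):", p_cvm)
```

Small p-values would indicate that the data are not compatible with the assumed parametric form. Refitting the model on every bootstrap sample is what keeps the resampling consistent with the fact that the parameter was estimated.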
Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven
Contact: thomas.neyens@uhasselt.be
Not all data scientists have a background in 'software engineering'. However, as you develop your career, you are expected to write code of a certain level of quality. Luckily, a number of very simple things go a long way towards improving your code substantially. For good programmers, they are second nature, and you should strive to make them a habit. Many of these good coding practices are applicable to any programming language, but we tailor this session specifically to R.
Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven
Contact: thomas.neyens@uhasselt.be
Agoralaan Gebouw D, 3590 Diepenbeek, Belgium
Office E118
Agoralaan Gebouw D, B-3590 Diepenbeek, Belgium
Office E132