#### 21 Feb 2020

#### Pietro Coletti: Network epidemiology

Communicable diseases pose a continuous threat to human welfare and development. This threat comes in two forms. First, well-known, often treatable infections are still the leading cause of death in resource-constrained countries and impose an economic burden on both developed and developing countries. Second, newly emerging pathogens with pandemic potential may lead to worldwide crises, as 2019-nCoV is currently showing. In recent years, network epidemiology has provided several insights into the diffusion of communicable diseases. Lying at the intersection of statistics, network theory, and data science, network epidemiology exploits the large amount of data available in the information era to relax model assumptions and generate quantitative predictions.

In this seminar, I will give an overview of the field, introduce specific models of network epidemiology and discuss illustrative case studies.

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven

Contact: thomas.neyens@uhasselt.be

#### 6 Mar 2020, 12.30-13.30

#### Alvaro Florez Poveda: Fast two-stage estimator for large clustered non-Gaussian data

Clustered non-Gaussian data are most frequently analyzed using the generalized linear mixed model (GLMM). Commonly, the GLMM is fitted by maximizing the marginal (log-)likelihood, i.e., the likelihood obtained by integrating out the random effects. However, this maximization may require a considerable amount of computing resources. Although computationally manageable with medium to large data, it can be too time-consuming or computationally intractable with several large clusters. To overcome this, a fast two-stage estimator for correlated non-Gaussian data is presented. It is rooted in the pseudo-likelihood split-sample methodology. Based on simulations, it shows good statistical properties, and it is computationally much faster than the full maximum likelihood estimator. The approach is illustrated using a large dataset from a network of Belgian general practices.

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven

Contact: thomas.neyens@uhasselt.be

#### 17 Mar 2020, 12.30-13.30

#### Gerhard Dikta: Bootstrap approximations to check parametric regression models

Suppose we observe a series of binary data along with explanatory variables and we suspect that these observations follow a parametric regression model. To verify this assumption, we use Kolmogorov-Smirnov and Cramér-von Mises type tests based on a maximum likelihood estimate of the parameter and a marked empirical process introduced by Stute. We determine the critical values for the tests with a special bootstrap procedure in which the resampling scheme is adapted to the parametric setup.

This approach is applied to simulated and real data. In the latter case, we review some parametric model assumptions in the context of right-censored data. The method presented is discussed in the light of machine learning (statistical learning), together with how it can be applied to generalized linear models, distinguishing between semi-parametric and parametric GLMs.
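
To make the testing idea concrete, here is a minimal sketch of one plausible reading of the procedure, not the speaker's implementation. It assumes a logistic model with a single covariate, a Kolmogorov-Smirnov type statistic built from the cumulative sum of residuals ordered by the covariate, and a model-based bootstrap that redraws the binary responses from the fitted model. All function names (`fit_logistic`, `ks_statistic`, `bootstrap_pvalue`, `simulate`) and the simulated data are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iters=25):
    """Newton-Raphson MLE for P(y=1|x) = sigmoid(a + b*x)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break                   # information matrix is near-singular
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def ks_statistic(x, y, a, b):
    """sup_u |R_n(u)|, with R_n(u) = n^(-1/2) * sum over x_i <= u of residuals.

    Assumes no ties in x (true for a continuous covariate).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    cum, sup = 0.0, 0.0
    for i in order:
        cum += y[i] - sigmoid(a + b * x[i])
        sup = max(sup, abs(cum))
    return sup / math.sqrt(n)


def bootstrap_pvalue(x, y, n_boot=200, seed=0):
    """Model-based bootstrap: redraw y from the fitted model, refit, recompute."""
    rng = random.Random(seed)
    a, b = fit_logistic(x, y)
    observed = ks_statistic(x, y, a, b)
    exceed = 0
    for _ in range(n_boot):
        y_star = [1 if rng.random() < sigmoid(a + b * xi) else 0 for xi in x]
        a_s, b_s = fit_logistic(x, y_star)
        if ks_statistic(x, y_star, a_s, b_s) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


def simulate(n, seed, a=-0.5, b=1.0):
    """Hypothetical data that really do follow the logistic model."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(a + b * xi) else 0 for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate(300, seed=1)
    stat, p = bootstrap_pvalue(x, y, n_boot=200, seed=2)
    print(f"KS-type statistic = {stat:.3f}, bootstrap p-value = {p:.3f}")
```

Because each bootstrap sample is generated from the fitted parametric model and the parameter is re-estimated every time, the bootstrap distribution of the statistic mimics its behaviour under the null hypothesis, including the effect of estimating the parameter. A small p-value would suggest that the parametric form does not fit.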

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven

Contact: thomas.neyens@uhasselt.be

#### 20 Mar 2020, 12.30-14.00

#### Joris Van Houtven and Geert-Jan Bex: Good coding practices in R

Not all data scientists have a background in 'software engineering'. However, as you develop your career, you are expected to write code of a certain level of quality. Luckily, a number of very simple things go a long way towards improving your code substantially. For good programmers, they are second nature, and you should strive to make them a habit. Many of these good coding practices are applicable to any programming language, but we tailor this session specifically to R.

Location: Meeting Room E139, Hasselt University, Campus Diepenbeek, and Meeting Room A, KULeuven

Contact: thomas.neyens@uhasselt.be