#### Conditional dependencies and testing for covariate effects

Irène Gijbels, Department of Mathematics, KU Leuven, Belgium

Unraveling dependencies in multivariate data remains a challenging task, for which several tools are available. The main interest of this talk is pinpointing the precise impact of covariates on the dependence structure between two variables. Conditional copulas are a natural tool for studying conditional dependencies and associations. After a brief introduction to the concepts, and to nonparametric and semiparametric estimation of conditional copulas, we focus on testing how a conditional dependence structure may vary with a covariate value. In the simplest setting, this conditional dependence structure or conditional association does not change when the covariate value changes. This is the typical working assumption when using pair-copula constructions for modelling multivariate dependencies. For a given set of covariates, it is important to find out not only which covariates have a significant impact on the dependence structure, but also what form that impact takes (linear effects, quadratic effects, …). We discuss how to develop testing procedures for checking specific impacts. Some attention will go to the advantages and disadvantages of various approaches.
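For orientation, the objects discussed in this abstract can be written down as follows. This is only a sketch in standard notation: the symbols $Y_1$, $Y_2$, $X$, $F_{jx}$ and $C_x$ are assumed here for illustration and are not taken from the talk, and the margins are assumed continuous.

```latex
% Conditional version of Sklar's theorem (notation assumed for illustration):
% H_x is the joint conditional distribution of (Y_1, Y_2) given X = x,
% F_{1x} and F_{2x} are the conditional margins, C_x is the conditional copula.
H_x(y_1, y_2) = P(Y_1 \le y_1,\, Y_2 \le y_2 \mid X = x)
              = C_x\bigl(F_{1x}(y_1),\, F_{2x}(y_2)\bigr)

% Simplest setting: the covariate does not affect the dependence structure.
H_0 :\; C_x = C \quad \text{for all } x
```

Under $H_0$ the covariate may still affect the conditional margins $F_{1x}$ and $F_{2x}$; only the copula is assumed constant in $x$.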

This talk is based on joint work with Marek Omelka, Michal Pešta, and Noël Veraverbeke.

#### Adaptive non-parametric estimation of mean and autocovariance in regression with dependent errors

Tatyana Krivobokova, Goettingen University, Germany

We develop a fully automatic non-parametric approach to simultaneous estimation of mean and autocovariance functions in regression with dependent errors. Our empirical Bayesian approach is adaptive, numerically efficient and allows for the construction of confidence sets for the regression function. Consistency of the estimators is shown and small sample performance is demonstrated in simulations and real data analysis.

#### Tail index inference of heavy-tailed data

Tetyana Kadankova, VUB, Belgium

Heavy-tailed distributions play an important role in modelling a wide range of phenomena in finance, geology, hydrology, physics, queuing theory, and telecommunications. Two practical problems arise in this context:

1) to establish a criterion to check whether a given data set exhibits heavy-tailed behaviour;

2) to estimate the unknown tail index of the heavy-tailed distribution.

To address the first problem, various graphical methods are usually employed, while the tail index estimators are typically based on upper order statistics and their asymptotic properties.

In this talk we present an alternative approach for exploratory analysis of tails as well as for the estimation of the tail index of a heavy-tailed distribution. This approach employs asymptotic properties of a so-called empirical structure function [1], which is a sample moment statistic based on blocks of data. This function is often used in the context of multifractal processes (see [2]), and it allows extracting more information about the tail index. One of the advantages of this method is its validity for weakly dependent samples, which typically come from financial data. These results can be generalized to a multidimensional setting, which is the subject of ongoing research.
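As a point of comparison, the classical estimators based on upper order statistics mentioned in this abstract can be illustrated with the Hill estimator. The sketch below is not the structure-function method of the talk; the function name `hill_estimator`, the choice of `k`, and the simulated Pareto sample are assumptions made purely for illustration.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the (k+1)-th largest observation must be positive")
    # Average log-excess of the k largest values over the threshold
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


# Illustration on simulated i.i.d. Pareto data with true tail index 2
rng = random.Random(0)
true_alpha = 2.0
data = [rng.paretovariate(true_alpha) for _ in range(20000)]
alpha_hat = hill_estimator(data, k=2000)
print(round(alpha_hat, 2))
```

In practice the estimate depends strongly on `k`, and the simulated sample is independent, whereas the abstract stresses validity under weak dependence.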

References

1. Grahovac D., Leonenko N., Taufer E. Asymptotic properties of the partition function and applications in tail index inference of heavy-tailed data. Journal of Theoretical and Applied Statistics 49 (2015).

2. Grahovac D., Leonenko N. Detecting multifractal stochastic processes under heavy-tailed effects. Chaos, Solitons & Fractals 66, 78-89 (2014).

3. Qi Y. On the tail index of a heavy tailed distribution. Ann. Inst. Stat. Math. 62, 277-298 (2010).

4. Neuman S.P., Guadagnini A., Riva M., Siena M. Recent advances in statistical and scaling analysis of earth and environmental variables. Advances in hydrology, 1-25 (2013).

5. Calvet L., Fisher A. Multifractality in asset returns: theory and evidence. Rev. Econ. Stat. 84(3), 381-406 (2002).

#### On the use of cure models in cancer clinical trials

Catherine Legrand, ISBA-IMMAQ, UCLouvain, Belgium

The most popular model for analyzing oncology trials with time-to-event endpoints is certainly the Cox proportional hazards model. Over the last decades, advances in medical research have made the presence of a fraction of cured patients, or long-term survivors, a reality for some specific cancer types, and we may wonder whether the Cox model is still appropriate in this setting.

First, standard time-to-event analysis techniques usually assume that all patients under study will experience the event of interest. Second, the presence of short- and long-term survivors may lead to a violation of the proportional hazards assumption. Finally, when a fraction of long-term survivors can be expected, the proportion of “cured” patients becomes a crucial component of the assessment of patient benefit, and being able to distinguish a curative from a life-prolonging effect conveys important additional information in the evaluation of a new treatment.

Specific models have been developed in the statistical literature to address these issues, with two main families of models, namely the “mixture cure models” and the “promotion time cure models”. They are, however, still rarely used in the analysis of cancer clinical trials. In this presentation, we introduce these two families of models and their links with the usual Cox PH model, and we discuss the use of these models in the context of clinical trials with a time-to-event endpoint and a cure fraction.
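The two families named above are usually written as follows. This is a sketch in one common parametrization; the symbols $\pi$, $S_u$, $\theta$, $F$, $f$ and $\beta$ are assumed for illustration and are not taken from the presentation.

```latex
% Mixture cure model: a proportion \pi of patients is cured, and S_u is
% the survival function of the uncured patients.
S_{\mathrm{pop}}(t) = \pi + (1 - \pi)\, S_u(t)

% Promotion time cure model: F is a proper distribution function with
% density f, and \theta > 0, so the cure fraction is \exp(-\theta).
S_{\mathrm{pop}}(t) = \exp\{-\theta F(t)\}

% With \theta(x) = \exp(x^\top \beta), the population hazard
% -\frac{d}{dt} \log S_{\mathrm{pop}}(t \mid x) has a proportional hazards structure.
h_{\mathrm{pop}}(t \mid x) = \exp(x^\top \beta)\, f(t)
```

The last line is one way to see the link with the Cox PH model: the promotion time model keeps proportional hazards at the population level, while the baseline cumulative hazard is bounded because $F(t) \le 1$.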

#### Stein's Method in Computational Statistics

Chris Oates, Newcastle University, U.K.

There is a recent trend in computational statistics to move away from sampling methods and towards optimisation methods for posterior approximation. These include discrepancy minimisation, gradient flows and control functionals, all of which have the potential to deliver faster convergence than a Monte Carlo method. In this talk we will provide a basic introduction to some of these algorithms, and then we will attempt to unify these emergent research themes in the context of Stein's method.

#### Modeling networks and network populations via graph distances

Sofia Olhede, University College London, U.K.

Networks have become a key data analysis tool. They are a simple method of characterising dependence between nodes or actors. Understanding the difference between two networks is also challenging unless they share nodes and are of the same size. We shall discuss how we may compare networks and also consider the regime where more than one network is observed.

We shall also discuss how to parametrize a distribution on labelled graphs in terms of a Fréchet mean graph (which depends on a user-specified choice of metric or graph distance) and a parameter that controls the concentration of this distribution about its mean. Entropy is the natural parameter for such control, varying from a point mass concentrated on the Fréchet mean itself to a uniform distribution over all graphs on a given vertex set.

Networks present many new statistical challenges. We shall discuss how to resolve these challenges respecting the non-Euclidean nature of network observations.

#### Lavaan and the (computational) history of structural equation modeling

Yves Rosseel, UGhent, Belgium

For several decades, software for structural equation modeling was commercial and/or closed-source. Nine years ago, the lavaan project (http://lavaan.org) was started to create a fully open-source platform for latent variable modeling. In the first part of the presentation, I will discuss how the lavaan project attempts to capture and preserve the long and rich (computational) history of structural equation modeling and related methods. In the tradition of software archeology, several legacy SEM software packages were studied in order to understand and recover the (computational) details that were (and often still are) being used. By implementing many of these details into lavaan, we are able to reproduce results reported in older papers and book chapters, and explain why we observe many subtle (and less subtle) numerical differences in the output of current SEM programs.

However, historical curiosity aside, it turns out that these differences reveal many unsolved problems and, in some cases, a failure of SEM technology to connect to modern branches of statistics. In the second part of the presentation, I will discuss a (partial) list of topics which need the (renewed) attention of today's SEM researchers, and highlight some opportunities and challenges for the SEM researchers of tomorrow.

#### Statistical evaluation methodology of multivariate surrogate endpoints

Wim Van der Elst, Ariel Alonso, Geert Molenberghs & Alvaro José Flórez

An important factor that affects the duration, complexity, and cost of a clinical trial is the endpoint that is used to study the treatment's efficacy. In some situations, it is infeasible to use the true endpoint (i.e., the most credible indicator of the therapeutic response). For example, the true endpoint may require a long follow-up time (e.g., survival time in oncology), such that the evaluation of the new therapy using this endpoint would be delayed and potentially confounded by other therapies. In such a situation, it can be an attractive strategy to replace the true endpoint with another endpoint that can be measured earlier (e.g., change in tumor volume in oncology). Such a replacement outcome for the true endpoint is termed a surrogate endpoint.

Before a surrogate endpoint can substitute for the true endpoint, its appropriateness has to be demonstrated. In spite of medical and methodological advances, the identification of good surrogate endpoints has remained a challenging endeavor. This may, at least partially, be attributable to the fact that most researchers have only focused on univariate surrogates. Indeed, it has often implicitly been assumed that the treatment effect on the true endpoint can be accurately predicted based on the treatment effect on a single surrogate endpoint. Given the complex nature of many diseases and the various therapeutic pathways through which a treatment can impact the true endpoint, this assumption may be overly optimistic. In this talk it is argued that the use of multivariate surrogate endpoints may allow for a better prediction of the treatment effect on the true endpoint, and several methods to evaluate multivariate surrogates are discussed.

#### The role of empirical Bayes methodology as a leading principle in modern medical statistics

Hans C. van Houwelingen, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands

This paper reviews and discusses the role of Empirical Bayes methodology in medical statistics in the last 50 years. It gives some background on the origin of the empirical Bayes approach and its link with the famous Stein estimator. The paper describes the application in four important areas in medical statistics: disease mapping, health care monitoring, meta-analysis, and multiple testing. It ends with a warning that the application of the outcome of an empirical Bayes analysis to the individual “subjects” is a delicate matter that should be handled with prudence and care.
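The link with the Stein estimator mentioned above can be illustrated with a small simulation of the James-Stein shrinkage estimator, which can be read as an empirical Bayes rule under a normal prior centred at zero. This is a minimal sketch rather than anything taken from the paper: the function name `james_stein`, the known unit variance, the dimension, and the true means are all assumptions chosen for illustration.

```python
import random


def james_stein(x, sigma2=1.0):
    """Shrink independent N(theta_i, sigma2) observations toward zero."""
    p = len(x)
    if p < 3:
        raise ValueError("the James-Stein estimator requires at least 3 coordinates")
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) * sigma2 / norm2  # common data-driven shrinkage factor
    return [shrink * v for v in x]


# Compare total squared error of the raw observations and the shrunken ones
rng = random.Random(42)
p, reps = 50, 200
theta = [0.5] * p  # assumed true means, for illustration only
loss_raw = 0.0
loss_js = 0.0
for _ in range(reps):
    x = [rng.gauss(t, 1.0) for t in theta]
    js = james_stein(x)
    loss_raw += sum((a - t) ** 2 for a, t in zip(x, theta))
    loss_js += sum((a - t) ** 2 for a, t in zip(js, theta))

print(loss_js / reps < loss_raw / reps)
```

The gain is in the total error over all coordinates. For any single coordinate the shrunken value can be worse than the raw observation, which is in line with the abstract's warning about applying empirical Bayes results to individual subjects.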

#### A horse racing between the block maxima method and the peak-over-threshold approach

Chen Zhou, Erasmus University Rotterdam, The Netherlands

Classical extreme value statistics consists of two fundamental approaches: the block maxima (BM) method and the peak-over-threshold (POT) approach. There seems to be a general consensus among researchers in the field that the POT method makes use of extreme observations more efficiently than the BM method. We shed light on this discussion from three different perspectives. First, based on recent theoretical results for the BM approach, we provide a theoretical comparison in i.i.d. scenarios. We argue that the data generating process may favour either one or the other approach. Second, if the underlying data possess serial dependence, we argue that the choice of a method should be primarily guided by the ultimate statistical interest: for instance, POT is preferable for quantile estimation, while BM is preferable for return level estimation. Finally, we discuss the two approaches for multivariate observations and identify various open ends for future research.
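To make the two approaches concrete, the sketch below shows how each one selects its "extreme" observations from the same series. It only covers the data-selection step and fits no extreme value distribution; the function names, the block size, the threshold choice, and the simulated Pareto series are assumptions for illustration.

```python
import random


def block_maxima(series, block_size):
    """Return the maximum of each complete, non-overlapping block."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def threshold_excesses(series, threshold):
    """Return x - threshold for every observation above the threshold."""
    return [x - threshold for x in series if x > threshold]


rng = random.Random(1)
series = [rng.paretovariate(3.0) for _ in range(1000)]

# BM: 20 blocks of 50 observations give 20 block maxima
maxima = block_maxima(series, block_size=50)

# POT: use the 21st largest value as threshold, so 20 observations exceed it
threshold = sorted(series, reverse=True)[20]
excesses = threshold_excesses(series, threshold)

# Number of block maxima that are also among the 20 largest observations
shared = sum(1 for m in maxima if m > threshold)
print(len(maxima), len(excesses), shared)
```

With the same number of selected points, the two methods generally keep different observations: BM retains one value per block even when it is not very large, while POT may retain several values from the same block.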