Testimony of a PhD in Statistics at Arinti
Ali Charkhi, Arinti
I will start my presentation with Arinti’s goals, products and services, and the regular projects that we work on daily. I will also explain what I do as a Data Scientist at Arinti. The main part of this presentation is about food authenticity. I will briefly explain how we collect data with a spectroscopy device. Then, I will present how we can use predictive models to classify different fish species based on the spectroscopy data. Later, I will introduce one challenge in our project, which arose from a change in the data distribution (a change of spectroscopy device). Because of this change, the predictive model does not generalize across devices. Finally, I will provide a heuristic solution to this problem. More precisely, I will explain how we can solve this issue by using data augmentation.
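The abstract does not say which augmentation is used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that a change of device shows up as a small shift along the wavelength axis, a change in overall gain, and a baseline offset, and it generates perturbed copies of each training spectrum accordingly. The function names, parameters and default values are hypothetical and are not taken from the talk.

```python
import random


def augment_spectrum(spectrum, rng, max_shift=2, gain_sd=0.05, offset_sd=0.01):
    """Return one perturbed copy of a spectrum (a list of intensities).

    Assumed (hypothetical) device effects: an integer channel shift,
    a multiplicative gain and an additive baseline offset.
    """
    shift = rng.randint(-max_shift, max_shift)
    gain = rng.gauss(1.0, gain_sd)
    offset = rng.gauss(0.0, offset_sd)
    n = len(spectrum)
    # Shift along the wavelength axis, repeating the edge value where needed.
    shifted = [spectrum[min(max(i - shift, 0), n - 1)] for i in range(n)]
    return [gain * value + offset for value in shifted]


def augment_dataset(spectra, labels, copies=5, seed=0):
    """Keep the original spectra and append `copies` perturbed versions of each."""
    rng = random.Random(seed)
    out_spectra, out_labels = list(spectra), list(labels)
    for spectrum, label in zip(spectra, labels):
        for _ in range(copies):
            out_spectra.append(augment_spectrum(spectrum, rng))
            out_labels.append(label)
    return out_spectra, out_labels
```

A classifier trained on the augmented set sees more variation of the kind a new device might introduce. Whether that helps in practice depends on how well the assumed perturbations match the real differences between devices.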
Data Science and Statistics: the shooting S
Christophe Croux, Edhec Business School, France
Being trained as a mathematical statistician, I will first share my thoughts and experience on (i) how data science offers opportunities and threats to statisticians, (ii) how a master in data science at a business school differs from an academic training in statistics, and (iii) whether a PhD or an academic profession in data science is worthwhile. In the second part of my talk I will present the shooting S estimator. This is a regression estimator that can cope with both outlying observations and a large number of variables. It is an example of a research topic that fits within both the statistics and the data science literature.
Improving Your Statistical Questions
Daniël Lakens, TU Eindhoven, The Netherlands
When proposing alternatives to p-values, statisticians in the scientific literature often commit the ‘Statistician’s Fallacy’, where they declare which statistic people really ‘want to know’. Instead of telling others what they want to know, statisticians should teach people which questions they can ask. All statistics have assumptions and practical limitations. I will discuss the ways p-values have been criticized as an illustration of the rather unproductive approach of dismissing one approach to statistical inference instead of improving the way it is used in practice. For as long as null-hypothesis tests have been criticized, researchers have suggested including minimal-effects tests and equivalence tests in our statistical toolbox. Although these types of tests have the potential to greatly improve the questions researchers ask, they are rarely taught to people who are expected to use statistics in the future. By more formally explaining what questions are answered by different approaches to statistical questions, we can help people to improve the statistical questions they ask.
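To make the idea of an equivalence test concrete, here is a minimal sketch of the two one-sided tests (TOST) procedure for a difference in means. It is an illustration only, not material from the talk. It uses a large-sample normal approximation rather than the t-distribution, and the function name and the equivalence bounds `low` and `high` are assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_two_sample(x, y, low, high, alpha=0.05):
    """Large-sample TOST: is mean(x) - mean(y) inside (low, high)?

    Returns the observed difference, the TOST p-value and the decision.
    Uses a normal approximation, so it is only reasonable for large samples.
    """
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist()
    # Test 1: H0 "diff <= low" against H1 "diff > low".
    p_lower = 1.0 - z.cdf((diff - low) / se)
    # Test 2: H0 "diff >= high" against H1 "diff < high".
    p_upper = z.cdf((diff - high) / se)
    # Equivalence is claimed only if both one-sided tests reject.
    p_tost = max(p_lower, p_upper)
    return diff, p_tost, p_tost < alpha
```

The question answered here is "is the effect smaller than the smallest effect I care about?", which a standard null-hypothesis test cannot answer: a non-significant result in that test does not show that an effect is absent.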
Living with uncertainty: how the questions we ask shape the answers we give
Joris Meys, UGent
Uncertainty lies at the core of statistics. Regardless of the framework or method, statisticians formulate strict hypotheses that can be translated into some measure of probability. Yet, both decision makers and the general public struggle to interpret these raw outcomes in a meaningful way. Even scientists don't always know how to translate a broader research question into a meaningful hypothesis, or translate their statistical results into a more general message. Using the current discussion on climate as a common thread, we explore how the questions we're asked shape the answers we can provide. This presentation is largely based on my personal expertise as a science communicator, and the broader framework Judea Pearl explores in his book "The Book of Why".
Marijke Van Moerbeke, Open Analytics
R is an open source programming language and environment for statistical computation, data science and visualization. The open source mindset is one of the core advantages of the language, as R is easily accessible and extensible by any user via the submission of R packages. New methodologies (or methods for new data platforms) are often available in R first. In recent years, the R community has expanded significantly, with now more than 2 million users, and it frequently organizes conferences and trainings dedicated to the use of R. This encourages collaboration, sharing of experiences and open practices for further development and improvement of the language. In addition to the academic world, a broad range of industries, including biotech, finance, research and high technology, have integrated the R language into third-party data analysis, visualization and reporting applications. In my talk I will discuss my experience with R: from my first script using the famous iris data set to my consultancy job at Open Analytics, where I use R on a daily basis for numerous applications. I’ll highlight a few cornerstones of R, such as regular scripting and packaging functions, and recent developments such as reporting with rmarkdown and applications with the Shiny package.