Flanders AI Research program


(2019-31.12.2023)

Artificial intelligence (AI) is making impressive progress in all sectors, including healthcare. With AI technologies, we can improve the healthcare system and gain insights into diseases that would otherwise be out of reach. In 2019, the Department of Economy, Science and Innovation provided impulse funding for cutting-edge AI research: the Flanders AI Research program.

The research program is organized around four grand challenges. Within this program, Liesbet M. Peeters acts as use case lead of the multiple sclerosis (MS) use case, which is part of Grand Challenge 1 (AI-Driven Data Science: Making Data Science Hybrid, Automated, Trusted and Actionable), Work Package 7 (use cases in Health). We aim to speed up the identification of the right treatment for the right patient at the right time by improving the management of data that is already being collected and by applying AI techniques to these datasets.

To do so, we collaborate with the other Flemish universities: the University of Leuven (KU Leuven, including campus Kortrijk (KULAK)), Ghent University (UGent) and the University of Antwerp (UAntwerp). More specifically, the following research groups are involved:

  • KU Leuven:
    • Declarative Languages and Artificial Intelligence section of Computer Science (DTAI)
    • ESAT-STADIUS center for dynamical systems, signal processing and data analytics
    • Interactive Technologies (ITEC) - KULAK
  • UGent:
    • Department of Information technology (IDLab)
    • KERMIT (Knowledge-based Systems)
  • UAntwerp:
    • Vision Lab

PIs involved in our collaborations: Alexander Bertrand (KUL), Luc De Raedt (KUL), Maarten De Vos (KUL), Yves Moreau (KUL), Johan Suykens (KUL), Celine Vens (KULAK), Tom Dhaene (UGent), Willem Waegeman (UGent), Jan Sijbers (UAntwerpen).


Within the Use Case MS, we aim to meet three clinical challenges:

  1. Which treatment is the best?
    To answer this, we aim to develop decision-support systems for relative treatment effectiveness in a real-world setting (using real-world data).
  2. Does the treatment work?
    To answer this, we aim to identify new biomarkers for disease activity, more specifically, new features in MRI and evoked potential time series that support the identification of treatment (non-)responders.
  3. How can we improve performance to reach the level of the individual patient?
    To answer this, we aim to develop tools and methodologies that support the scaling-up of real-world MS data research.

To meet these three clinical challenges, we work on four different proofs of concept (POCs):

  • POC1 (= the federated architecture that was used in the COVID-19 in MS Global Data Sharing Initiative, a project of the MS Data Alliance) (--> clinical challenge 3):
    • Goal: To speed up collaborative projects in real-world MS data research
    • Methods: federated learning analytics and automated data quality assessment and enhancement pipelines
    • PhD students of our research group involved in POC1: Tina Parciak, Ashkan Pirmani
  • POC2 (--> clinical challenge 1):
    • Goal: To develop a decision-support system to predict progression of MS
    • Methods: AI algorithms that cope better with missingness in trajectory data (longitudinal data) and increase the interpretability of results
  • POC3 (--> clinical challenge 2):
    • Goal: To identify new biomarkers for disease progression
    • Methods: AI algorithms that cope better with and/or improve the data quality of small, high-dimensional, real-world trajectory data, with a specific focus on MRI and time-series data
    • PhD student of our research group involved in POC3: Hamza Khan
  • POC4 (--> clinical challenge 3):
    • Goal: To speed up the set-up of data infrastructures handling real-world data
    • Methods: AI tools that speed up data wrangling, data integration, data visualisation and data sharing through federated learning technologies
    • PhD student of our research group involved in POC4: Marcel Parciak

Projects within the Flanders AI Research program (Use Case MS) that our team works on:

1.2.1 Federated infrastructure used in the COVID-19 in MS Global Data Sharing Initiative (GDSI) (= MSDA project) (POC1)

Summary:

In 2020, we used dockerized Jupyter notebooks to run the federated pipeline of the GDSI (MSDA infrastructure v1.0). However, since then we have been working on improving the MSDA federated architecture (MSDA infrastructure v2.1) to incorporate some of the feedback, comments and suggestions we have received. The main differences between the MSDA infrastructure v1.0 and v2.1 are the following:

  • We have reduced the “black-box” experience. The MSDA infrastructure v1.0 produced a collection of CSV files that were difficult to assess. In the MSDA infrastructure v2.1, we have developed a user-friendly end-user interface (UI) that allows you to walk through the pipeline more intuitively and to assess the results of the queries using a dashboard visualization of the key results.
  • We have eliminated the need to exchange scripts via email, reducing version-control issues. Last year, scripts were sent back and forth continuously, leading to the use of outdated scripts. Now, the script is shared via a GitHub repository, ensuring that you always have the latest version.
  • Increased security: a new operating system (Alpine) and additional hardening
  • Resource efficiency: the size of the Docker container was reduced by 91% (from 1.31 to 0.12 GB)
  • Coding-language agnostic: different coding languages are now possible (compared to only Python before)

More details about the MSDA infrastructure v1.0 and v2.1 are provided in this link.
Next to this, this video provides you with a demo of the MSDA infrastructure v2.1.

A paper about the federated pipeline used in the GDSI was submitted to a scientific journal and is currently under review.

1.2.2 Leveraging Federated Learning for MS: an empirical examination of using RWD (POC1)

Summary:

Our primary objective is to evaluate the potential of federated learning in the intricate multiple sclerosis ecosystem and to identify the most effective federated methods. This project aims to comprehensively assess the impact of federated learning on predicting disability progression in people with MS and to compare its effectiveness to the well-established MS Benchmark. The first step involves partitioning the centralized MSBase data into virtual clients and constructing various scenarios to evaluate the effect of different designs on training performance. Subsequently, different federated strategies will be employed to determine their effectiveness. Three key metrics (predictive performance, time to convergence, and computation cost) will be assessed and compared to the centralized scenario, ensuring a scientifically rigorous analysis.
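As an illustration of the set-up described above, here is a minimal sketch of partitioning a centralized dataset into virtual clients and running federated averaging over them. This is not the project's code: the synthetic data, the logistic-regression client model and names such as `partition` and `fedavg` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a centralized registry: 600 samples, 5 features,
# binary label (e.g. "disability progression within 2 years").
X = rng.normal(size=(600, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + 0.5 * rng.normal(size=600) > 0).astype(float)

def partition(X, y, n_clients):
    """Split a centralized dataset into virtual clients (IID split)."""
    idx = rng.permutation(len(X))
    return [(X[p], y[p]) for p in np.array_split(idx, n_clients)]

def local_sgd(w, Xc, yc, lr=0.1, epochs=5):
    """A few epochs of logistic-regression gradient descent on one client."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xc @ w)))
        w = w - lr * Xc.T @ (p - yc) / len(yc)
    return w

def fedavg(clients, rounds=20):
    """Federated averaging: clients train locally, server averages weights."""
    w = np.zeros(X.shape[1])
    sizes = np.array([len(yc) for _, yc in clients])
    for _ in range(rounds):
        local = [local_sgd(w.copy(), Xc, yc) for Xc, yc in clients]
        w = np.average(local, axis=0, weights=sizes)  # size-weighted mean
    return w

clients = partition(X, y, n_clients=4)
w = fedavg(clients)
acc = np.mean(((X @ w) > 0) == y)
```

Comparing `acc` against the same model trained directly on the pooled data is the kind of centralized-versus-federated comparison the project describes; non-IID partitions and other aggregation strategies would slot into `partition` and `fedavg`.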

Deliverables:

  • Valuable insights into best practices for ensuring both model accuracy and privacy preservation.
  • Trade-offs between centralization and decentralization, examining various levels of data aggregation and partitioning to determine the most effective level for federated learning implementation.
  • A deeper understanding of the potential benefits of federated learning in medical research.
  • More effective and efficient collaborative research efforts, fostering innovation and progress in the medical domain.

Partners involved: UHasselt, Noorderhart, KU Leuven

1.2.3 Supporting treatment decisions for multiple sclerosis in daily practice with high-performance systems (POC2)

Summary:

The study of longitudinal patient data has received much attention. The increased availability of such data reinforced this trend, allowing data-hungry machine learning techniques to be applied and pushing the field further towards patient-specific precision medicine. However, temporal patient data present a number of challenges that are not easily addressed. The main difficulties are missing values (some measurements are not available for every patient), non-constant sampling (time between hospital registrations can be very sporadic) and the joint nature of the data (events and longitudinal data are both present). We propose to address these issues with a new modeling technique relying on Bayesian dictionary learning of latent processes. This methodology posits that the observations made about each patient are generated by an underlying hidden disease process, accounting for trajectory variability among patients.
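To make the latent-process idea concrete, the sketch below uses masked alternating least squares, a deliberately simplified, non-Bayesian stand-in for Bayesian dictionary learning: each patient's observations are modeled as a patient-specific combination of shared temporal atoms, estimated from the observed entries only. All data and names here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 40 patients x 30 visits, observations driven by a
# 2-dimensional hidden "disease process", with ~40% missing entries.
n, t, k = 40, 30, 2
latent = rng.normal(size=(n, k))         # per-patient latent loadings
atoms = rng.normal(size=(k, t))          # shared temporal atoms
Y = latent @ atoms + 0.1 * rng.normal(size=(n, t))
mask = rng.random((n, t)) > 0.4          # True = observed

def masked_als(Y, mask, k, iters=30, ridge=1e-2):
    """Alternating least squares using observed entries only."""
    L = rng.normal(size=(Y.shape[0], k))
    D = rng.normal(size=(k, Y.shape[1]))
    R = ridge * np.eye(k)
    for _ in range(iters):
        for i in range(Y.shape[0]):      # update patient loadings
            m = mask[i]
            L[i] = np.linalg.solve(D[:, m] @ D[:, m].T + R, D[:, m] @ Y[i, m])
        for j in range(Y.shape[1]):      # update dictionary atoms
            m = mask[:, j]
            D[:, j] = np.linalg.solve(L[m].T @ L[m] + R, L[m].T @ Y[m, j])
    return L, D

L, D = masked_als(Y, mask, k)
recon = L @ D
rmse_missing = np.sqrt(np.mean((recon[~mask] - Y[~mask]) ** 2))
```

The reconstruction error on the unobserved entries illustrates why a low-dimensional latent process copes with missingness; the Bayesian version replaces these point estimates with posterior distributions.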

Deliverables:

  • Development of new algorithms based on "Bayesian Dictionary Learning of Latent Processes", thereby greatly improving forecast accuracy.
  • Designing an innovative infrastructure that enables data analysis on different federated datasets.

Partners involved: UHasselt, Noorderhart, KU Leuven

1.2.4 Automated Machine Learning for trajectory prediction in people with MS (POC2)

Summary:

Delays in identifying treatment failure, and in selecting a more optimal disease-modifying therapy (DMT) from the next line, reduce the quality of life of people with MS.

Doctors need help to find the right treatment and to detect treatment failure (="non-responders") more quickly. We believe there is an urgent need for accurate and high-performance algorithms that can support decisions around progression and treatment of the disease. Moreover, these algorithms should be developed using "observational datasets". Observational datasets bring together data from a heterogeneous population (in contrast to clinical trials).

However, although there is great potential in using observational datasets, this is also accompanied by several challenges because of the many imperfections inherent in this type of data. Therefore, there is a need for new algorithms that can better deal with these imperfections. In the context of another project ("Supporting treatment decisions for multiple sclerosis in daily practice with high-performance systems"), we achieved very promising results: we managed to greatly increase the accuracy of progression predictions by introducing machine learning methods. Nevertheless, there is still room for improvement. In this project, we therefore focus on the following two problems: (1) machine learning methods today can only be developed and implemented by technical experts, and (2) when datasets are not very large, accuracy is limited.

Deliverables:

  • Develop automated models that can also be applied to other problems.
  • Increase the accuracy of models on small datasets by incorporating expert knowledge.

Partners involved: UHasselt, Noorderhart, UGent

1.2.5 Quality of evoked potentials in MS (POC2)

Summary:

In this project, we aim to support research into computer-assisted methods for patient follow-up and evaluation of the effectiveness of therapies by looking at a mainstay of computer-assisted methods: data. To obtain reliable results, these methods need to be provided with quality data. Our focus will be on data resulting from a commonly used clinical test in MS patients: evoked potential (EP) data. Existing methods around quality assessment and improvement for this data modality are very limited and our aim is to improve on this. Our hypothesis is that quality assessment and improvement of EP data leads to a higher level of reliability in current computer-assisted methods. First, we investigate methods for automatic quality assessment and evaluation and, in a second step, we look at methods for quality improvement in EP data identified in the first step as not being of sufficient quality for further processing.
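A minimal sketch of what automatic quality screening of a single EP sweep could look like: a few statistical checks flag sweeps that need improvement or exclusion. The checks, thresholds and synthetic signals are illustrative assumptions, not the project's actual criteria.

```python
import numpy as np

rng = np.random.default_rng(2)

def ep_quality_flags(signal):
    """Screen one evoked-potential sweep with simple statistical checks.

    Returns a dict of boolean flags; any True flag marks the sweep as a
    candidate for quality improvement or exclusion.
    """
    signal = np.asarray(signal, dtype=float)
    smooth = np.convolve(signal, np.ones(11) / 11, mode="same")
    resid = signal - smooth  # high-frequency residual
    return {
        "flat_line": np.ptp(signal) < 1e-6,                    # dead channel
        "clipping": np.mean(np.isclose(signal, signal.max())) > 0.01,
        "high_noise": np.std(resid) > np.std(smooth),          # noise dominates
    }

t = np.arange(0, 0.5, 0.001)                # 500 samples at 1 kHz
clean = np.exp(-((t - 0.1) ** 2) / 2e-4)    # plausible EP-like peak
noisy = clean + rng.normal(scale=3.0, size=t.shape)

flags_clean = ep_quality_flags(clean)
flags_noisy = ep_quality_flags(noisy)
```

In the two-step plan described above, sweeps flagged by such an assessment would then be routed to a quality-improvement step (e.g. denoising) or excluded from further processing.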

Deliverables:

  • Obtain an algorithm to assess the quality and reliability of new EP data. This assessment can confirm that the data is suitable for further processing, indicate whether there is a need for quality improvement before proceeding to further processing, or warn that the data is not suitable for further processing and other clinical parameters should be considered.
  • Obtain an optimal technique to correct the anomalies identified during quality assessment.

Partners involved: UHasselt, Noorderhart, KU Leuven

1.2.6 Automatic feature extraction and interpretable representation learning using evoked potential time series in multiple sclerosis (POC2)

Summary:

Whether or not a treatment is the right treatment for a particular person is very difficult to determine (treatment failure). One of the important reasons why it is so difficult to estimate treatment effectiveness is that today we do not have the right outcome measures to measure it. Today, the outcome measure EDSS (Expanded Disability Status Scale) is usually used. However, the EDSS is not sensitive, and changes are therefore often noticed very late (e.g. only after 2 years or even longer). Previous research by our research group has shown that evoked potential (EP) data can support the prediction of progression.

With this research, we aim to further improve the models by developing and implementing new algorithms. In doing so, we build on an already approved protocol (Modelling and predicting the progression of Multiple Sclerosis using retrospective data: a pilot study). We wish to improve these models using artificial neural networks (ANNs), which can find complex relationships in the data that are harder (or impossible) to find with traditional methods. In addition, we aim to design the models so that they are easily and intuitively explainable, which would provide great added value for the physician in charge of the patient.

Deliverables:

  • Design feature extraction models (e.g. variational autoencoders, GANs, ...) to extract complex features that can then be used as an additional tool for the doctor to make a prognosis.
  • Design a deep learning model with metric learning that gives a new score metric independent of the EDSS score.
  • Introduce explainability into the above-mentioned models so that the doctor does not overlook anything and the models become more reliable.

Partners involved: UHasselt, Noorderhart, UGent

1.2.7 Artificial Intelligence in MS - data wrangling, survival analysis and data imputation (POC2, 3, 4)

Summary:

In this project, we aim to develop new advanced artificial intelligence (AI) techniques and showcase the relevance of these techniques in improving the performance of decision-support systems (DSS) for disease progression for people with multiple sclerosis (PwMS).

We believe large-scale and advanced modeling of real-world data (RWD) is needed to support the development of high-performance decision-support systems for multiple sclerosis.

Due to the large number of patients available in the existing registries, several works have used machine learning for various tasks. The first results of the project regarding patient trajectories (Edward De Brouwer et al.) are very promising. Using the MSBase Global Dataset, we showed that with machine learning methods suited to patient trajectory modeling, we are able to predict the disability progression of patients over a 2-year horizon with an AUC of over 85%, which represents a more than 15% increase compared to baseline methods using static clinical features.

It is clear that the MSBase Global Dataset is the perfect data source to develop and showcase complex AI algorithms and that introducing more complex AI into MS RWD increases the performance of these tasks. Next to this, the MSBase consortium gives us the opportunity to work with the best clinical experts in the field, thereby greatly increasing the chances of success of this interdisciplinary research project.

In this project, we focus on overcoming the following challenges:

  • Challenge 1: RWD is often messy and heterogeneous. We propose to develop AI solutions to support automated error detection and harmonization strategies.
  • Challenge 2: AI models are often perceived as black-box approaches and lack interpretability. We focus on increasing the interpretability of survival models to predict time-to-event endpoints (e.g. time to EDSS 6, time to relapse, …).
  • Challenge 3: The full trajectory and history information of an individual patient is often characterized by missingness at different levels. We aim to develop new methods for data imputation and investigate the relevance of implementing such techniques.
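As a minimal illustration of the imputation problem in Challenge 3, the sketch below fills gaps in an irregularly sampled trajectory by linear interpolation with edge carry-over. The EDSS-like numbers and the function name are invented for the example, and this is far simpler than the methods the project aims to develop.

```python
import numpy as np

def impute_trajectory(times, values):
    """Fill missing values (NaN) in an irregularly sampled trajectory.

    Interior gaps are linearly interpolated between the surrounding
    observations; leading/trailing gaps are filled by carrying the
    nearest observation backward/forward.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    obs = ~np.isnan(values)
    if not obs.any():
        raise ValueError("trajectory has no observations")
    # np.interp handles interior interpolation and edge extension at once
    return np.interp(times, times[obs], values[obs])

# Irregular EDSS-like trajectory: visits at uneven intervals, three gaps.
t = np.array([0.0, 0.5, 2.0, 3.5, 4.0])
v = np.array([np.nan, 2.0, np.nan, 3.5, np.nan])
filled = impute_trajectory(t, v)
# filled → [2.0, 2.0, 2.75, 3.5, 3.5]
```

Note that the interior value at t = 2.0 is interpolated in continuous time (2.75), respecting the uneven visit spacing rather than treating visits as equidistant.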

Deliverables:

  • Develop AI solutions to support automated error detection and harmonization strategies.
  • Increase the interpretability of survival models to predict time-to-event endpoints (e.g. time to EDSS 6, time to relapse, …).
  • Develop new methods for data imputation and investigate the relevance of the implementation of such techniques.

Partners involved: UHasselt, Noorderhart, UGent, KU Leuven (including KULAK)

1.2.8 Short-term differences in radiomic and/or evoked potential time series features predict long-term treatment effectiveness (POC3)

Summary:

We want to develop new, innovative and ambitious methods to evaluate the effectiveness of short- and long-term therapy. In this project, we focus on investigating the relevance of two clinical tests routinely collected at the Noorderhart Rehabilitation and MS Center in Pelt: evoked potential time series (EPTS) and magnetic resonance imaging (MRI).

Our hypothesis is that short-term differences in radiomics and/or evoked potentials time series features can predict long-term treatment efficacy.

First, we create two pseudo-anonymised MRI datasets linked to clinical data using retrospective data from two different hospitals: Rehabilitation and MS Centre Overpelt (RMSC) and Zuyderland Medical Centre Sittard (ZMCS). Retrospective MRI images are selected for patients with extensive follow-up data. The relevance of radiomics and EPTS is investigated cross-sectionally, longitudinally, as well as in a treatment effectiveness study.

Deliverables:

  • Pseudo-anonymised datasets (with retrospective data) suitable to investigate research questions.
  • Insight into the relevance of radiomics in MS (disease progression and the treatment process).
  • Have algorithms that predict disease progression and treatment responses based on the entire, multi-dimensional trajectory of the patient (i.e. measurements taken during the entire disease course). These algorithms combine the longitudinal data of EPTS features, radiomics features and disease progression (EDSS).

Partners involved: UHasselt, Noorderhart, Zuyderland Medisch centrum Sittard, UMaastricht, KU Leuven, UGent

1.2.9 Machine learning-based harmonization of MR images from MS patients as a preprocessing step in a pipeline for predicting long-term disability progression (POC3)

Summary:

In the Radiomics/Epomics project, we are currently working with a consortium to identify new biomarkers in MRI and evoked potentials that should improve predictions of progression and treatment effectiveness. The magnetic resonance (MR) images (MRIs) used in this project were taken during standard clinical practice. That is, the MRIs were acquired in the context of care, with few standardised protocols and on different scanners. The features used to perform the task will vary depending on the scanner that was used: even images of the same person obtained with the same scanning protocol but on different scanners will have different features. This site-specific variation in the features is undesirable in machine learning algorithms.

In this project, we investigate machine learning-based methods to reduce this unwanted variation (also called harmonisation). The removal of the unwanted site-specific variation by the methods found should lead to more accurate prediction of disease progression.
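A toy sketch of the kind of harmonisation meant here, using a simple per-site location-scale alignment (a strongly simplified, non-Bayesian relative of methods such as ComBat). The synthetic "scanner effect" and all names are illustrative, not the project's actual method.

```python
import numpy as np

rng = np.random.default_rng(3)

def harmonize(features, sites):
    """Location-scale harmonisation of image features across sites.

    Each site's features are shifted and scaled to the pooled mean and
    standard deviation, removing additive/multiplicative site effects.
    """
    features = np.asarray(features, dtype=float)
    out = features.copy()
    grand_mu = features.mean(axis=0)
    grand_sd = features.std(axis=0)
    for s in np.unique(sites):
        m = sites == s
        mu, sd = features[m].mean(axis=0), features[m].std(axis=0)
        out[m] = (features[m] - mu) / (sd + 1e-12) * grand_sd + grand_mu
    return out

# Toy features: same biology, but site B's scanner adds offset and gain.
site = np.array(["A"] * 50 + ["B"] * 50)
feats = rng.normal(size=(100, 3))
feats[site == "B"] = 1.5 * feats[site == "B"] + 2.0  # simulated scanner effect

h = harmonize(feats, site)
```

After harmonisation, the per-site feature distributions are aligned, so a downstream progression model no longer learns which scanner a patient was imaged on; the price is that any true biological difference between sites is removed too, which is exactly why more careful methods are needed.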

Deliverables:

  • Finding machine learning-based methods to reduce unwanted variation in MRIs.
  • The removal of unwanted site-specific variation by the methods found should lead to more accurate prediction of disease progression.

Partners involved: UHasselt, Noorderhart, UAntwerpen, UMaastricht, Icometrix

1.2.10 Evoked Potential Time Series in Multiple Sclerosis: Automatically Deserializing Files with Unknown Data Using Machine learning (ADFUNDUM) (POC4):

Summary:

At Noorderhart, evoked potential time series (EPTS) are measured in people with MS using EPTS devices. However, the raw data of those measurements are locked inside these devices. KU Leuven's Declarative Languages and Artificial Intelligence (DTAI) research group has written an algorithm that can read raw data stored in a binary format. This software uses pattern recognition and tries to identify both hierarchical structure and primitive data types without supervision. Since variable names are lost during encoding, the data can be annotated in collaboration with an expert to link certain values back to specific measurements. To evaluate the algorithm, researchers at DTAI will use a large number of raw data files (binary, serialized data dumps), more specifically, retrospective data from Noorderhart and UHasselt (EPOMICS, raw data from EPTS measurements up to 2017).
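The following toy sketch hints at what unsupervised primitive-type inference on a binary dump can look like: each fixed-width chunk is labelled float32 or int32 based on a plausibility heuristic. This is an illustration only, not DTAI's algorithm.

```python
import struct

def guess_chunks(blob, width=4):
    """Guess the primitive type of each fixed-width chunk in a binary blob.

    A chunk is labelled 'float32' when its little-endian float value is
    finite and in a plausible measurement range, otherwise 'int32'.
    This mimics (very loosely) unsupervised type inference on raw dumps.
    """
    guesses = []
    for i in range(0, len(blob) - width + 1, width):
        (f,) = struct.unpack("<f", blob[i:i + width])
        if f == f and abs(f) != float("inf") and 1e-6 < abs(f) < 1e6:
            guesses.append("float32")  # plausible measurement value
        else:
            guesses.append("int32")    # NaN/inf/denormal: likely an integer
    return guesses

# A record with one int32 counter followed by two float32 amplitudes.
record = struct.pack("<iff", 7, 3.25, -0.5)
types = guess_chunks(record)
```

Small integers reinterpreted as float32 become denormals far below any plausible measurement, which is why this crude heuristic already separates the counter from the amplitudes; a real system would also infer record boundaries and hierarchical structure.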

Deliverable:

Having an algorithm capable of unlocking EPTS data.

Partners involved: UHasselt, Noorderhart, KU Leuven

1.2.11 (Semi)-Automation of Health Data Integration (POC4)

Summary:

The aim of this research project is to deliver a proof of concept of an approach that speeds up the integration of real-world health data. More specifically, we want to showcase the possibility of bringing (semi-)automated data integration solutions developed by the data science community into the medical informatics domain. Given that such applications are scarce in the medical informatics community, we will first need to explore possible solutions and then apply them to real-world health data, taking its sensitive nature into account.

From the wide range of data integration tasks, we choose schema mapping and data harmonisation as the targets for (semi-)automated health data integration. This research project comprises two work packages:

  1. Methodology: We envision and define our approach. This work can be performed on public health datasets that are not necessarily connected to the real-world health datasets we aim to use later.
  2. Bootstrapping: We apply a solution to real-world health data, with no previous knowledge about the data. We expect multiple iterations between the definition of the methodology and the bootstrapping step.
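As a minimal illustration of a "weak signal" for (semi-)automated schema mapping, the sketch below suggests source-to-target column mappings from name similarity and leaves low-confidence columns for manual review. The column names and threshold are invented for the example and far simpler than what real health-data schemas require.

```python
from difflib import SequenceMatcher

def map_schema(source_cols, target_cols, threshold=0.5):
    """Suggest source-to-target column mappings by name similarity.

    Returns suggested mappings plus the columns left for manual review,
    mimicking a 'weak signal' that a human curator would then confirm.
    """
    def norm(c):
        return c.lower().replace("_", " ").strip()

    mapping, unmatched = {}, []
    for src in source_cols:
        scored = [
            (SequenceMatcher(None, norm(src), norm(tgt)).ratio(), tgt)
            for tgt in target_cols
        ]
        score, best = max(scored)
        if score >= threshold:
            mapping[src] = best       # confident enough to suggest
        else:
            unmatched.append(src)     # leave for manual review
    return mapping, unmatched

source = ["pat_birthdate", "edss_score", "scanner_id"]
target = ["patient_birth_date", "EDSS", "visit_date"]
suggested, review = map_schema(source, target)
```

In practice, name similarity would be combined with signals from value distributions, data types and controlled vocabularies, but even this simple version shows the semi-automated workflow: the machine proposes, the curator disposes.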

Deliverables:

  • Usable “weak signals” for (semi-)automated schema mapping for health data are documented
  • Usable “weak signals” for (semi-)automated data harmonisation for health data are documented
  • The application of (semi-)automated methods for health data integration is documented
  • An ETL specification describing how to implement an ETL process to transform real-world data of people with MS to the data collection definition for BELTRIMS is openly available
  • An implemented ETL process, transforming real-world data of PwMS to the data collection definition for BELTRIMS is openly available

Partners involved: UHasselt, Noorderhart