Data security is relevant for every type of data: it protects data from unauthorized access, prevents data loss, and safeguards research integrity.

Switch on Security!

Five Safes Framework

To ensure that your data is safe, it is best to consider the Five Safes Framework, which addresses security at five levels:


A visual overview of the Five Safes Framework is available to view and download.

1 | Safe people

How reliable is the data processor?

Academic staff are contractually obligated to keep the data confidential and to follow standard procedures for secure data management.

Academic staff should have received relevant training.

External collaborators, students, and interns may need to sign additional agreements that clearly define responsibilities. Students and interns must follow the same institutional rules on data management.

2 | Safe projects

How will you ensure that your project is ethical and only collects data that is necessary for the research?

See the ethical and legal page for more information on ethical clearance for research involving human subjects, animals, or dual-use applications, as well as on the contractual and legal issues that may apply due to the (re-)use of third-party data, intellectual property rights, or confidentiality.

When handling or processing personal data, consider the following:

  • Data minimization. This means collecting only the information that is essential to answer the research question of the project.
  • Storage limitations. This means that as soon as certain data are no longer needed, they should be deleted.
  • GDPR register. Before starting your research project, register your research project by completing the GDPR checklist of Hasselt University. Depending on the legal basis of processing, request (additional) informed consent for collecting, processing, sharing, or publishing the data.

More information about securing personal data is provided in the section below.

3 | Safe settings

Do you use the security measures provided by the organisation? 

Organizational and technical security measures at Hasselt University:

  • Research Data Management Policy Plan
  • Security consultant/ responsible (Data Protection Officer)
  • Information security policy
  • Continuity plan
  • Notification procedure in case of physical/ technical incidents
  • Raising staff GDPR awareness through information and training
  • Physical access control to the offices by using keys and badges

When using institutional devices (e.g., a laptop) and storage solutions (e.g., Google Shared Drive), the following security measures are in place:

Identity and access management
  • Authentication (e.g. username, password)
  • Authorization system (role based access control)
  • Logging system
  • Physical access control to the datacenter

Data classification and encryption
  • Reports demonstrating that the effectiveness of the technical measures is regularly tested, evaluated, and assessed
  • Encryption (data at rest)
  • Encryption (data in transit)

Availability control

  • Regular checks that the processing systems and services remain adequate
  • Recovery plan in case of physical/ technical incidents
  • Back-ups
  • Anti-virus
  • Regular software updates
  • Password policy

Other security measures you can take to mitigate the risks:

  • Storing data files and documents on institutional storage and preservation solutions
  • Clean desk policy
  • Use of secure systems for the transfer of data (e.g. Belnet FileSender)
  • Use of secure procedures for destroying data

Levels of access control apply to both physical and digital data: physical security and digital security.

4 | Safe data

Were the most effective techniques used to minimize the potential for identification in data processing and/or unauthorized access?

* Most relevant for personal and confidential data

In addition to data minimization (e.g., only handling data that is required for the project), several de-identification techniques can be applied to secure the data, such as encryption, anonymization, and pseudonymization. These techniques mask personal identifiers from data, partially or completely, reducing the risk of identifying an individual.

More information about the de-identification techniques is described in the section below.

5 | Safe output

Were the most effective techniques used to ensure the lowest possible chance of identification when sharing data and output openly with the public?

* Most relevant for personal and confidential data

Most funders require you to make your data as open as possible. Making the data open to the public creates visibility for you as a researcher and makes your data findable for reuse (more information on FAIR data vs. open data).

However, there are valid reasons to be cautious, to not share your data openly, or to share it only after an embargo period:

  • Legal issues such as intellectual property rights or valorization potential
  • Personal data can be de-identified to a level at which sharing becomes possible: anonymous data can be shared with open access, and pseudonymized data can be shared with restricted access. Importantly, additional measures should still be considered, such as informed consent from the human subjects and proper registration in the GDPR register. More information about securing personal data is provided in the section below.

Securing personal data

When you process personal data, you have the ethical and legal obligation to ensure that personal data are sufficiently protected to avoid data leaks or data disclosure.

What are personal data?

Personal data are all information about an identified or identifiable natural person. A natural person is identifiable when they can be identified directly or indirectly.

Some examples of “normal” personal data include name, address, e-mail address, photo, ID number, IP address, employee number, private or professional telephone number (who’s who), login data, identification cookies, account number, CV, log data (including cafeteria, parking use, web use, surfing), camera images, personnel files, wage data, professional publications, etc.

Data concerning deceased persons or organizations are not personal data according to the GDPR and, therefore, fall outside the scope of the GDPR. Other laws and regulations may, however, apply to these data.

Special categories of personal data (sensitive personal data) are personal data that contain information regarding race, ethnic origin, political views, religious or philosophical beliefs, trade union membership, genetic data, biometric data, health data, data on a person's sexual behaviour or sexual orientation. If this information becomes publicly available, for example as a result of a data breach, this can have very adverse consequences for the data subjects.

Genetic data are personal data related to the inherited or acquired genetic traits of a natural person that provide unique information about the physiology or health of that natural person, and that come in particular from an analysis of a biological sample of that natural person.

Biometric data are personal data that result from specific technical processing with regard to the physical, physiological or behavioural characteristics of a natural person on the basis of which unambiguous identification of that natural person is possible or confirmed, such as facial images or fingerprint data.

Health data are personal data related to the physical or mental health of a natural person, including data on health services provided that supply information about their health status.

Based on GDPR in research

Special category - health data

Health data are a special category of personal data related to the physical or mental health of a natural person, including data on health services provided that supply information about their health status.

Guidelines regarding the processing of medical personal data:

  • Required use of an Electronic Data Capture (EDC) platform in line with Good Clinical Practice (GCP).

The platform available at UHasselt is the web-based Castor Electronic Data Capture (EDC) platform

What is an EDC?
Electronic data capture (EDC) software provides an efficient and safe platform in compliance with Good Clinical Practice (GCP) and GDPR, which can be used to build and manage your electronic case report form, online surveys, and databases. EDC software is mandatory for clinical studies but is not restricted to this type of research.

Castor EDC can easily capture clinical data and manage surveys and your electronic case report forms (eCRFs).

An account for Castor EDC can be created without a license; you can set up a project and test the functionalities. A license is needed when you want to change the type of study from Test (to try study structures, etc.) or Example (used as a reference to be shown) to Production (for real participant data).

More information regarding licensing for Castor EDC is available via the Limburg Clinical Research Centre (co-financing from UHasselt is possible).

Never preserve raw data (including direct identifiers, e.g., name and surname, national insurance number, etc.) on a Hasselt University Google Drive (personal My Drive or Shared Drive).

⇒ Solution:

  • Always pseudonymize the dataset
  • Store the key separately with restricted access
  • Limit access to pseudonymized data by restricted access for authorized staff only
  • Apply additional password encryption to these datasets

More information about the de-identification techniques is described in the section below.

Risk Assessment

When processing personal data, consider two parameters to determine which de-identification techniques are most appropriate:

  1. Level of sensitivity
  2. Probability of re-identification

The extent to which these parameters are present depends on the degree of de-identification:

With anonymized personal data, the possibilities for identification have been irreversibly removed by means of a processing technique. Anonymized personal data that can still be traced back to the original individuals with reasonable effort remain personal data, are not anonymous, and therefore fall under the GDPR. For this reason, many types of research data (for example, qualitative data or large datasets with a wide range of personal data) are difficult to anonymize completely.

Pseudonymized personal data (referred to as 'coded data' in previous privacy legislation) are personal data (whether sensitive or not) that can only be associated with an identified or identifiable person by means of a non-public (secret) key. Pseudonymized personal data are still personal data and remain protected by the GDPR.

Please note that if you anonymize or pseudonymize personal data yourself, you must work with identifiable personal data at the start and during anonymization, and the GDPR remains applicable.
Processing of personal data should be registered by completing the GDPR checklist.
A GDPR checklist can be linked to a submission to an ethics committee (the GDPR checklist for the Committee for Medical Ethics (CME) or the GDPR checklist for the Social and Societal Ethics Committee (SMEC), available in Dutch and English) or completed separately from ethics advice, in which case the general GDPR checklist should be used.

De-identification techniques for ...

Best practice tip: Create a pseudonymized version of your raw dataset before you start processing and interpreting it. Masking personal identifiers, partially or completely, reduces the risk of identifying an individual.

Quantitative data

Quantitative personal data refers to numerical data describing a person's characteristics and is typically collected through surveys, experiments, or observational studies. 

Recommended techniques and tools to de-identify quantitative data:


Basic techniques:

  • Key table: A subject's or patient's name and other identifiers are replaced with meaningless codes or pseudonyms. Do not use another meaningful identifier that is linked to a staff number, patient number, or medical file. Important: Always store the key table in a separate folder from the dataset, with restricted access rights.
  • Generalization/categorization: Categorizing the data points of attributes as broadly as possible such that multiple individuals share the same value. Keep in mind the dataset's usability and the amount of detail needed to answer the research question. Example: specific car brands (Volvo, Ford, ...) can be grouped into 'family cars', specific locations can be grouped into regions or countries, and an exact date of birth can be grouped into a year or month.
  • Interval: Grouping values into intervals is another method of generalization. For example: grouping dates of birth into ranges of dates or years, or grouping attributes such as weight, height, and salaries into intervals (e.g., 2500-3500).
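The key-table, generalization, and interval techniques above can be sketched in plain Python. This is a minimal illustration; the records, field names, and code format are invented, not part of any UHasselt tooling:

```python
import secrets

# Hypothetical example records; field names are assumptions for illustration.
records = [
    {"name": "An Peeters", "birth_year": 1987, "weight_kg": 63},
    {"name": "Jan Claes", "birth_year": 1992, "weight_kg": 81},
]

key_table = {}  # store separately from the dataset, with restricted access


def pseudonymize(record):
    """Key table: replace the name with a random, meaningless code."""
    code = "SUBJ-" + secrets.token_hex(4)
    key_table[code] = record["name"]  # the only link back to the person
    out = {k: v for k, v in record.items() if k != "name"}
    out["subject_id"] = code
    return out


def generalize(record):
    """Coarsen quasi-identifiers: birth year -> decade, weight -> 10 kg interval."""
    out = dict(record)
    decade = out.pop("birth_year") // 10 * 10
    low = out.pop("weight_kg") // 10 * 10
    out["birth_decade"] = decade
    out["weight_interval"] = f"{low}-{low + 9}"
    return out


deidentified = [generalize(pseudonymize(r)) for r in records]
```

Note that the `key_table` dictionary would be written to a separate, access-restricted location in practice; keeping it next to the de-identified data defeats the purpose.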

Advanced techniques:

  • Suppression or data masking: an extreme form of de-identification that replaces (parts of) identifiers with pre-defined fixed text (e.g., asterisks) or black tape. Example: the postal code 3500 can be masked as 35**.
  • Hashing: applying an algorithmic calculation (cryptographic hash function) to change personal data to a unique string of fixed length.
  • Salted Hashing: Hashing technique using a random value, known as salt, which is added to the attribute being hashed.
  • Key-hashing (with stored key): special hash function where a secret key is used as an additional input.
  • Synthetic / simulated data
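The masking and hashing variants above can be sketched with Python's standard library. The identifier below is fictitious, used only to show the difference between plain, salted, and keyed hashing:

```python
import hashlib
import hmac
import secrets

# Fictitious identifier, used only for illustration.
national_number = "85.07.30-033.61"

# Suppression / data masking: keep only the first two digits of a postal code.
postal_code = "3500"
masked = postal_code[:2] + "**"  # -> "35**"

# Plain hashing: deterministic, but vulnerable to dictionary attacks
# because the same input always yields the same digest.
plain_hash = hashlib.sha256(national_number.encode()).hexdigest()

# Salted hashing: a random salt defeats precomputed dictionaries;
# the salt itself must be stored (securely) to reproduce the digest.
salt = secrets.token_bytes(16)
salted_hash = hashlib.sha256(salt + national_number.encode()).hexdigest()

# Keyed hashing (HMAC): a secret key is an additional input, kept
# separately with restricted access, much like a key table.
secret_key = secrets.token_bytes(32)
keyed_hash = hmac.new(secret_key, national_number.encode(), hashlib.sha256).hexdigest()
```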

Automated RDM tools

Open-source software tools for anonymizing personal data:

  • ARX: download via the UHasselt Software Center or the website. (de-identification techniques: generalization, classification, suppression.)
  • SdcMicro (R-package): download via the website (de-identification techniques: different advanced options for categorical (recoding, suppression, and randomization) & numerical (generalization, noise, swapping) variables).
  • dplyr (R-package): dplyr is a grammar of data manipulation, which can be used to suppress, mask, generalize, or aggregate data.
  • Presidio-anonymizer (Python package): available via the website (de-identification techniques: redact, hash, and replace)
  • NLM scrubber: available via the website (run without installation)

Open-source alternatives not available via software center (yet):

  • Amnesia (de-identification techniques: generalization, suppression)
  • OpenPseudonymiser (de-identification technique: salted hash)

Qualitative data

Qualitative data are non-numeric data expressed in natural language (e.g., in textual or visual form): for example, in-depth interviews, focus groups, personal diaries, observations, field notes, responses to open-ended questionnaires, audio and video recordings, and pictures. This methodology provides in-depth datasets containing information about people's perspectives, emotions, motivations, beliefs, and expectations.

Qualitative data often contain various types of personal data and a certain context in which the information is provided, making it difficult to anonymize these types of datasets fully.

Best practice tip: Pseudonymize your qualitative dataset after data collection and before starting data analysis and interpretation, especially before sharing data with collaborators or third parties.

Recommended techniques and tools to de-identify qualitative data:

Audio & video files
  • Use a beep to mask place names and personal names.
  • Use a higher pitch to avoid voice recognition.
  • Blur recognizable faces of persons in images or videos.

RDM Tools for security:

  • Audacity | Open-source software to edit audio files (e.g., change the pitch of voices, add bleeps). Available via the UHasselt Software Center or the Audacity website.
  • GIMP | Open-source software to edit images (e.g., blur faces). Available via the UHasselt Software Center or the GIMP website.
  • Openshot | Open-source software to edit videos (e.g., blur faces). Available via the UHasselt Software Center or the OpenShot website.


RDM Tools for managing transcripts:

Annotation of text files 
Step-by-step anonymizing qualitative data
  1. Find and highlight direct identifiers by reading the transcript.
  2. Assess indirect identifiers:
    • Can the identity of a participant be known from information in the data file?
    • Can a third party be disclosed or harmed from information in the data file?
  3. Assess the wider picture:

    • Which identifying information about an individual participant can be noted from all the data and documentation available to a user? Remove (or pseudonymize) direct identifiers.

  4. Redact or categorize (in)direct identifiers.

  5. Re-assess any remaining disclosure risk.
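Step 4 (redacting direct identifiers) can be sketched as a simple find-and-replace pass in Python. The transcript, names, and category tags below are invented for illustration; in practice the identifier list comes from steps 1-3:

```python
import re

# Hypothetical transcript snippet; names and places are invented.
transcript = "Interviewer: So, Maria, you moved from Hasselt to Ghent in 2019?"

# Direct identifiers found while reading the transcript, mapped to category tags.
identifiers = {
    "Maria": "[NAME]",
    "Hasselt": "[CITY]",
    "Ghent": "[CITY]",
}


def redact(text, mapping):
    """Replace each literal identifier (whole words only) with its category tag."""
    for literal, tag in mapping.items():
        text = re.sub(rf"\b{re.escape(literal)}\b", tag, text)
    return text


redacted = redact(transcript, identifiers)
# -> "Interviewer: So, [NAME], you moved from [CITY] to [CITY] in 2019?"
```

Replacing identifiers with category tags rather than deleting them keeps the transcript readable for analysis while removing the disclosure risk.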

Guide on de-identifying data by Erasmus University Rotterdam

Must read!

Making qualitative data available for reuse? Read the Making Qualitative Data Reusable - A Short Guidebook For Researchers 


Neuroimaging and biometric data

Biometric data are a special category of personal data that result from specific technical processing with regard to the physical, physiological or behavioural characteristics of a natural person on the basis of which unambiguous identification of that natural person is possible or confirmed, such as facial images or fingerprint data.

Recommended techniques and tools for neuroimaging and biometric data:

Magnetic Resonance Imaging (MRI) data

De-facing is the preferred de-identification technique used on MR images to mask/remove facial features such as eyes, nose, and mouth.

RDM tool for automated de-facing of MRI data:

  • Pydeface is generally recommended by (f)MRI repositories (e.g., OpenNeuro, OpenfMRI); it is implemented in Python and requires FSL.
  • Other automated de-facing tools exist; for a comparison, see Theyers et al. (2021). Multisite Comparison of MRI Defacing Software Across Multiple Cohorts. Frontiers in Psychiatry, 12, 617997.
  • Another method to anonymize MR images is skull stripping: removing the skull and other non-brain tissue from the MRI scans. However, this is not the preferred method to de-identify scan data, and some data repositories (e.g., OpenfMRI) might not accept skull-stripped data. Available in FreeSurfer and as the standalone SynthStrip.

Scrubbing is a technique to remove or generalize personal information or medical details in the headers/metadata of image files, DICOM files, and file path names, for example by specifying the age instead of the actual date of birth and by removing names and addresses.

RDM tools for scrubbing MRI data:

  • Possible tools for this are DeID, a Java tool with a graphical user interface
  • MITRE Identification Scrubber Toolkit
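As a simplified illustration of scrubbing (not a real DICOM parser; the tag names merely mimic common DICOM attributes, and dedicated tools such as those above operate on actual image files), a header can be modeled as a dictionary from which direct identifiers are dropped and the birth date generalized:

```python
# Simplified header modeled as a dict; tag names mimic common DICOM attributes.
header = {
    "PatientName": "Doe^Jane",
    "PatientBirthDate": "19870730",
    "PatientAddress": "Example street 1, Hasselt",
    "StudyDate": "20240115",
    "Modality": "MR",
}

DIRECT_IDENTIFIERS = {"PatientName", "PatientAddress"}


def scrub(hdr):
    """Drop direct identifiers and generalize the birth date to a year."""
    out = {}
    for tag, value in hdr.items():
        if tag in DIRECT_IDENTIFIERS:
            continue  # remove direct identifiers entirely
        if tag == "PatientBirthDate":
            out["PatientBirthYear"] = value[:4]  # keep only the year
        else:
            out[tag] = value
    return out


scrubbed = scrub(header)
```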

Electroencephalography (EEG) and physiological data

Avoid using personal information (initials, birthdate, etc.) when collecting data. If your files do contain personal data, remove it from the header of the raw data files before sharing.

RDM tool for de-identifying EDF data:

  • EDF De-identification tool, open source tool (standalone or code) to de-identify your EEG data.
  • Check if your processing software provides the option to de-identify the dataset.
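As a minimal sketch of what such de-identification does: the EDF specification stores an 80-character "local patient identification" field at bytes 8-87 of the fixed-size header, so blanking that field removes the patient details while leaving the signal data untouched. The header fragment below is fabricated for illustration:

```python
# Blank the "local patient identification" field (bytes 8-87) of an EDF header.
def deidentify_edf_header(header: bytes) -> bytes:
    placeholder = b"X" * 80  # padded to the full 80-byte field width
    return header[:8] + placeholder + header[88:]


# Fabricated header fragment (a real EDF header is longer than this).
fake_header = b"0       " + b"F 30-JUL-1987 Jane_Doe".ljust(80) + b"rest-of-header"
deidentified_header = deidentify_edf_header(fake_header)
```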

Note: EEG, heart rate, skin conductance data, reaction times to a task etc., are considered personal data when there is also information included in the dataset that links this data to an identified or identifiable natural person.

Virtual reality / Eye tracking data

Raw eye-tracking data contain videos of facial features and therefore constitute a dataset that can be linked to an identified or identifiable natural person.

These data should be for your eyes only and can pose an increased privacy risk.

Data breach (e.g., loss of data, unauthorized access)

A data breach is a security incident that affects the confidentiality, integrity, or availability of personal data. Possible incidents that can lead to a data breach are:

  • access to personal data by an unauthorized third party;
  • an intentional or unintentional action that affects the security of personal data;
  • sending personal data to an incorrect recipient;
  • computer equipment with personal data being lost or stolen;
  • changing personal data without consent.

Did you encounter a data breach?

If you identify a data breach, you must always report it via the UHasselt Self Service Desk. Together with the Data Protection Officer, it will then be examined whether there is actually a data breach that must be reported to the Data Protection Authority. If the data breach entails privacy risks, the data subjects must also be informed.