
Code used to analyze the Data Quality (DQ) of the data gathered within the CANCERLESS project


bdslab-upv/dq_cancerless_analysis


Multi-source coherence analysis of the first European multi-centre cohort study for cancer prevention in people experiencing homelessness: a data quality study

Overview

This repository contains the scripts and notebooks used to analyze the data from the CANCERLESS project. The analysis focuses on data quality, and more specifically on multi-source variability.
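Multi-source variability can be pictured as the distance between the probability distributions that each source (here, each pilot site) produces for the same variable. The following is a minimal sketch. It assumes the Jensen-Shannon distance with a base-2 logarithm, so values lie in [0, 1], computed on a single categorical variable. The site labels and answers are hypothetical toy data. This is an illustration only and may differ from the procedure used in the repository's notebooks.

```python
import math
from collections import Counter
from itertools import combinations


def distribution(values, categories):
    """Relative frequency of each category, ignoring missing values (None)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to build a distribution from")
    counts = Counter(observed)
    return [counts[c] / len(observed) for c in categories]


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding errors


def pairwise_site_distances(data):
    """JS distance for every pair of sites on one categorical variable."""
    categories = sorted({v for values in data.values() for v in values if v is not None})
    dists = {site: distribution(values, categories) for site, values in data.items()}
    return {
        (a, b): js_distance(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }


# Hypothetical answers to one survey question, grouped by pilot site.
answers = {
    "Vienna": ["yes", "no", "yes", "yes"],
    "Madrid": ["no", "no", None, "no"],
    "UK": ["yes", "no", "yes", "no"],
}

for pair, d in pairwise_site_distances(answers).items():
    print(pair, round(d, 3))
```

A distance of 0 means two sites answered with identical distributions, and 1 means their answers never overlap. A matrix of these pairwise distances could then serve as input to geometric methods, such as projecting the sites into a low-dimensional space.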

Abstract

Background and objective: People experiencing homelessness (PEH) face significant health challenges and disparities in healthcare access due to barriers such as unstable housing, limited resources, and social stigma. In response, the European Union has initiated efforts to address these disparities. The CANCERLESS project, part of this initiative, has created the first European multi-centre dataset for cancer prevention in PEH. This work aims to evaluate and describe the heterogeneity of PEH across pilot sites and to provide data quality metrics for reliable future research.

Methods: The dataset comprises 652 cases: 142 from Vienna, 158 from Athens and Thessaloniki, 197 from Madrid, and 155 from the United Kingdom. All participants fit classifications from the European Typology of Homelessness and Housing Exclusion. This longitudinal study collected questionnaires at baseline, four weeks, and at the end of the intervention. The 180-question survey covered socio-demographic data, overall health, mental health, empowerment, and interpersonal communication. Data variability was assessed using information theory and geometric methods to analyse discrepancies in distributions and completeness across the dataset.

Results: Significant variability was found among the four pilot countries, both overall and within specific sections, except for the health section. Madrid showed the largest discrepancies, with a high number of missing values related to interpersonal communication and healthcare service use.

Conclusion: Health data may be comparable across the four countries, but further analysis should account for location-specific differences. This study underscores the heterogeneity among PEH and the critical need for data quality assessments to inform future research and policymaking in this field.
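The completeness part of the assessment can be illustrated with a per-site, per-section rate of answered questions. The abstract reports that Madrid had a high number of missing values in specific sections, and this kind of rate makes such a pattern visible. The following is a minimal sketch. It assumes that each record is a dict with a hypothetical `site` key plus answers keyed by hypothetical question IDs, with `None` or an absent key meaning "missing". It also assumes a hypothetical mapping from questionnaire sections to question IDs. The records are invented toy values, not CANCERLESS data.

```python
from collections import defaultdict


def completeness_by_site(records, sections):
    """Fraction of non-missing answers per (site, section).

    records: list of dicts with a "site" key plus question-ID keys;
             an answer is missing if its key is absent or its value is None.
    sections: dict mapping section name -> list of question IDs.
    """
    answered = defaultdict(int)
    expected = defaultdict(int)
    for record in records:
        site = record["site"]
        for section, questions in sections.items():
            for q in questions:
                expected[(site, section)] += 1
                if record.get(q) is not None:
                    answered[(site, section)] += 1
    return {key: answered[key] / expected[key] for key in expected}


# Hypothetical section layout and toy records.
sections = {
    "health": ["h1", "h2"],
    "communication": ["c1", "c2"],
}
records = [
    {"site": "Madrid", "h1": 3, "h2": 4, "c1": None},
    {"site": "Madrid", "h1": 2, "h2": None, "c1": None, "c2": None},
    {"site": "Vienna", "h1": 5, "h2": 4, "c1": 1, "c2": 2},
]

for (site, section), rate in sorted(completeness_by_site(records, sections).items()):
    print(f"{site:<8} {section:<14} {rate:.0%}")
```

With these toy records, Madrid's communication section comes out fully empty, while its health section is 75% complete.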

Code by: Antonio Blasco-Calafat & Vicent Blanes-Selva
