Patient-centric synthetic data generation, no reason to risk re-identification in the analysis of biomedical pseudonymised data

Bibliographic details
Title: Patient-centric synthetic data generation, no reason to risk re-identification in the analysis of biomedical pseudonymised data
Authors: Morgan Guillaudeux, Olivia Rousseau, Julien Petot, Zineb Bennis, Charles-Axel Dein, Thomas Goronflot, Matilde Karakachoff, Sophie Limou, Nicolas Vince, Matthieu Wargny, Pierre-Antoine Gourraud
Publication information: Research Square Platform LLC, 2022.
Publication year: 2022
Description: Anonymization is crucial in the era of big data analysis. While nearly all computational methods operate on pseudonymised personal data, re-identification remains a risk. With personal health data, this re-identification risk may be considered a betrayal of patients’ trust. Herein, we present a new method to generate synthetic data of individual granularity while preserving patients’ privacy. Developed for sensitive biomedical data, the method is patient-centric: it uses a local model to generate random new synthetic data, called an “avatar”, for each initial sensitive individual. The method is applied to real health data from a clinical trial of HIV-infected patients (AIDS clinical trial; N=2139, 26 variables, example 1) and from the Wisconsin Breast Cancer Diagnosis observational study (WBCD; N=683, 10 variables, example 2) to evaluate the protection it provides while retaining the original statistical information. According to distance-based privacy metrics, each individual produces an avatar that is on average indistinguishable from 12 other generated avatars for the AIDS clinical trial and 24 for the WBCD observational study. Data transformation via the Avatar method preserved the evaluation of the treatment’s effectiveness for example 1 (original hazard ratio HR = 0.49 (95% CI, 0.39 - 0.63) vs avatar HR = 0.47 (95% CI, 0.37 - 0.60)) and the classification properties for example 2 (original AUC = 99.46 (std = 0.25) vs avatar AUC = 99.84 (std = 0.12)). Thanks to the Avatar method, modern-era data analysis should no longer pose a re-identification risk. Avatars enable the creation of value from pseudonymised data analyses by tackling the risk of a privacy breach.
Open access: https://explore.openaire.eu/search/publication?articleId=doi_________::5416eaec4a4411548c2b93f03f9222b7
https://doi.org/10.21203/rs.3.rs-1674043/v1
Rights: OPEN
Accession number: edsair.doi...........5416eaec4a4411548c2b93f03f9222b7
Database: OpenAIRE