Academic Journal

A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers

Bibliographic Details
Title: A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers
Authors: Jiao, Yue, Lesueur, Fabienne, Azencott, Chloé-Agathe, Laurent, Maïté, Mebirouk, Noura, Laborde, Lilian, Beauvallet, Juana, Dondon, Marie-Gabrielle, Eon-Marchais, Séverine, Laugé, Anthony, Noguès, Catherine C., Andrieu, Nadine, Stoppa-Lyonnet, Dominique, Caputo, Sandrine M, Boutry-Kryza, Nadia, Calender, Alain, Giraud, Sophie, Léone, Mélanie, Bressac-de Paillerets, Brigitte, Caron, Olivier, Guillaud-Bataille, Marine, Bignon, Yves-Jean, Uhrhammer, Nancy, Bonadona, Valérie, Lasset, Christine, Berthet, Pascaline, Castera, Laurent, Vaur, Dominique, Bourdon, Violaine, Noguchi, Tetsuro, Popovici, Cornel, Remenieras, Audrey, Sobol, Hagay, Coupier, Isabelle, Harmand, Pierre-Olivier, Pujol, Pascal, Vilquin, Paul, Dumont, Aurélie, Révillion, Françoise, Muller, Danièle, Barouk-Simonet, Emmanuelle, Bonnet, Françoise, Bubien, Virginie, Longy, Michel, Sevenet, Nicolas, Gladieff, Laurence, Guimbaud, Rosine, Feillel, Viviane, Toulas, Christine, Dreyfus, Hélène, Leroux, Dominique, Peysselon, Magalie, Rebischung, Christine, Baurand, Amandine, Bertolone, Geoffrey, Coron, Fanny, Faivre, Laurence, Goussot, Vincent, Jacquot, Caroline, Sawka, Caroline, Kientz, Caroline, Lebrun, Marine, Prieur, Fabienne, Fert-Ferrer, Sandra, Mari, Véronique, Venat-Bouvet, Laurence, Bézieau, Stéphane, Delnatte, Capucine, Mortemousque, Isabelle, Coulet, Florence, Soubrier, Florent, Warcoin, Mathilde, Bronner, Myriam, Lizard, Sarab, Sokolowska, Johanna, Collonge-Rame, Marie-Agnès, Damette, Alexandre, Gesta, Paul, Lallaoui, Hakima, Chiesa, Jean, Molina-Gomes, Denise, Ingster, Olivier, Manouvrier-Hanu, Sylvie, Lejeune, Sophie, Pontois, Pauline, Lyonnet, Dominique Stoppa, Gauthier-Villars, Marion, Buecher, Bruno, Mouret-Fourme, Emmanuelle, Fricker, Jean-Pierre, Luporsi, Elisabeth, Frenay, Marc, Eisinger, Francois, Moretta, Jessica, Dugast, Catherine, Colas, Chrystelle, Lortholary, Alain, Vennin, Philippe, Adenis, Claude, Nguyen, Tan Dat, Rossi, Annick, Tinat, Julie, Tennevet, Isabelle, Limacher, Jean-Marc, Maugard, Christine, Bignon, Jean-Yves, Demange, Liliane, Cohen-Haguenauer, Odile, Gilbert, Brigitte, Zattara-Cannoni, Hélène
Contributors: Institut Curie Paris, Université Paris Sciences et Lettres (PSL), Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, Mines Paris - PSL (École nationale supérieure des mines de Paris), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut Curie Paris -Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Bioinformatique (CBIO), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL), Institut Paoli-Calmettes (IPC), Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), Unité de génétique et biologie des cancers (U830), Institut Curie Paris -Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Descartes - Paris 5 (UPD5), Aix Marseille Université (AMU), Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale (SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD), Institut de Recherche pour le Développement (IRD)-Aix Marseille Université (AMU)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Cité (UPCité), Centre Hospitalier Régional Universitaire Montpellier (CHRU Montpellier), Maladies infectieuses et vecteurs : écologie, génétique, évolution et contrôle (MIVEGEC), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD France-Sud ), Nadia Boutry-Kryza, Alain Calender, Sophie Giraud, Mélanie Léone, Brigitte Bressac-de-Paillerets, Olivier Caron, Marine Guillaud-Bataille, Yves-Jean Bignon, Nancy Uhrhammer, Valérie Bonadona, Christine Lasset, Pascaline Berthet, Laurent Castera, Dominique Vaur, Violaine Bourdon, Catherine Noguès, Tetsuro Noguchi, Cornel Popovici, Audrey Remenieras, Hagay Sobol, Isabelle Coupier, Pierre-Olivier Harmand, Pascal Pujol, Paul Vilquin, Aurélie Dumont, Françoise Révillion, Danièle Muller, Emmanuelle Barouk-Simonet, Françoise Bonnet, Virginie Bubien, Michel Longy, Nicolas Sévenet, Laurence Gladieff, Rosine Guimbaud, Viviane Feillel, Christine Toulas, Hélène Dreyfus, Dominique Leroux, Magalie Peysselon, Christine Rebischung, Amandine Baurand, Geoffrey Bertolone, Fanny Coron, Laurence Faivre, Vincent Goussot, Caroline Jacquot, Caroline Sawka, Caroline Kientz, Marine Lebrun, Fabienne Prieur, Sandra Fert-Ferrer, Véronique Mari, Laurence Vénat-Bouvet, Stéphane Bézieau, Capucine Delnatte, Isabelle Mortemousque, Florence Coulet, Florent Soubrier, Mathilde Warcoin, Myriam Bronner, Sarab Lizard, Johanna Sokolowska, Marie-Agnès Collonge-Rame, Alexandre Damette, Paul Gesta, Hakima Lallaoui, Jean Chiesa, Denise Molina-Gomes, Olivier Ingster, Sylvie Manouvrier-Hanu, Sophie Lejeune, Catherine Noguès, Lilian Laborde, Pauline Pontois, Dominique Stoppa-Lyonnet, Marion Gauthier-Villars, Bruno Buecher, Olivier Caron, Emmanuelle Mouret-Fourme, Jean-Pierre Fricker, Christine Lasset, Valérie Bonadona, Pascaline Berthet, Laurence Faivre, Elisabeth Luporsi, Marc Frénay, Laurence Gladieff, Paul Gesta, Hagay Sobol, François Eisinger, Jessica Moretta, Michel Longy, Catherine Dugast, Chrystelle Colas, Florent Soubrier, Isabelle Coupier, Pascal Pujol, Alain Lortholary, Philippe Vennin, Claude Adenis, Tan Dat Nguyen, Capucine Delnatte, Annick Rossi, Julie Tinat, Isabelle Tennevet, Jean-Marc Limacher, Christine Maugard, Yves-Jean Bignon, Liliane Demange, Hélène Dreyfus, Odile Cohen-Haguenauer, Brigitte Gilbert, Dominique Leroux, Hélène Zattara-Cannoni
Source: ISSN: 1471-2288 ; BMC Medical Research Methodology ; https://inserm.hal.science/inserm-03313811 ; BMC Medical Research Methodology, 2021, 21 (1), pp.155. ⟨10.1186/s12874-021-01299-6⟩.
Publisher Information: HAL CCSD
BioMed Central
Publication Year: 2021
Subject Terms: Hybrid process, Probabilistic linkage, Record linkage, Supervised machine learning, [SDV]Life Sciences [q-bio], [SDV.MHEP]Life Sciences [q-bio]/Human health and pathology
Description: International audience ; Background: Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies: GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on the identification of genetic factors modifying the cancer risk of BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors. Methods: To identify as many as possible of the individuals who participate in both studies but are not registered under a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach (named "PRL + ML") pooled the candidate matches identified by both methods. We trained the ML model on a gold standard built from a first version of the two databases; this gold standard was obtained from PRL-derived matches verified by an exhaustive manual review. Results: The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms, including RF, Bagged trees, AdaBoost, Support Vector Machine and Neural Network. RF was therefore selected to build the ML model, since our goal was to identify the maximum number of true matches. The combined PRL + ML linkage showed a higher recall (range 0.988-0.992) than either PRL (range 0.916-0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants). Conclusions: Our hybrid linkage process is an efficient tool for linking GEMO and GENEPSO, and it may generalize to other epidemiological studies involving other databases and registries.
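The core of the "PRL + ML" step described in the abstract is that a pair of records counts as a candidate match if either method proposes it, and the methods are then compared by recall against a manually reviewed gold standard. The Python sketch below illustrates only that combination and evaluation logic, under stated assumptions: the record identifiers, the sets `prl`, `ml` and `gold`, and the helper names `combine_candidates`, `recall` and `precision` are hypothetical and are not the authors' code or data. The PRL scoring and the Random Forest classifier themselves are not reproduced.

```python
from typing import Set, Tuple

# A candidate match is a pair of record identifiers: (GEMO id, GENEPSO id).
# All identifiers used here are hypothetical placeholders, not study data.
Pair = Tuple[str, str]


def combine_candidates(prl_matches: Set[Pair], ml_matches: Set[Pair]) -> Set[Pair]:
    """Hybrid step (assumed form): keep every pair proposed by either method (set union)."""
    return prl_matches | ml_matches


def recall(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Share of gold-standard true matches that were recovered (0.0 if gold is empty)."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


def precision(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Share of proposed matches that are true matches (0.0 if nothing was proposed)."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


if __name__ == "__main__":
    # Toy gold standard: four true matches between the two databases.
    gold = {("G1", "P1"), ("G2", "P2"), ("G3", "P3"), ("G4", "P4")}
    prl = {("G1", "P1"), ("G2", "P2"), ("G9", "P9")}  # misses G3 and G4, one false match
    ml = {("G2", "P2"), ("G3", "P3")}                 # misses G1 and G4
    hybrid = combine_candidates(prl, ml)
    for name, pred in [("PRL", prl), ("ML", ml), ("PRL + ML", hybrid)]:
        print(f"{name}: recall={recall(pred, gold):.2f}, precision={precision(pred, gold):.2f}")
    # PRL: recall=0.50, precision=0.67
    # ML: recall=0.50, precision=1.00
    # PRL + ML: recall=0.75, precision=0.75
```

Because the union contains every pair found by either method, its recall can never be lower than that of PRL or ML alone, which fits the recall-first goal stated in the abstract. The trade-off is that false matches from both methods are also retained; in this toy example, precision falls from 1.00 for ML alone to 0.75 for the union.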
Document Type: article in journal/newspaper
Language: English
Relation: info:eu-repo/semantics/altIdentifier/pmid/34325649; inserm-03313811; https://inserm.hal.science/inserm-03313811; https://inserm.hal.science/inserm-03313811/document; https://inserm.hal.science/inserm-03313811/file/s12874-021-01299-6.pdf; PUBMED: 34325649; PUBMEDCENTRAL: PMC8320036; WOS: 000680912600003
DOI: 10.1186/s12874-021-01299-6
Availability: https://doi.org/10.1186/s12874-021-01299-6
https://inserm.hal.science/inserm-03313811
https://inserm.hal.science/inserm-03313811/document
https://inserm.hal.science/inserm-03313811/file/s12874-021-01299-6.pdf
Rights: http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.4ABDC002
Database: BASE