Reconstructing historical populations from genealogical data: an overview of methods used for aggregating data from GEDCOM files

Bibliographic details
Title: Reconstructing historical populations from genealogical data: an overview of methods used for aggregating data from GEDCOM files
Authors: Gellatly, C.
Contributors: Sociaal-economische geschiedenis, LS Economische Geschiedenis, OGKG - Sociaal-economische geschiedenis
Publication year: 2014
Description: The GEDCOM file format is by far the most widely used means of exchanging genealogical data, and extensive collections of these files are available online. There is a huge potential benefit for historians and other academics who are able to make use of the data contained in available GEDCOM files, as these effectively represent hundreds of thousands of hours of crowd-sourced work and a considerable source of knowledge about individual families. This paper details a number of methods that are being used to clean and aggregate such genealogical data; these include a series of steps for screening out substantially flawed files, as well as for cleaning date and place information. A group-linking method is described for identifying duplicates / linkages within a genealogical database based on comparison of family structures. This is tested alongside conventional methods (i.e. comparison of name and birth date) and an estimation of the power of the differing methods is provided. It is proposed that the group-linking method provides advantages over conventional methods, because it offers a way of increasing the size and timespan of datasets that may be extracted from a genealogical database with confidence that they do not contain duplicates. The method will be further improved by incorporating probabilistic record linkage techniques, which take into account the frequencies of values in the linkage arrays.
Document type: report
File description: image/pdf
Language: English
Relation: https://dspace.library.uu.nl/handle/1874/306153Test
Availability: https://dspace.library.uu.nl/handle/1874/306153Test
Rights: info:eu-repo/semantics/OpenAccess
Accession number: edsbas.E49BCCCE
Database: BASE
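
The abstract above contrasts conventional duplicate detection (comparison of name and birth date) with a group-linking approach that compares family structures. The following is a minimal Python sketch of that contrast, not the report's actual implementation: the `Person` record, the relatives represented as `(role, name, birth_year)` tuples, the Jaccard overlap score, and the `0.5` threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical record layout; the report's own data model is not given in the abstract.
Relative = Tuple[str, str, Optional[int]]  # (role, name, birth_year)


@dataclass(frozen=True)
class Person:
    id: str
    name: str
    birth_year: Optional[int]
    relatives: FrozenSet[Relative] = frozenset()


def conventional_match(a: Person, b: Person) -> bool:
    """Conventional check: same normalised name and same known birth year."""
    if a.birth_year is None or b.birth_year is None:
        return False
    same_name = a.name.strip().lower() == b.name.strip().lower()
    return same_name and a.birth_year == b.birth_year


def family_overlap(a: Person, b: Person) -> float:
    """Jaccard overlap of the two relative sets (an assumed similarity measure)."""
    if not a.relatives or not b.relatives:
        return 0.0
    return len(a.relatives & b.relatives) / len(a.relatives | b.relatives)


def group_link_match(a: Person, b: Person, threshold: float = 0.5) -> bool:
    """Require the conventional match plus sufficient agreement in family structure."""
    return conventional_match(a, b) and family_overlap(a, b) >= threshold


# Three records from different (made-up) files that share a name and birth year.
a = Person("f1:I1", "John Smith", 1820, frozenset({
    ("father", "William Smith", 1790),
    ("mother", "Ann Lee", 1795),
    ("spouse", "Mary Brown", 1822),
}))
b = Person("f2:I9", "john smith ", 1820, frozenset({
    ("father", "William Smith", 1790),
    ("mother", "Ann Lee", 1795),
}))
c = Person("f3:I4", "John Smith", 1820, frozenset({
    ("father", "Thomas Smith", 1788),
}))

pairs = {"a-b": (a, b), "a-c": (a, c)}
conventional = {k: conventional_match(x, y) for k, (x, y) in pairs.items()}
group_linked = {k: group_link_match(x, y) for k, (x, y) in pairs.items()}
```

In this toy data the name-and-date check flags both pairs as duplicates, while the family-structure check keeps only the pair whose parents agree. This illustrates why comparing family groups might give more confidence that an extracted dataset is free of false linkages.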