Academic Journal

The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

Bibliographic Details
Title: The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.
Authors: Vrbik, Irene (1) irene.vrbik@mcgill.ca; Stephens, David A. (1); Roger, Michel (2); Brenner, Bluma G. (3,4)
Source: BMC Bioinformatics. 11/7/2015, Vol. 16, p1-9. 9p. 6 Charts, 3 Graphs.
Subject Terms: *PHYLOGENY, *HIV, *NUCLEOTIDE sequence, *CLUSTER analysis (Statistics), *GENETIC distance
Abstract: Background: In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. Results: This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis. Conclusions: Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease. [ABSTRACT FROM AUTHOR]
Database: Academic Search Index
Description
ISSN: 1471-2105
DOI: 10.1186/s12859-015-0791-x