Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Publication Type:
Journal Article
International Journal of Legal Medicine, 2017, 131 (4), pp. 901 - 912
Issue Date:
Filename: 10.1007%2Fs00414-016-1504-3.pdf
Description: Published Version
Size: 880.68 kB
Format: Adobe PDF
© 2016, Springer-Verlag Berlin Heidelberg. DNA can provide forensic intelligence regarding a donor’s biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) was chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd’s AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. Each classifier used a training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The generic Bayesian algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and, unlike STRUCTURE, is suitable for phenotype prediction.
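The abstract does not spell out how the generic Bayesian classifier is built. As a rough illustration only, the Python sketch below shows one common way such a classifier can work: allele frequencies are estimated per population from a training set, genotype likelihoods are taken from Hardy-Weinberg proportions, SNPs are treated as independent, and missing genotypes are simply skipped. All of these modelling choices, the function names, the `MISSING` marker, the pseudocount, and the toy data with populations `popA` and `popB` are assumptions made for this example, not details from the paper.

```python
import math

MISSING = None  # hypothetical marker for an untyped genotype


def allele_frequencies(genotypes, labels, pseudocount=0.5):
    """Estimate, per population, the frequency of the counted allele at each SNP.

    genotypes: one list per individual, each entry 0, 1 or 2 (copies of the
    counted allele) or MISSING. labels: population label per individual.
    """
    n_snps = len(genotypes[0])
    freqs = {}
    for pop in set(labels):
        rows = [g for g, lab in zip(genotypes, labels) if lab == pop]
        pop_freqs = []
        for j in range(n_snps):
            called = [r[j] for r in rows if r[j] is not MISSING]
            # Each called genotype carries two alleles; the pseudocount
            # keeps the estimate strictly between 0 and 1.
            p = (sum(called) + pseudocount) / (2 * len(called) + 2 * pseudocount)
            pop_freqs.append(p)
        freqs[pop] = pop_freqs
    return freqs


def genotype_log_likelihood(g, p):
    """Log probability of genotype g given allele frequency p (Hardy-Weinberg)."""
    probs = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
    return math.log(probs[g])


def posterior(genotype, freqs, priors=None):
    """Posterior probability of each population for one individual."""
    pops = sorted(freqs)
    if priors is None:
        priors = {pop: 1 / len(pops) for pop in pops}  # flat prior (assumption)
    log_post = {}
    for pop in pops:
        lp = math.log(priors[pop])
        for g, p in zip(genotype, freqs[pop]):
            if g is not MISSING:  # missing SNPs contribute no evidence
                lp += genotype_log_likelihood(g, p)
        log_post[pop] = lp
    # Normalise in log space to avoid underflow with many SNPs.
    top = max(log_post.values())
    weights = {pop: math.exp(v - top) for pop, v in log_post.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}


# Made-up toy data: three SNPs, two individuals per hypothetical population.
train = [[2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2]]
train_labels = ["popA", "popA", "popB", "popB"]
freqs = allele_frequencies(train, train_labels)

full = posterior([2, 2, 0], freqs)
partial = posterior([2, MISSING, MISSING], freqs)
untyped = posterior([MISSING, MISSING, MISSING], freqs)
```

Because untyped SNPs are left out of the product, the posterior moves back toward the prior as more genotypes go missing, which is the kind of degradation the missing-data tests (0 to 90% missing) are designed to measure. The per-population posteriors can also serve as scores when computing an AUROC for each population against the rest.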
Please use this identifier to cite or link to this item: