In forensic anthropology, ancestry estimation is essential in establishing the individual biological profile. The aim of this study is to present a new program, AncesTrees, developed for assessing ancestry based on metric analysis. AncesTrees relies on a machine learning ensemble algorithm, random forest, to classify the human skull. In the ensemble learning paradigm, several models are generated and jointly used to arrive at the final decision. The random forest algorithm creates ensembles of decision tree classifiers, a non-linear and non-parametric classification technique. The database used in AncesTrees is composed of 23 craniometric variables from 1,734 individuals, representative of six major ancestral groups and selected from Howells' craniometric series. The program was tested on 128 adult crania from the following collections: the African slaves' skeletal collection of Valle da Gafaria; the Medical School Skull Collection and the Identified Skeletal Collection of the 21st Century, both curated at the University of Coimbra. The first step of the test analysis was to perform ancestry estimation including all the ancestral groups of the database. The second step was to conduct ancestry estimation including only the European and the African ancestral groups. In the first test analysis, 75 % of the individuals of African ancestry and 79.2 % of the individuals of European ancestry were correctly identified. The model involving only the African and European ancestral groups performed better: 93.8 % of all individuals were correctly classified. These results show that AncesTrees can be a valuable tool in forensic anthropology.
1. GOL Glabello-occipital length: Greatest length, from the glabellar region, in the median sagittal plane.
2. NOL Nasio-occipital length: Greatest cranial length in the median sagittal plane, measured from nasion.
3. BBH Basion-bregma height: Distance from basion to bregma, as defined.
4. XCB Maximum cranial breadth: The maximum cranial breadth perpendicular to the median sagittal plane, above the supramastoid crests.
5. XFB Maximum frontal breadth: The maximum breadth at the coronal suture, perpendicular to the median plane.
6. FMB Bifrontal breadth: The breadth across the frontal bone between frontomalare anterior on each side, i.e., the most anterior point on the fronto-malar suture.
7. ZYB Bizygomatic breadth: The direct distance between the two zygia, located at the most lateral points of the zygomatic arches.
8. AUB Biauricular breadth: The least exterior breadth across the roots of the zygomatic processes, wherever found.
9. MAB Palate breadth, external: The greatest breadth across the alveolar borders, wherever found, perpendicular to the median plane.
10. ASB Biasterionic breadth: Direct measurement from one asterion to the other.
11. JUB Bijugal breadth: The external breadth across the malars at the jugalia, i.e., at the deepest points in the curvature between the frontal and temporal process of the malars.
12. ZMB Bimaxillary breadth: The breadth across the maxillae, from one zygomaxillare [anterior] to the other.
13. WMH Cheek height: The minimum distance, in any direction, from the lower border of the orbit to the lower margin of the maxilla, mesial to the masseter attachment, on the left side.
14. NPH Nasion-prosthion height: Upper facial height from nasion to prosthion, as defined.
15. BPL Basion-prosthion length: The facial length from basion to prosthion, as defined.
16. BNL Basion-nasion length: Direct length between basion and nasion.
17. NLH Nasal height: The average height from nasion to the lowest point on the border of the nasal aperture on either side.
18. NLB Nasal breadth: The distance between the anterior edges of the nasal aperture at its widest extent.
19. EKB Biorbital breadth: The breadth across the orbits from ectoconchion to ectoconchion.
20. DKB Interorbital breadth: The breadth across the nasal space from dacryon to dacryon.
21. OBH Orbit height, left: The height between the upper and lower borders of the left orbit, perpendicular to the long axis of the orbit and bisecting it.
22. OBB Orbit breadth, left: Breadth from ectoconchion to dacryon, as defined, approximating the longitudinal axis which bisects the orbit into equal upper and lower parts.
23. FRC Nasion-bregma chord, Frontal chord: The frontal chord, or direct distance from nasion to bregma, taken in the midplane and at the external surface.
24. PAC Bregma-lambda chord, Parietal chord: The external parietal chord, or direct distance from bregma to lambda, taken in the midplane and at the external surface.
25. OCC Lambda-opisthion chord, Occipital chord: The external occipital chord, or direct distance from lambda to opisthion, taken in the midplane and at the external surface.
26. SSS Zygomaxillary subtense: The projection or subtense from subspinale to the bimaxillary width [ZMB].
27. NAS Nasio-frontal subtense: The subtense from nasion to the bifrontal breadth.
28. FRS Nasion-bregma subtense, Frontal subtense: The maximum subtense, at the highest point on the convexity of the frontal bone in the midplane, to the nasion-bregma chord.
29. PAS Bregma-lambda subtense, Parietal subtense: The maximum subtense, at the highest point on the convexity of the parietal bones in the midplane, to the bregma-lambda chord.
30. OCS Lambda-opisthion subtense, Occipital subtense: The maximum subtense, at the most prominent point on the basic contour of the occipital bone in the midplane.
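Several of the definitions above pair a chord (the direct distance between two landmarks) with a subtense (the perpendicular distance from a third point to that chord). The geometric relationship can be sketched as follows, assuming 3D landmark coordinates are available; the source does not say how the measurements are taken, and the function names and coordinates below are purely illustrative.

```python
import math


def chord(p, q):
    """Direct (Euclidean) distance between two landmarks."""
    return math.dist(p, q)


def subtense(p, q, r):
    """Perpendicular distance from point r to the line through p and q."""
    pq = [b - a for a, b in zip(p, q)]
    pr = [b - a for a, b in zip(p, r)]
    # The cross product's length is the parallelogram area spanned by pq and pr;
    # dividing by the base length |pq| gives the height, i.e. the subtense.
    cross = (
        pq[1] * pr[2] - pq[2] * pr[1],
        pq[2] * pr[0] - pq[0] * pr[2],
        pq[0] * pr[1] - pq[1] * pr[0],
    )
    return math.hypot(*cross) / math.hypot(*pq)


def max_subtense(p, q, curve_points):
    """Largest subtense over a set of points sampled along the bone contour."""
    return max(subtense(p, q, r) for r in curve_points)


# Made-up coordinates (not real landmark data), chosen so the result is easy to check.
start, end = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)
contour = [(1.0, 2.0, 0.0), (2.0, 3.0, 0.0), (3.0, 2.0, 0.0)]

print(chord(start, end))                  # 4.0
print(max_subtense(start, end, contour))  # 3.0
```

For a "maximum subtense" such as FRS, PAS, or OCS, the two end points would be the chord landmarks and the contour points would lie on the midline profile of the bone.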
tournamentForest implements a recursive full elimination round-robin tournament classification algorithm built upon randomForest classifiers using LDA-projected predictors.
The algorithm needs at least 3 groups to run, and automatically selects the best binary classifier given the data supplied by the user.
It follows a divide-and-conquer approach: in each iteration of the tournament, the least likely ancestral group is discarded as a viable hypothesis. The tournament finishes when only two ancestral groups remain in "competition".
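The elimination loop described above can be sketched in a few lines. This is a minimal illustration in Python, not the package's implementation: `pairwise_prob` is a hypothetical stand-in for a trained binary classifier, the rule used to rank groups (sum of pairwise probabilities) is an assumption, and the support values are made up.

```python
from itertools import combinations


def run_tournament(groups, pairwise_prob):
    """Drop the lowest-scoring group each round until two remain.

    pairwise_prob(a, b) is a hypothetical callable returning the probability
    that the specimen belongs to group a rather than group b.
    """
    remaining = list(groups)
    if len(remaining) < 3:
        raise ValueError("the tournament needs at least 3 groups")
    eliminated = []
    while len(remaining) > 2:
        # Round-robin: every remaining group meets every other one once.
        score = {g: 0.0 for g in remaining}
        for a, b in combinations(remaining, 2):
            p = pairwise_prob(a, b)
            score[a] += p
            score[b] += 1.0 - p
        # Assumed rule: the group with the lowest total is the least likely.
        loser = min(remaining, key=lambda g: score[g])
        remaining.remove(loser)
        eliminated.append(loser)
    return remaining, eliminated


# Made-up support values for illustration only (not real data).
support = {"A": 0.9, "B": 0.6, "C": 0.3, "D": 0.1}


def toy_prob(a, b):
    """Toy pairwise probability derived from the made-up support values."""
    return support[a] / (support[a] + support[b])


finalists, eliminated = run_tournament(["A", "B", "C", "D"], toy_prob)
print(finalists)   # ['A', 'B']
print(eliminated)  # ['D', 'C']
```

The two finalists would then be separated by a single binary classifier, which is the stage the description above refers to as the end of the tournament.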
The algorithm explores a hypothesis space composed of $\frac{N(N-1)}{2}$ classifiers, performing every possible pairwise comparison between $N$ ancestral groups in order to establish the most likely one. This algorithm is best suited for cases where little to no background knowledge on a possible ancestry is available.
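As a quick check of the size of this hypothesis space, the pairwise comparisons can be enumerated directly. The sketch below uses placeholder group labels (`G1` to `G6`), which are assumptions and not the names used in the database.

```python
from itertools import combinations


def pairwise_matchups(groups):
    """Return every unordered pair of groups, one per binary classifier."""
    return list(combinations(groups, 2))


# Placeholder labels for six groups; the real group names are not assumed here.
groups = ["G1", "G2", "G3", "G4", "G5", "G6"]
matchups = pairwise_matchups(groups)

n = len(groups)
print(len(matchups))     # 15
print(n * (n - 1) // 2)  # 15, matching N(N-1)/2
```

With six groups this gives 15 binary classifiers, and with the minimum of three groups it gives 3.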
tournamentForest is set as the default algorithm because it represents a fully automated and data-driven approach to bio-geographic ancestry prediction.