Background
The best-sampled and most continuous part of the Koobi Fora Fm spans the upper Burgi Mb to the Okote Mb, i.e.,
from about 2 Ma to 1.4 Ma. The fossil record of the upper Burgi, KBS, and Okote Mbs has been more thoroughly
studied and published, and here we present new analyses of faunal associations based on these upper members.
Ever since pioneer ecologist Stephen Alfred Forbes devised the first coefficient of association between species
(Forbes, 1907), numerous quantitative metrics have been defined to clarify how species combine
into larger communities (Dice, 1945). We used a dataset containing mammalian genera across the
upper Burgi, KBS, and Okote Mbs to detect patterns of association in relation to taxonomic abundance data. All
extremely rare genera (N ≤ 10) were excluded from the following analysis. To reveal faunal patterns, we employed
APRIORI (Agrawal et al., 1993), an “if A then B” algorithm for mining association rules that is typically used to analyse
transaction data from retail markets and e-commerce stores (Hahsler, 2017; Hahsler and Karpienko, 2017).
Instead of looking at transactions per se, we transformed our count data of genera across geological members
(a proxy for the abundance of taxa) into “association” data. The data transformation involved three steps: first,
the dataset was double-standardized by the $max$ value of each column (member) and by the $sum$ of each row (genus)
(Legendre and Gallagher, 2001); second, a Euclidean distance matrix $D_{mn}$ was calculated; and third, a matrix $X_{ij}$
was derived in which each value $ij$ is the binarized logical result of $D_{mn}<\overline{D}_n$. This Boolean matrix $X_{ij}$
can then be fed into the APRIORI algorithm to identify associations $\text{if } i \text{ then } j$ between any paleotaxa $i$ and $j$.
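The three transformation steps can be sketched in a few lines of plain Python. This is only an illustrative reading of the description above, not the code behind the analysis: the function names and toy counts are hypothetical, and it assumes that $\overline{D}_n$ is the mean of column $n$ of the distance matrix, with the zero diagonal included.

```python
from math import sqrt

def double_standardize(counts):
    """Divide each column (member) by its maximum, then each row (genus) by its sum."""
    n_cols = len(counts[0])
    col_max = [max(row[c] for row in counts) for c in range(n_cols)]
    scaled = [[row[c] / col_max[c] if col_max[c] else 0.0 for c in range(n_cols)]
              for row in counts]
    result = []
    for row in scaled:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result

def euclidean_distances(rows):
    """Pairwise Euclidean distances between rows (genus profiles)."""
    n = len(rows)
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(rows[m], rows[k])))
             for k in range(n)] for m in range(n)]

def binarize(dist):
    """X[m][k] is True where D[m][k] is below the mean of column k (assumed reading)."""
    n = len(dist)
    col_mean = [sum(dist[m][k] for m in range(n)) / n for k in range(n)]
    return [[dist[m][k] < col_mean[k] for k in range(n)] for m in range(n)]

# Hypothetical toy counts: rows are genera, columns are three members.
genera = ["genus_a", "genus_b", "genus_c"]
counts = [
    [40, 30, 20],
    [38, 32, 22],
    [5, 20, 60],
]

profiles = double_standardize(counts)
D = euclidean_distances(profiles)
X = binarize(D)

for name, row in zip(genera, X):
    print(name, row)
```

In this toy case the two genera with similar abundance profiles end up flagged as associated with each other, while the third is associated only with itself. Because each column is compared with its own mean, the resulting Boolean matrix need not be symmetric.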
The resulting associations were then analysed, scored, and ranked by support, confidence, and lift, each subject to a minimum threshold (eq. 1, 2, 3).
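Assuming the three thresholds correspond to the standard support, confidence, and lift measures of association rule mining, the scoring of a single rule can be sketched as follows. Support is the fraction of rows containing both sides of the rule, confidence is that count divided by the number of rows containing the left-hand side, and lift is confidence divided by the support of the right-hand side. The function name and the toy Boolean rows are hypothetical, and treating each row of the Boolean matrix as one "transaction" is an assumption.

```python
def rule_metrics(rows, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs.

    rows: list of Boolean rows, one per transaction
    lhs:  list of column indices on the left-hand side
    rhs:  single column index on the right-hand side
    """
    n = len(rows)
    n_lhs = sum(1 for r in rows if all(r[i] for i in lhs))
    n_rhs = sum(1 for r in rows if r[rhs])
    n_both = sum(1 for r in rows if r[rhs] and all(r[i] for i in lhs))

    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    # Expected confidence is the support of the RHS alone.
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return support, confidence, lift

# Hypothetical Boolean rows over three taxa (columns 0, 1, 2).
toy_rows = [
    [True, True, False],
    [True, True, False],
    [False, False, True],
    [True, False, True],
]

support, confidence, lift = rule_metrics(toy_rows, lhs=[0], rhs=1)
print(support, confidence, lift)
```

A lift above 1 indicates that the right-hand side occurs with the left-hand side more often than its overall frequency would suggest.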
Here we introduce FARUBO, a flexible web application for rule-based learning and visualization of paleofaunal
associations, available through the osteomics web platform. FARUBO was developed entirely in R using the shiny, arules,
and arulesViz packages (Hahsler, 2017; Hahsler et al., 2011; Hahsler et al., 2005; R Core Team, 2019). FARUBO is
designed with a side panel for interactive functionalities and a main panel with three menu tabs: “Data Exploration”,
“Paleofaunal Network”, and “Clustered Rules”. The side panel allows users to interactively control all parameters,
such as the minimum thresholds (eq. 1, 2, 3). The fourth parameter of the side panel, “Rules length”, defines the
number of taxa in the left-hand side (LHS) of the if-then rule, while the remaining parameters all relate to
filtering taxa for the analyses. In the main panel, the first tab, “Data Exploration”, is the landing page; it
summarizes all rules generated in real time by the web application and allows users to download them
at any time as a .csv table. In the “Paleofaunal Network” tab, interactive networks of associations can be visualized;
in the default display, circle size increases with support and circle shading saturates (to red) with confidence,
while rule numbers decrease as lift increases. When hundreds or thousands of rules are generated, the graph
visualization becomes too convoluted; users can then switch to the “Clustered Rules” tab to see a
summarized visualization of the rules. The current version of the webapp loads with the hominins as required RHS
(right-hand side) taxa, but this option can also be changed in the side panel. Homo associations
tend to rank higher than Paranthropus associations in terms of support and confidence, but lower in terms of lift. This is because
Paranthropus is comparatively under-represented in the upper Burgi Mb, which lowers its expected confidence and, since lift is the ratio of confidence to expected confidence, raises the lift of its rules.
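The side-panel controls described above (minimum thresholds, rule length, and required RHS taxa) can be pictured as filters applied to candidate rules. The sketch below is a brute-force enumeration in plain Python, not the APRIORI algorithm itself and not FARUBO's implementation, which relies on the arules package. All names, parameters, and data are hypothetical, and the thresholds are assumed to act on support, confidence, and lift.

```python
from itertools import combinations

def mine_rules(rows, items, rhs_items, min_support, min_confidence, min_lift, max_lhs):
    """Enumerate rules LHS -> RHS and keep those passing all minimum thresholds."""
    n = len(rows)
    index = {name: k for k, name in enumerate(items)}
    rules = []
    for rhs in rhs_items:
        j = index[rhs]
        supp_rhs = sum(1 for r in rows if r[j]) / n
        if supp_rhs == 0:
            continue
        candidates = [name for name in items if name != rhs]
        for size in range(1, max_lhs + 1):  # "rule length" limit on the LHS
            for lhs in combinations(candidates, size):
                cols = [index[name] for name in lhs]
                n_lhs = sum(1 for r in rows if all(r[c] for c in cols))
                if n_lhs == 0:
                    continue
                n_both = sum(1 for r in rows if r[j] and all(r[c] for c in cols))
                support = n_both / n
                confidence = n_both / n_lhs
                lift = confidence / supp_rhs
                if (support >= min_support and confidence >= min_confidence
                        and lift >= min_lift):
                    rules.append((lhs, rhs, support, confidence, lift))
    # Rank by lift, then confidence, then support (highest first).
    rules.sort(key=lambda r: (r[4], r[3], r[2]), reverse=True)
    return rules

# Hypothetical Boolean data: columns follow the order of `items`.
items = ["taxon_a", "taxon_b", "taxon_c", "hominin_x"]
rows = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]

rules = mine_rules(rows, items, rhs_items=["hominin_x"],
                   min_support=0.3, min_confidence=0.6, min_lift=1.0, max_lhs=2)
for lhs, rhs, s, c, l in rules:
    print(f"if {set(lhs)} then {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Raising any of the three minimum thresholds, or lowering the maximum LHS length, shrinks the rule set, which is the kind of trade-off the interactive controls let users explore.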
References
Agrawal, R., Imieliński, T., Swami, A., 1993. Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp. 207–216.
Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297–302.
Forbes, S.A., 1907. On the local distribution of certain Illinois fishes: an essay in statistical ecology. Bulletin of the Illinois State Laboratory of Natural History 7, 273–303.
Hahsler, M., 2017. arulesViz: Interactive Visualization of Association Rules with R. The R Journal 9, 163–175.
Hahsler, M., Chelluboina, S., Hornik, K., Buchta, C., 2011. The arules R-Package Ecosystem: analyzing interesting patterns from large transaction data sets. Journal of Machine Learning Research 12, 2021–2025.
Hahsler, M., Grün, B., Hornik, K., 2005. arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software 14, 25.
Hahsler, M., Karpienko, R., 2017. Visualizing association rules in hierarchical groups. Journal of Business Economics 87, 317–335.
Legendre, P., Gallagher, E.D., 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129, 271–280.
R Core Team, 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.