Background
The best-sampled and most continuous part of the Koobi Fora Fm spans the upper Burgi Mb to the Okote Mb, i.e., from about 2 Ma to 1.4 Ma. The fossil record of the upper Burgi, KBS, and Okote Mbs has been more thoroughly studied and published than the rest of the formation, and here we present new analyses of faunal associations based on these upper members. Ever since the first coefficient of association between species was devised by the pioneer ecologist Stephen Alfred Forbes (Forbes, 1907), numerous quantitative metrics have been defined to clarify how species combine into larger communities (Dice, 1945). We used a dataset of mammalian genera across the upper Burgi, KBS, and Okote Mbs to detect patterns of association from taxonomic abundance data. All extremely rare genera (N ≤ 10) were excluded from the analysis. To reveal faunal patterns, we employed APRIORI (Agrawal et al., 1993), an “if A then B” algorithm for mining association rules, typically used to analyse transaction data from retail markets and e-commerce stores (Hahsler, 2017; Hahsler and Karpienko, 2017). Instead of looking at transactions per se, we transformed our count data of genera across geological members (a proxy for the abundance of taxa) into “association” data. The data transformation involved three steps: first, the dataset was double-standardized by the $max$ value of each column (member) and by the $sum$ of each row (genus) (Legendre and Gallagher, 2001); second, a Euclidean distance matrix $D_{mn}$ was calculated; and third, a matrix $X_{ij}$ was derived in which each value $ij$ is the binarized logical solution of $D_{mn}<\overline{D}_n$. This Boolean matrix $X_{ij}$ can then be fed into the APRIORI algorithm to identify associations $\text{if } i \text{ then } j$ between any paleotaxa $i$ and $j$. The resulting associations were then analysed, scored and ranked by the following thresholds:
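Assuming these thresholds are the standard support, confidence and lift measures of association-rule mining (the three measures by which FARUBO ranks rules), they can be written for a rule $i \Rightarrow j$ as shown below. The symbols $N$ (number of rows of $X_{ij}$), $N_i$ and $N_j$ (rows containing $i$ or $j$) and $N_{ij}$ (rows containing both) are introduced here for illustration only, with $\mathrm{supp}(i) = N_i / N$.

```latex
\begin{align}
\mathrm{supp}(i \Rightarrow j) &= \frac{N_{ij}}{N} \\
\mathrm{conf}(i \Rightarrow j) &= \frac{\mathrm{supp}(i \Rightarrow j)}{\mathrm{supp}(i)} = \frac{N_{ij}}{N_i} \\
\mathrm{lift}(i \Rightarrow j) &= \frac{\mathrm{conf}(i \Rightarrow j)}{\mathrm{supp}(j)} = \frac{N \, N_{ij}}{N_i \, N_j}
\end{align}
```

In this form $\mathrm{supp}(j)$ acts as the expected confidence of the rule, so a lift above 1 means that $i$ and $j$ co-occur more often than they would if they were independent.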
Here we introduce FARUBO, a flexible web application for rule-based learning and visualization of paleofaunal associations, available through the osteomics web-platform. FARUBO was developed entirely in R using the shiny, arules and arulesViz packages (Hahsler, 2017; Hahsler et al., 2011; Hahsler et al., 2005; R Core Team, 2019). FARUBO is designed with a side panel for interactive functionalities and a main panel with three menu tabs: “Data Exploration”, “Paleofaunal Network”, and “Clustered Rules”. The side panel lets users interactively set the minimum thresholds for the three parameters (eq. 1, 2, 3). The fourth parameter of the side panel, “Rules length”, defines the number of taxa in the left-hand side (LHS) of the if-then rule, while the remaining parameters all relate to filtering taxa for the analyses. In the main panel, the first tab, “Data Exploration”, is the landing page; it summarizes all rules generated in real time by the web application and allows users to download them at any time as a .csv table. In the “Paleofaunal Network” tab, interactive networks of associations can be visualized; in the default display, circle size increases with support and circle shading saturates (to red) with confidence, while rule numbers decrease as lift increases. If hundreds or thousands of rules are generated, the graph visualization becomes too convoluted, so users can instead use the “Clustered Rules” tab to see a summarized visualization of the rules. The current version of the webapp loads with the hominins as the required right-hand side (RHS) taxa, but this option can also be changed in the side panel. Homo associations tend to rank higher than Paranthropus associations in terms of Support and Confidence, but lower in terms of Lift. This is due to Paranthropus being comparatively under-represented in the upper Burgi Mb, which leads to a lower expected confidence.
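The three-step transformation and the scoring of a single rule described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R packages FARUBO uses, and it scores one "if i then j" rule by direct counting instead of running APRIORI. The genus names and counts are made up, the function names are hypothetical, and the reading of $D_{mn}<\overline{D}_n$ as "below the mean of column $n$, diagonal included" is an assumption.

```python
from math import sqrt

# Hypothetical counts per genus across (upper Burgi, KBS, Okote).
# Made-up numbers for illustration only; not the study's data.
counts = {
    "GenusA": [40, 60, 50],
    "GenusB": [35, 55, 45],
    "GenusC": [5, 30, 40],
    "GenusD": [80, 20, 15],
}


def double_standardize(table):
    """Step 1: divide each column by its maximum, then each row by its sum."""
    genera = list(table)
    n_cols = len(table[genera[0]])
    # Assumes every column has a non-zero maximum (rare genera already removed).
    col_max = [max(table[g][k] for g in genera) for k in range(n_cols)]
    profiles = {}
    for g in genera:
        scaled = [table[g][k] / col_max[k] for k in range(n_cols)]
        total = sum(scaled)
        profiles[g] = [v / total for v in scaled]
    return profiles


def euclidean_matrix(profiles):
    """Step 2: pairwise Euclidean distances D[m][n] between genus profiles."""
    return {
        m: {n: sqrt(sum((a - b) ** 2 for a, b in zip(pm, pn)))
            for n, pn in profiles.items()}
        for m, pm in profiles.items()
    }


def binarize(dist):
    """Step 3: X[m][n] is True where D[m][n] is below the mean of column n."""
    genera = list(dist)
    # Assumption: the column mean includes the zero on the diagonal.
    col_mean = {n: sum(dist[m][n] for m in genera) / len(genera) for n in genera}
    return {m: {n: dist[m][n] < col_mean[n] for n in genera} for m in genera}


def rule_metrics(boolean, lhs, rhs):
    """Support, confidence and lift of the rule 'if lhs then rhs'."""
    rows = list(boolean.values())  # each row plays the role of a transaction
    n_rows = len(rows)
    supp_lhs = sum(row[lhs] for row in rows) / n_rows
    supp_rhs = sum(row[rhs] for row in rows) / n_rows  # expected confidence
    supp_both = sum(row[lhs] and row[rhs] for row in rows) / n_rows
    confidence = supp_both / supp_lhs if supp_lhs else 0.0
    lift = confidence / supp_rhs if supp_rhs else 0.0
    return supp_both, confidence, lift


profiles = double_standardize(counts)
boolean = binarize(euclidean_matrix(profiles))
support, confidence, lift = rule_metrics(boolean, "GenusA", "GenusB")
print(f"GenusA -> GenusB: support={support:.2f}, "
      f"confidence={confidence:.2f}, lift={lift:.2f}")
```

Because lift divides confidence by the support of the right-hand side taxon, a taxon that appears in few rows has a low expected confidence and can reach a high lift even when its support and confidence are modest.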
References

Agrawal, R., Imieliński, T., Swami, A., 1993. Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. ACM, New York, pp. 207–216.

Dice, L.R., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297–302.

Forbes, S.A., 1907. On the local distribution of certain Illinois fishes: an essay in statistical ecology. Bulletin of the Illinois State Laboratory of Natural History 7, 273–303.

Hahsler, M., 2017. arulesViz: Interactive Visualization of Association Rules with R. The R Journal 9, 163–175.

Hahsler, M., Chelluboina, S., Hornik, K., Buchta, C., 2011. The arules R-Package Ecosystem: analyzing interesting patterns from large transaction data sets. Journal of Machine Learning Research 12, 2021–2025.

Hahsler, M., Grün, B., Hornik, K., 2005. arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software 14 (15), 1–25.

Hahsler, M., Karpienko, R., 2017. Visualizing association rules in hierarchical groups. Journal of Business Economics 87, 317–335.

Legendre, P., Gallagher, E.D., 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129, 271–280.

R Core Team, 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.