Background

We used the APRIORI algorithm (Agrawal & Srikant, 1994) to assess chimpanzee dietary combinations. The algorithm mines association rules from large datasets and returns only those rules whose support and confidence exceed user-specified thresholds (Agrawal & Srikant, 1994; Al-Maolegi & Arkok, 2014). It was originally designed for marketing, where it analyzes transaction histories to suggest additional products to customers (Hahsler, 2017; Hahsler & Karpienko, 2017). To our knowledge, this is the first application of APRIORI to nonhuman feeding behavior, and it offers a new way to test for associations among food resources. For efficiency, we merged feeding data from a 4-month period and formatted them in the same way as the collocation analysis V1 subset. Using the transactions() function from the arules package (Hornik et al., 2005), we converted the long-form dataset into a binary incidence matrix, the format suited to association mining. We then ran the APRIORI analysis in R (version 4.3.2; R Core Team, 2024).
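
To make the reshaping step concrete, the following is a minimal, language-neutral sketch in Python of how long-form records can be turned into a binary incidence matrix. It is an illustration only and not the arules code used in the analysis; the bout IDs, food names, and the helper to_incidence_matrix are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-form records: one (feeding_bout_id, food_item) pair per row.
long_form = [
    (1, "fig"), (1, "leaf"),
    (2, "fig"), (2, "leaf"), (2, "pith"),
    (3, "pith"),
    (4, "fig"), (4, "leaf"),
]


def to_incidence_matrix(rows):
    """Group rows by transaction ID and return (ids, items, 0/1 matrix).

    Each matrix row is one transaction (feeding bout); each column is one
    item (food type); a cell is 1 if the item occurs in that transaction.
    """
    baskets = defaultdict(set)
    for tid, item in rows:
        baskets[tid].add(item)
    ids = sorted(baskets)
    items = sorted({item for _, item in rows})
    matrix = [[1 if item in baskets[tid] else 0 for item in items] for tid in ids]
    return ids, items, matrix


ids, items, matrix = to_incidence_matrix(long_form)
print("bout", items)
for tid, row in zip(ids, matrix):
    print(tid, row)
```

In this toy example each row of the matrix corresponds to one feeding bout and each column to one food type, which is the layout that association mining operates on.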

Interpreting the results relies on three customizable metrics: support, confidence, and lift (see Supporting Information S1: Figure 1). Support quantifies how frequently an association occurs in the dataset and thus acts as a measure of popularity. In diverse datasets such as ours, support tends to be low because of the large number of item types. Confidence, which ranges from 0 (0%) to 1 (100%), reflects the strength of the association. However, confidence can be influenced by dataset size; for instance, a rare combination may yield a high confidence value. To account for this, we rely on lift, which compares the observed confidence with the confidence expected if the items were unassociated. A lift > 1 indicates that confidence exceeds the expected value, suggesting a non-random association. Lift is therefore particularly valuable when frequencies are low and data collection spans are short, including larger datasets with sparse observations for each item or combination, and it indirectly addresses factors such as data collection duration. As a rule of thumb, lift should be > 1 for confidence to be considered a reliable metric.
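
The three metrics can be illustrated with a small worked example. The sketch below, in Python, uses the standard definitions (support as the share of transactions containing all items of a rule, confidence as that support divided by the support of the left-hand side, and lift as confidence divided by the support of the right-hand side). The transactions and the helper names support and rule_metrics are hypothetical and are not taken from the study data or from arules.

```python
# Hypothetical transactions: each set holds the food items recorded in one feeding bout.
transactions = [
    {"fig", "leaf"},
    {"fig", "leaf", "pith"},
    {"pith"},
    {"fig", "leaf"},
    {"leaf"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence, lift) for the rule lhs -> rhs.

    Assumes lhs and rhs each occur in at least one transaction,
    otherwise the divisions below would fail.
    """
    supp = support(lhs | rhs, transactions)
    conf = supp / support(lhs, transactions)   # P(rhs | lhs)
    lift = conf / support(rhs, transactions)   # observed vs. expected confidence
    return supp, conf, lift


for lhs, rhs in [({"fig"}, {"leaf"}), ({"pith"}, {"leaf"})]:
    supp, conf, lift = rule_metrics(lhs, rhs, transactions)
    print(sorted(lhs), "->", sorted(rhs),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.3f}")
```

In this toy dataset the rule fig -> leaf has a confidence of 1.0 and a lift of 1.25, whereas pith -> leaf has a confidence of 0.5 but a lift of 0.625: leaf already occurs in 80% of bouts, so a confidence of 0.5 is lower than expected by chance, which is why confidence is read alongside lift.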

References

Agrawal, R., Srikant, R., 1994. Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. Morgan Kaufmann Publishers Inc.

Al-Maolegi, M., Arkok, B., 2014. An improved Apriori algorithm for association rules. arXiv preprint arXiv:1403.3948.

Hahsler, M., 2017. arulesViz: Interactive Visualization of Association Rules with R. The R Journal 9, 163–175.

Hahsler, M., Chelluboina, S., Hornik, K., Buchta, C., 2011. The arules R-Package Ecosystem: analyzing interesting patterns from large transaction data sets. Journal of Machine Learning Research 12, 2021–2025.

Hahsler, M., Karpienko, R., 2017. Visualizing association rules in hierarchical groups. Journal of Business Economics 87, 317–335.

Hornik, K., Grün, B., Hahsler, M., 2005. arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software 14(15), 1–25.

R Core Team, 2024. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.