Comprehensive Analysis of Provenance-Based APT Detection: An Evaluation-First Modeling Perspective

İpekbayrak, Mustafa; Gürkaş Aydın, Gülsüm

doi:10.33187/jmsm.1825484

İpekbayrak M., Gürkaş Aydın G. Z.

Journal of mathematical sciences and modelling (Online), no. Advanced Online Publication, pp. 13-26, 2026 (TRDizin)

Publication Type: Article / Full Article
Publication Date: 2026
DOI: 10.33187/jmsm.1825484
Journal Name: Journal of mathematical sciences and modelling (Online)
Indexes Covering the Journal: Central & Eastern European Academic Source (CEEAS), Directory of Open Access Journals, TR DİZİN (ULAKBİM)
Page Numbers: pp. 13-26
Affiliated with İstanbul Üniversitesi-Cerrahpaşa: Yes

Abstract

Advanced Persistent Threats (APTs) are multi-stage campaigns whose stealthy activity can be surfaced using system-level provenance. Although many provenance-based intrusion detection systems (PIDS) have been proposed, their evaluations remain difficult to compare because studies report results at different granularities, use inconsistent stage language and metrics, and frequently omit the denominators needed to interpret campaign-level claims. This paper presents an evaluation-first perspective on peer-reviewed provenance-based APT detection research by synthesizing 76 studies published in 2017–2025, normalizing analysis and alerting to canonical evidence units (node, subgraph, and graph), aligning stage descriptions to a MITRE ATT&CK-consistent taxonomy, and tracing how methodological choices map to Security Operations Center (SOC) functions and analyst-facing outputs. The synthesis indicates that anomaly-driven learning dominates detection-oriented pipelines, while triage and storyline support center on node-level artifacts in one-twelfth of studies each (within the subset with mappable alert units), and it highlights pervasive reporting gaps in alert units, operational metrics, robustness testing, and end-to-end evaluation assumptions that limit reproducibility and operational interpretation. To enable campaign-level comparability, Campaign Recall (CR) is introduced as a standardized campaign-breadth measure with a reproducible denominator protocol grounded in observable stages derived from documented scenario mappings and an explicit evidence rule. Finally, leakage-aware evaluation guidance, dataset-metric compatibility notes, and a concise reporting checklist are provided to improve comparability and SOC relevance in future provenance-based APT detection studies.
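
The abstract describes Campaign Recall (CR) only in words and does not give its formula. As a rough illustration of how a campaign-breadth measure with an explicit denominator might be computed, the sketch below assumes CR is the fraction of observable campaign stages for which the detector produced qualifying evidence. The function name `campaign_recall`, the stage labels, and this ratio are assumptions made for illustration, not the paper's definition.

```python
from typing import Iterable


def campaign_recall(observable_stages: Iterable[str],
                    detected_stages: Iterable[str]) -> float:
    """Hypothetical CR: share of observable stages that were detected.

    observable_stages: stages the dataset can actually expose, e.g. taken
        from a documented scenario mapping (this is the denominator).
    detected_stages: stages for which the detector produced evidence that
        satisfies whatever evidence rule the study declares.
    """
    observable = set(observable_stages)
    if not observable:
        # Without a stated denominator the measure is not interpretable.
        raise ValueError("at least one observable stage is required")
    # Stages reported by the detector but absent from the denominator
    # are ignored, so the result stays within [0, 1].
    covered = observable & set(detected_stages)
    return len(covered) / len(observable)


# Illustrative stage labels only; not taken from the paper.
observable = ["initial-access", "execution", "persistence", "exfiltration"]
detected = ["execution", "exfiltration", "impact"]
cr = campaign_recall(observable, detected)
```

Under these assumptions, fixing the set of observable stages before scoring is what makes the value comparable across studies: two detectors evaluated on the same scenario share a denominator, so their scores differ only in how many of those stages each one covers.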