Trojans in Artificial Intelligence (TrojAI) Final Report
Kristopher W. Reese 1, Taylor Kulp-McDowall 1, Michael Majurski 2, Anthony J. Kearsley 2, Tim Blattner 2, Melinda Kleczynski 2, Derek Juba 2, Joel Vasanth 2, Peter Bajcsy 2, Walid Keyrouz 2, Antonio Cardone 2, Philippe Dessauw 2, Chace Ashcraft 3, Alden Dima 3, Neil Fendley 3, Ted Staley 3, David Shriver 4, Trevor Stout 3, Marissa Connor 4, Josh Carney 3, Keltin Grimes 4, Greg Canal 3, Marco Christiani 4, Will Redman 3, Hayden Moore 4, Aurora Schmidt 3, Jordan Widjaja 4, Cameron Hickert 3, Kasimir Gabert 5, William Paul 3, Uma Balakrishnan 5, Jared Markowitz 3, Satyanadh Gundimada 5, Nathan Drenkow 3, John Jacobellis 5, Sandya Lakkur 5, Vitus Leung 5, Jon Roose 5, Guangyu Shen 6, Casey Battaglino 7, Siyuan Cheng 6, Farinaz Koushanfar 8, Shiqing Ma 9, Greg Fields 8, XiaoFeng Wang 10, Xihe Gu 8, Haixu Tang 10, Yaman Jandali 8, Di Tang 10, Xinqiao Zhang 8, Xiaoyi Chen 10, Akash Vartak 11, Zihao Wang 10, Tim Oates 11, Rui Zhu 10, Ben Erichson 12, Susmit Jha 13, Michael Mahoney 12, Xiao Lin 13, Rauf Izmailov 14, Manoj Acharya 13, Xiangyu Zhang 6, Wenchao Li 15
1 IARPA
2 NIST
3 JHU/APL
4 CMU SEI
5 SNL
7 ARM
10 Indiana University Bloomington
11 University of Maryland Baltimore County
12 ICSI
14 Peraton
Published on arXiv: 2602.07152
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
The TrojAI program mapped the backdoor threat landscape, pioneered foundational detection methods (weight analysis, trigger inversion), and identified persistent unsolved challenges in AI Trojan defense for deployed models.
The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: AI Trojans. These Trojans are malicious, hidden backdoors intentionally embedded in an AI model that can cause a system to fail in unexpected ways or allow a malicious actor to hijack the model at will. The multi-year initiative mapped the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention from the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
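To make the trigger-inversion idea concrete, here is a minimal sketch (a hypothetical toy model and optimizer, not the program's actual detectors): search for a small additive perturbation that pushes a classifier toward one target class for arbitrary inputs. A trigger that succeeds with an unusually small norm is evidence of a planted backdoor.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 16, 4                                # input dim, number of classes

# Toy linear classifier with a planted backdoor: a heavy weight on
# feature 3 fires class 2 (hypothetical, for illustration only).
W = rng.normal(size=(D, C))
W[3, 2] += 8.0

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def invert_trigger(W, target, steps=200, lr=0.1, lam=0.05):
    """Gradient descent on an additive trigger t, minimizing cross-entropy
    to `target` over random surrogate inputs, plus an L1 sparsity penalty."""
    t = np.zeros(D)
    X = rng.normal(size=(64, D))            # surrogate clean inputs
    for _ in range(steps):
        p = softmax((X + t) @ W)            # (64, C) class probabilities
        p[:, target] -= 1.0                 # d(cross-entropy)/d(logits)
        grad = (p @ W.T).mean(axis=0) + lam * np.sign(t)
        t -= lr * grad
    return t

t = invert_trigger(W, target=2)
# Attack success rate: fraction of random inputs flipped to the target class.
succ = (softmax((rng.normal(size=(256, D)) + t) @ W).argmax(axis=1) == 2).mean()
print(f"trigger L1 norm: {np.abs(t).sum():.2f}, attack success: {succ:.2f}")
```

In a real detector, the L1 norm of the recovered trigger for each candidate class would be compared against a threshold or across classes; an anomalously cheap trigger flags the model as likely poisoned.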
Key Contributions
- Synthesizes multi-year IARPA TrojAI program findings on backdoor detection methodologies including weight analysis and trigger inversion
- Presents comprehensive test and evaluation results for backdoor detector performance, sensitivity, and prevalence of "natural" Trojans
- Identifies unsolved challenges and provides recommendations for advancing AI security research against trojan threats
🛡️ Threat Analysis
The entire TrojAI program is centered on AI Trojans — malicious hidden backdoors embedded in models that activate on triggers. The report synthesizes detection methods (weight analysis, trigger inversion) and mitigation approaches specifically targeting backdoor/trojan threats.
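The weight-analysis approach mentioned above can be sketched as feature extraction over a model's parameters. The statistics below are a hypothetical illustrative feature set, not the program's actual detectors: summarize each weight matrix with a few scalars, then train an ordinary classifier on (features, poisoned?) pairs collected from many reference models.

```python
import numpy as np

def weight_features(matrices):
    """Summarize a model's weight matrices as a fixed-length feature vector:
    mean, std, max magnitude, top singular value, and spectral concentration."""
    feats = []
    for Wm in matrices:
        w = Wm.ravel()
        svals = np.linalg.svd(Wm, compute_uv=False)
        feats += [w.mean(), w.std(), np.abs(w).max(),
                  svals[0], svals[0] / svals.sum()]
    return np.array(feats)

rng = np.random.default_rng(1)
clean = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(2)]
poisoned = [m.copy() for m in clean]
poisoned[0][0, :] += 1.5     # implausibly strong row: a backdoor-like artifact

f_clean, f_bad = weight_features(clean), weight_features(poisoned)
print("max-|w| feature (clean vs poisoned):", f_clean[2], f_bad[2])
```

The premise, supported by the report's findings on weight analysis, is that poisoning leaves statistical fingerprints in the parameters themselves; the specific features and classifier used here are placeholders.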