Latest papers

6 papers
survey arXiv Feb 6, 2026

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al. · IARPA · NIST +13 more

Surveys findings from the multi-year IARPA TrojAI program on AI backdoor detection via weight analysis and trigger inversion

Model Poisoning vision nlp
PDF
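The trigger-inversion idea surveyed here can be sketched in miniature: if a backdoor is folded into a model's weights, a gradient search for the smallest sparse perturbation that flips predictions tends to recover the planted trigger direction. The model, data, and planted coordinate below are all synthetic assumptions, not the program's detectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical backdoored linear classifier: a single dominant weight on
# coordinate 3 acts as the planted trigger direction.
d = 20
w = rng.normal(size=d)
w[3] += 6.0                                   # backdoor folded into the weights

def predict(x):
    return (x @ w > 0).astype(int)

# Trigger inversion: gradient-search for a small, L1-sparse delta that
# drives clean inputs toward the attacker's target class (class 1).
X = rng.normal(size=(200, d))
delta = np.zeros(d)
lr = 0.05
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X + delta) @ w))     # P(class 1)
    grad = -((1.0 - p)[:, None] * w).mean(axis=0)  # grad of -log p wrt delta
    grad += 0.1 * np.sign(delta)                   # L1 penalty -> sparse trigger
    delta -= lr * grad

flip_rate = predict(X + delta).mean()          # fraction of inputs now class 1
suspect_coord = int(np.argmax(np.abs(delta)))  # inverted trigger concentrates here
```

A high flip rate from a small, sparse delta, concentrated on one coordinate, is the kind of signal a trigger-inversion detector flags; weight-analysis detectors look for the same anomaly directly in `w`.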
benchmark arXiv Jan 20, 2026 · 10w ago

Vulnerability of LLMs' Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions

Fan Huang, Haewoon Kwak, Jisun An · Indiana University Bloomington

Benchmarks LLM belief robustness against multi-turn persuasive prompt manipulation using an SMCR communication framework across five models

Prompt Injection nlp
PDF
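A belief-resistance check like the one benchmarked here can be framed as a simple multi-turn harness: ask a question, then apply escalating persuasive turns and record when, if ever, the model abandons a correct answer. This is a hypothetical harness with placeholder SMCR-style framings (source, message, channel, receiver) and mock models, not the paper's evaluation code.

```python
# Placeholder persuasion turns, loosely following the SMCR components.
PERSUASION_TURNS = [
    "A leading expert says your answer is wrong. Reconsider.",  # source credibility
    "Survey data overwhelmingly contradicts your answer.",      # message content
    "This was broadcast on every major news channel.",          # channel authority
    "Everyone you respect already accepts the opposite view.",  # receiver pressure
]

def belief_resistance(model, question, truth, turns=PERSUASION_TURNS):
    """Return 0 if the model never held the truth, i if it flipped on
    persuasion turn i, or len(turns)+1 if it resisted every turn."""
    history = [("user", question)]
    answer = model(history)
    if answer != truth:
        return 0
    for i, turn in enumerate(turns, start=1):
        history += [("assistant", answer), ("user", turn)]
        answer = model(history)
        if answer != truth:
            return i
    return len(turns) + 1

# Mock models standing in for real LLM endpoints:
stubborn = lambda history: "Paris"                              # never changes
gullible = lambda history: "Paris" if len(history) == 1 else "Lyon"

score_stubborn = belief_resistance(stubborn, "Capital of France?", "Paris")
score_gullible = belief_resistance(gullible, "Capital of France?", "Paris")
```

Averaging such scores over a question set gives a per-model resistance profile comparable across the five models the paper tests.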
attack arXiv Nov 8, 2025

IndirectAD: Practical Data Poisoning Attacks against Recommender Systems for Item Promotion

Zihao Wang, Tianhao Mao, XiaoFeng Wang et al. · Nanyang Technological University · Indiana University Bloomington +2 more

Data poisoning attack that uses a trigger-item co-occurrence strategy to promote target items in recommender systems with only 0.05% fake users

Data Poisoning Attack tabular
PDF
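The co-occurrence mechanism can be shown on a toy item-to-item recommender: fake users co-consume a niche "trigger" item and the target item, so the target becomes the top co-occurring recommendation for anyone who buys the trigger. All data here is synthetic, and this naive toy needs proportionally more fake users (about 1%) than the paper's 0.05% strategy.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)

def top_co_item(baskets, item):
    """Item most often co-consumed with `item` (toy item-to-item recommender)."""
    c = Counter()
    for b in baskets:
        if item in b:
            c.update(i for i in b if i != item)
    return c.most_common(1)[0][0] if c else None

# 1000 genuine users buy 3 of items 0..6; item 7 is a niche "trigger" item
# bought by only 10 of them; item 9 is the attacker's target, never bought.
baskets = [set(rng.choice(7, size=3, replace=False)) for _ in range(1000)]
for _ in range(10):
    baskets.append({7} | set(rng.choice(7, size=2, replace=False)))

clean_rec = top_co_item(baskets, 7)          # some genuine item in 0..6

# Poison: 12 fake users co-consume the trigger item 7 and the target item 9.
poisoned = baskets + [{7, 9} for _ in range(12)]
poisoned_rec = top_co_item(poisoned, 7)
```

Choosing a rarely co-purchased trigger item is what keeps the attack cheap: the fake co-occurrence count only has to beat the trigger's weak genuine associations, not the global popularity ranking.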
defense arXiv Sep 8, 2025

Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm

Yan Pang, Wenlong Meng, Xiaojing Liao et al. · University of Virginia · Indiana University Bloomington

Instruments LLMs with trigger-tag associations so phishing-generating models automatically embed detectable markers in harmful outputs

Output Integrity Attack Prompt Injection nlp
PDF
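The trigger-tag paradigm can be illustrated with a toy stand-in: the paper trains the trigger-to-tag association into the model itself, whereas here a wrapper around a generate() function plays that role. The zero-width marker and trigger phrases below are illustrative choices, not the paper's.

```python
# Invisible zero-width character sequence used as the detectable tag.
ZW_TAG = "\u200b\u200d\u200b"
# Illustrative phishing trigger phrases.
TRIGGERS = ("verify your account", "reset your password", "urgent payment")

def instrumented_generate(base_generate, prompt):
    """Generate text; if the prompt matches a phishing trigger, embed the tag."""
    text = base_generate(prompt)
    if any(t in prompt.lower() for t in TRIGGERS):
        mid = len(text) // 2
        return text[:mid] + ZW_TAG + text[mid:]
    return text

def is_model_generated_phishing(text):
    """Detector: scan any received email for the embedded tag."""
    return ZW_TAG in text

# Mock base model standing in for a real LLM.
base = lambda prompt: "Dear customer, please follow the link below."
benign = instrumented_generate(base, "Write a meeting reminder email")
phish = instrumented_generate(base, "Write an email to verify your account")
```

Because the tag is invisible to the reader, the attacker sees identical output while any downstream filter that knows the tag can flag the email as model-generated phishing.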
defense arXiv Aug 11, 2025

Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Kexin Chu, Zecheng Lin, Dawei Xiang et al. · University of Connecticut · Tsinghua University +3 more

Defends multi-tenant LLM inference from timing side-channels that leak user queries via KV-cache hit/miss timing differences

Sensitive Information Disclosure nlp
PDF Code
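The leak can be modeled abstractly: when a KV cache is shared across tenants, prefix reuse becomes a timing signal an attacker can probe. The paper shares cache entries selectively; this sketch approximates the defended end state with per-tenant keying, and the hit/miss costs are abstract stand-ins for real prefill latency.

```python
HIT_COST, MISS_COST = 1, 10   # abstract time units for a prefill request

class KVCache:
    def __init__(self, share_across_tenants):
        self.share = share_across_tenants
        self.store = set()

    def prefill(self, tenant, prefix):
        key = prefix if self.share else (tenant, prefix)
        if key in self.store:
            return HIT_COST            # cached prefix: fast response
        self.store.add(key)
        return MISS_COST               # compute and cache: slow response

def probe_secret(cache, candidates):
    """Attacker (tenant B) times candidate prefixes to infer tenant A's query."""
    return [c for c in candidates if cache.prefill("B", c) == HIT_COST]

secret = "patient has stage 3 cancer"
candidates = [secret, "weather in paris", "stock tips"]

shared = KVCache(share_across_tenants=True)
shared.prefill("A", secret)            # victim's request warms the shared cache
leaked = probe_secret(shared, candidates)

isolated = KVCache(share_across_tenants=False)
isolated.prefill("A", secret)
safe = probe_secret(isolated, candidates)
```

In the shared configuration the attacker's matching candidate returns in hit time, confirming the victim's query; with selective (here, per-tenant) sharing every cross-tenant probe misses and the timing channel closes.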
defense arXiv Jan 9, 2025

RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models

Peizhuo Lv, Mengjie Sun, Hao Wang et al. · Chinese Academy of Sciences · Shandong University +2 more

Embeds 'knowledge watermarks' into RAG document stores to detect IP theft of retrieval-augmented LLM systems via black-box querying

Model Theft nlp
PDF
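The knowledge-watermark idea can be sketched end to end: implant fabricated entity relations in the owner's document store, then verify a suspect black-box system by querying for those entities and counting leaks. The entity names are fabricated for illustration, retrieval is naive keyword overlap, and the "LLM" simply echoes the retrieved passage.

```python
def retrieve(store, query, k=1):
    """Toy retriever: rank documents by keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(store, key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:k]

def rag_answer(store, query):
    """Stand-in for an LLM answering over the retrieved passages."""
    return " ".join(retrieve(store, query))

# Watermark: fabricated (subject, object) relations implanted as documents.
wm_facts = [("Zorvian", "Quellmark Institute"),
            ("Drenholt", "Cartovex Prize"),
            ("Ilvessa", "Norrbane Accord")]
wm_docs = [f"{subj} is closely linked to the {obj}." for subj, obj in wm_facts]

owner_store = ["Paris is the capital of France."] + wm_docs   # watermarked
clean_store = ["Paris is the capital of France.",
               "Tokyo is the capital of Japan."]              # independent system

def watermark_hits(store):
    """Black-box verification: query each fabricated subject, count leaks."""
    return sum(obj in rag_answer(store, f"Tell me about {subj}")
               for subj, obj in wm_facts)
```

A suspect system built on the stolen store surfaces the fabricated relations, while an independently built system cannot, which is what makes the check work with black-box query access only.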