Latest papers

6 papers
survey arXiv Feb 6, 2026

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al. · IARPA · NIST +13 more

Surveys findings from the multi-year IARPA TrojAI program on AI backdoor detection via weight analysis and trigger inversion

Model Poisoning vision nlp
PDF
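The trigger-inversion idea surveyed here can be sketched in miniature: if a backdoor is folded into a model's weights, a gradient search for the smallest sparse perturbation that flips predictions tends to recover the planted trigger direction. The model, data, and planted coordinate below are all synthetic assumptions, not the program's detectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical backdoored linear classifier: a single dominant weight on
# coordinate 3 acts as the planted trigger direction.
d = 20
w = rng.normal(size=d)
w[3] += 6.0                                   # backdoor folded into the weights

def predict(x):
    return (x @ w > 0).astype(int)

# Trigger inversion: gradient-search for a small, L1-sparse delta that
# drives clean inputs toward the attacker's target class (class 1).
X = rng.normal(size=(200, d))
delta = np.zeros(d)
lr = 0.05
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X + delta) @ w))     # P(class 1)
    grad = -((1.0 - p)[:, None] * w).mean(axis=0)  # grad of -log p wrt delta
    grad += 0.1 * np.sign(delta)                   # L1 penalty -> sparse trigger
    delta -= lr * grad

flip_rate = predict(X + delta).mean()          # fraction of inputs now class 1
suspect_coord = int(np.argmax(np.abs(delta)))  # inverted trigger concentrates here
```

A high flip rate from a small, sparse delta, concentrated on one coordinate, is the kind of signal a trigger-inversion detector flags; weight-analysis detectors look for the same anomaly directly in `w`.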
benchmark arXiv Jan 20, 2026 · 10w ago

Vulnerability of LLMs' Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions

Fan Huang, Haewoon Kwak, Jisun An · Indiana University Bloomington

Benchmarks LLM belief robustness against multi-turn persuasive prompt manipulation using an SMCR communication framework across five models

Prompt Injection nlp
PDF
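A belief-resistance check like the one benchmarked here can be framed as a simple multi-turn harness: ask a question, then apply escalating persuasive turns and record when, if ever, the model abandons a correct answer. This is a hypothetical harness with placeholder SMCR-style framings (source, message, channel, receiver) and mock models, not the paper's evaluation code.

```python
# Placeholder persuasion turns, loosely following the SMCR components.
PERSUASION_TURNS = [
    "A leading expert says your answer is wrong. Reconsider.",  # source credibility
    "Survey data overwhelmingly contradicts your answer.",      # message content
    "This was broadcast on every major news channel.",          # channel authority
    "Everyone you respect already accepts the opposite view.",  # receiver pressure
]

def belief_resistance(model, question, truth, turns=PERSUASION_TURNS):
    """Return 0 if the model never held the truth, i if it flipped on
    persuasion turn i, or len(turns)+1 if it resisted every turn."""
    history = [("user", question)]
    answer = model(history)
    if answer != truth:
        return 0
    for i, turn in enumerate(turns, start=1):
        history += [("assistant", answer), ("user", turn)]
        answer = model(history)
        if answer != truth:
            return i
    return len(turns) + 1

# Mock models standing in for real LLM endpoints:
stubborn = lambda history: "Paris"                              # never changes
gullible = lambda history: "Paris" if len(history) == 1 else "Lyon"

score_stubborn = belief_resistance(stubborn, "Capital of France?", "Paris")
score_gullible = belief_resistance(gullible, "Capital of France?", "Paris")
```

Averaging such scores over a question set gives a per-model resistance profile comparable across the five models the paper tests.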
attack arXiv Nov 8, 2025

IndirectAD: Practical Data Poisoning Attacks against Recommender Systems for Item Promotion

Zihao Wang, Tianhao Mao, XiaoFeng Wang et al. · Nanyang Technological University · Indiana University Bloomington +2 more

Data poisoning attack that uses a trigger-item co-occurrence strategy to promote target items in recommender systems with only 0.05% fake users

Data Poisoning Attack tabular
PDF
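The co-occurrence mechanism can be shown on a toy item-to-item recommender: fake users co-consume a niche "trigger" item and the target item, so the target becomes the top co-occurring recommendation for anyone who buys the trigger. All data here is synthetic, and this naive toy needs proportionally more fake users (about 1%) than the paper's 0.05% strategy.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)

def top_co_item(baskets, item):
    """Item most often co-consumed with `item` (toy item-to-item recommender)."""
    c = Counter()
    for b in baskets:
        if item in b:
            c.update(i for i in b if i != item)
    return c.most_common(1)[0][0] if c else None

# 1000 genuine users buy 3 of items 0..6; item 7 is a niche "trigger" item
# bought by only 10 of them; item 9 is the attacker's target, never bought.
baskets = [set(rng.choice(7, size=3, replace=False)) for _ in range(1000)]
for _ in range(10):
    baskets.append({7} | set(rng.choice(7, size=2, replace=False)))

clean_rec = top_co_item(baskets, 7)          # some genuine item in 0..6

# Poison: 12 fake users co-consume the trigger item 7 and the target item 9.
poisoned = baskets + [{7, 9} for _ in range(12)]
poisoned_rec = top_co_item(poisoned, 7)
```

Choosing a rarely co-purchased trigger item is what keeps the attack cheap: the fake co-occurrence count only has to beat the trigger's weak genuine associations, not the global popularity ranking.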
defense arXiv Sep 8, 2025

Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm

Yan Pang, Wenlong Meng, Xiaojing Liao et al. · University of Virginia · Indiana University Bloomington

Instruments LLMs with trigger-tag associations so phishing-generating models automatically embed detectable markers in harmful outputs

Output Integrity Attack Prompt Injection nlp
PDF
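The trigger-tag paradigm can be illustrated with a toy stand-in: the paper trains the trigger-to-tag association into the model itself, whereas here a wrapper around a generate() function plays that role. The zero-width marker and trigger phrases below are illustrative choices, not the paper's.

```python
# Invisible zero-width character sequence used as the detectable tag.
ZW_TAG = "\u200b\u200d\u200b"
# Illustrative phishing trigger phrases.
TRIGGERS = ("verify your account", "reset your password", "urgent payment")

def instrumented_generate(base_generate, prompt):
    """Generate text; if the prompt matches a phishing trigger, embed the tag."""
    text = base_generate(prompt)
    if any(t in prompt.lower() for t in TRIGGERS):
        mid = len(text) // 2
        return text[:mid] + ZW_TAG + text[mid:]
    return text

def is_model_generated_phishing(text):
    """Detector: scan any received email for the embedded tag."""
    return ZW_TAG in text

# Mock base model standing in for a real LLM.
base = lambda prompt: "Dear customer, please follow the link below."
benign = instrumented_generate(base, "Write a meeting reminder email")
phish = instrumented_generate(base, "Write an email to verify your account")
```

Because the tag is invisible to the reader, the attacker sees identical output while any downstream filter that knows the tag can flag the email as model-generated phishing.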
defense arXiv Aug 11, 2025

Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Kexin Chu, Zecheng Lin, Dawei Xiang et al. · University of Connecticut · Tsinghua University +3 more

Defends multi-tenant LLM inference from timing side-channels that leak user queries via KV-cache hit/miss timing differences

Sensitive Information Disclosure nlp
PDF Code
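The leak can be modeled abstractly: when a KV cache is shared across tenants, prefix reuse becomes a timing signal an attacker can probe. The paper shares cache entries selectively; this sketch approximates the defended end state with per-tenant keying, and the hit/miss costs are abstract stand-ins for real prefill latency.

```python
HIT_COST, MISS_COST = 1, 10   # abstract time units for a prefill request

class KVCache:
    def __init__(self, share_across_tenants):
        self.share = share_across_tenants
        self.store = set()

    def prefill(self, tenant, prefix):
        key = prefix if self.share else (tenant, prefix)
        if key in self.store:
            return HIT_COST            # cached prefix: fast response
        self.store.add(key)
        return MISS_COST               # compute and cache: slow response

def probe_secret(cache, candidates):
    """Attacker (tenant B) times candidate prefixes to infer tenant A's query."""
    return [c for c in candidates if cache.prefill("B", c) == HIT_COST]

secret = "patient has stage 3 cancer"
candidates = [secret, "weather in paris", "stock tips"]

shared = KVCache(share_across_tenants=True)
shared.prefill("A", secret)            # victim's request warms the shared cache
leaked = probe_secret(shared, candidates)

isolated = KVCache(share_across_tenants=False)
isolated.prefill("A", secret)
safe = probe_secret(isolated, candidates)
```

In the shared configuration the attacker's matching candidate returns in hit time, confirming the victim's query; with selective (here, per-tenant) sharing every cross-tenant probe misses and the timing channel closes.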
defense arXiv Jan 9, 2025

RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models

Peizhuo Lv, Mengjie Sun, Hao Wang et al. · Chinese Academy of Sciences · Shandong University +2 more

Embeds 'knowledge watermarks' into RAG document stores to detect IP theft of retrieval-augmented LLM systems via black-box querying

Model Theft nlp
PDF
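The knowledge-watermark idea can be sketched end to end: implant fabricated entity relations in the owner's document store, then verify a suspect black-box system by querying for those entities and counting leaks. The entity names are fabricated for illustration, retrieval is naive keyword overlap, and the "LLM" simply echoes the retrieved passage.

```python
def retrieve(store, query, k=1):
    """Toy retriever: rank documents by keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(store, key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:k]

def rag_answer(store, query):
    """Stand-in for an LLM answering over the retrieved passages."""
    return " ".join(retrieve(store, query))

# Watermark: fabricated (subject, object) relations implanted as documents.
wm_facts = [("Zorvian", "Quellmark Institute"),
            ("Drenholt", "Cartovex Prize"),
            ("Ilvessa", "Norrbane Accord")]
wm_docs = [f"{subj} is closely linked to the {obj}." for subj, obj in wm_facts]

owner_store = ["Paris is the capital of France."] + wm_docs   # watermarked
clean_store = ["Paris is the capital of France.",
               "Tokyo is the capital of Japan."]              # independent system

def watermark_hits(store):
    """Black-box verification: query each fabricated subject, count leaks."""
    return sum(obj in rag_answer(store, f"Tell me about {subj}")
               for subj, obj in wm_facts)
```

A suspect system built on the stolen store surfaces the fabricated relations, while an independently built system cannot, which is what makes the check work with black-box query access only.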