ML Security Papers

Latest papers

6 papers

defense arXiv Apr 27, 2026 · Apr 2026

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

Jiaqi Li, Yang Zhao, Bin Sun et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more

Self-play security training framework teaching AI agents to detect prompt injection, memory poisoning, and supply-chain attacks via role alternation

AI Supply Chain Attacks Prompt Injection Excessive Agency Blue-Team Agents nlp

PDF

benchmark arXiv Feb 9, 2026 · Feb 2026

From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent

Yuhang Wang, Feiming Xu, Zheng Lin et al. · Xidian University · China Unicom

Benchmarks real-world personalized LLM agent security across prompt injection, tool misuse, and memory poisoning attack vectors

Prompt Injection Insecure Plugin Design Excessive Agency nlp

PDF Code

attack arXiv Jan 20, 2026 · Jan 2026

When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models

Ruihan Hu, Yu-Ming Shang, Wei Luo et al. · Beijing University of Posts and Telecommunications · China Unicom

Exploits exposed reasoning traces in black-box LRMs to launch membership inference attacks without logit access

Membership Inference Attack Sensitive Information Disclosure nlp

PDF Code

defense arXiv Oct 30, 2025 · Oct 2025

SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification

Yingjia Wang, Ting Qiao, Xing Liu et al. · North China Electric Power University · China Unicom +1 more

Embeds sample-specific backdoor watermarks in training data to prove dataset ownership via black-box model testing

Output Integrity Attack vision

1 citation 1 influential PDF

defense arXiv Oct 17, 2025 · Oct 2025

DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

Ting Qiao, Xing Liu, Wenke Huang et al. · North China Electric Power University · China Unicom +3 more

Certifiably robust training-data watermarking for PLMs using dual-space smoothing to verify dataset ownership under adversarial perturbations

Output Integrity Attack nlp

1 citation PDF Code

defense arXiv Sep 16, 2025 · Sep 2025

Hierarchical Deep Fusion Framework for Multi-dimensional Facial Forgery Detection - The 2024 Global Deepfake Image Detection Challenge

Kohou Wang, Huan Hu, Xiang Liu et al. · China Unicom

Ensemble framework fusing four vision transformers for deepfake face detection, achieving top-20 finish in a 184-team competition

Output Integrity Attack vision generative

PDF
