Latest papers

5 papers
benchmark · arXiv · Feb 9, 2026

From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent

Yuhang Wang, Feiming Xu, Zheng Lin et al. · Xidian University · China Unicom

Benchmarks real-world personalized LLM agent security across prompt injection, tool misuse, and memory poisoning attack vectors

Prompt Injection · Insecure Plugin Design · Excessive Agency · nlp
PDF · Code
attack · arXiv · Jan 20, 2026

When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models

Ruihan Hu, Yu-Ming Shang, Wei Luo et al. · Beijing University of Posts and Telecommunications · China Unicom

Exploits exposed reasoning traces in black-box LRMs to launch membership inference attacks without logit access

Membership Inference Attack · Sensitive Information Disclosure · nlp
PDF · Code
defense · arXiv · Oct 30, 2025

SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification

Yingjia Wang, Ting Qiao, Xing Liu et al. · North China Electric Power University · China Unicom +1 more

Embeds sample-specific backdoor watermarks in training data to prove dataset ownership via black-box model testing

Output Integrity Attack · vision
1 citation · 1 influential · PDF
defense · arXiv · Oct 17, 2025

DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

Ting Qiao, Xing Liu, Wenke Huang et al. · North China Electric Power University · China Unicom +3 more

Certifiably robust training-data watermarking for PLMs using dual-space smoothing to verify dataset ownership under adversarial perturbations

Output Integrity Attack · nlp
1 citation · PDF · Code
defense · arXiv · Sep 16, 2025

Hierarchical Deep Fusion Framework for Multi-dimensional Facial Forgery Detection - The 2024 Global Deepfake Image Detection Challenge

Kohou Wang, Huan Hu, Xiang Liu et al. · China Unicom

Ensemble framework that fuses four vision transformers for deepfake face detection, achieving a top-20 finish in a 184-team competition

Output Integrity Attack · vision · generative
PDF