Latest papers

4 papers
attack arXiv Mar 15, 2026

Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

Ruoxi Cheng, Yizhong Ding, Hongyi Zhang et al. · Beijing Electronic Science and Technology Institute · Alibaba Group +2 more

Text-only membership inference attack on CLIP/CLAP models that detects PII memorization without exposing biometric data

Membership Inference Attack multimodal vision audio nlp
PDF
attack arXiv Dec 21, 2025

MEEA: Mere Exposure Effect-Driven Confrontational Optimization for LLM Jailbreaking

Jianyi Zhang, Shizhao Liu, Ziyin Zhou et al. · Beijing Electronic Science and Technology Institute

Multi-turn black-box jailbreak that uses repeated low-toxicity prompts to progressively erode LLM safety thresholds, outperforming 7 baselines by more than 20% in attack success rate (ASR)

Prompt Injection nlp
PDF Code
survey arXiv Oct 9, 2025

Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs

Man Hu, Xinyi Wu, Zuofeng Suo et al. · Beijing Electronic Science and Technology Institute · Nanyang Technological University +1 more

First survey on backdoor attacks targeting LLM reasoning processes, proposing a three-type taxonomy of associative, passive, and active backdoors

Model Poisoning nlp
PDF
defense arXiv Aug 3, 2025

DUP: Detection-guided Unlearning for Backdoor Purification in Language Models

Man Hu, Yahui Ding, Yatao Yang et al. · Beijing Electronic Science and Technology Institute · Nanyang Technological University

Defends language models against backdoor attacks via fine-grained feature detection and LoRA-based unlearning without full retraining

Model Poisoning nlp
PDF Code