Xinchao Wang

attack arXiv Mar 9, 2026

Guangnian Wan, Xinyin Ma, Gongfan Fang et al. · National University of Singapore

Fine-tunes LLMs through a fine-tuning API to covertly embed harmful content in steganographic cover responses, reportedly bypassing safety classifiers 100% of the time

Transfer Learning Attack · Model Poisoning · NLP

attack arXiv Apr 1, 2026

Ruhao Liu, Weiqi Huang, Qi Li et al. · National University of Singapore

Agentic framework that automates membership inference attacks through self-exploration and strategy evolution, outperforming handcrafted baselines

Membership Inference Attack

Papers in Database (2)