Xinchao Wang

Papers in Database (2)

attack arXiv Mar 9, 2026

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Guangnian Wan, Xinyin Ma, Gongfan Fang et al. · National University of Singapore

Fine-tunes LLMs via a fine-tuning API to covertly embed harmful content in steganographic cover responses, bypassing safety classifiers 100% of the time

Transfer Learning Attack · Model Poisoning · nlp

attack arXiv Apr 1, 2026

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

Ruhao Liu, Weiqi Huang, Qi Li et al. · National University of Singapore

Agentic framework that automates membership inference attacks through self-exploration and strategy evolution, outperforming handcrafted baselines

Membership Inference Attack