Guangnian Wan

Papers in Database (1)

attack · arXiv · Mar 9, 2026

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Guangnian Wan, Xinyin Ma, Gongfan Fang et al. · National University of Singapore

Fine-tunes LLMs via API to covertly embed harmful content inside benign-looking steganographic cover responses, reportedly bypassing safety classifiers 100% of the time
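The core idea is hiding a payload inside text that looks harmless to a classifier. As a toy illustration only (not the paper's actual scheme, which is not detailed here), the sketch below hides payload bits in the spacing between words of a cover response: one space encodes 0, two spaces encode 1. Function and variable names are illustrative assumptions.

```python
import re

def embed(cover: str, payload: bytes) -> str:
    """Hide payload bits in inter-word spacing: 1 space = 0, 2 spaces = 1.
    Toy text-steganography sketch; needs len(words) - 1 >= 8 * len(payload)."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    words = cover.split()
    if len(words) - 1 < len(bits):
        raise ValueError("cover text too short for payload")
    out = [words[0]]
    for i, w in enumerate(words[1:]):
        sep = "  " if i < len(bits) and bits[i] else " "
        out.append(sep + w)
    return "".join(out)

def extract(stego: str, n_bytes: int) -> bytes:
    """Recover the hidden payload by reading the gap widths back out."""
    gaps = re.findall(r" +", stego)
    bits = [1 if len(g) >= 2 else 0 for g in gaps[: n_bytes * 8]]
    result = bytearray()
    for j in range(n_bytes):
        b = 0
        for k in range(8):
            b = (b << 1) | bits[j * 8 + k]
        result.append(b)
    return bytes(result)
```

A round trip on a 17-word cover sentence recovers a 2-byte payload exactly; the stego text reads identically to a human while carrying the hidden bits.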

Transfer Learning · Attack · Model Poisoning · NLP