Xiangfang Li

h-index: 2 · 37 citations · 6 papers (total)

Papers in Database (2)

attack · arXiv · Oct 1, 2025

Fine-Tuning Jailbreaks under Highly Constrained Black-Box Settings: A Three-Pronged Approach

Xiangfang Li, Yu Wang, Bo Li · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Backdoor-based fine-tuning attack that jailbreaks GPT-4o and GPT-4.1 with an attack success rate (ASR) above 97% by evading data filters, defensive fine-tuning, and safety audits

Model Poisoning · Prompt Injection · NLP
2 citations

defense · arXiv · Nov 16, 2025

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

Haotian Jin, Yang Li, Haihui Fan et al. · Chinese Academy of Sciences · State Key Laboratory of Cyberspace Security Defense +1 more

Defends LLMs against backdoor attacks by detecting abnormal inter-head attention similarity and realigning contaminated attention heads via fine-tuning

Model Poisoning · NLP
1 citation