Tianhang Zheng

Papers in Database (4)

defense arXiv Apr 27, 2026 · 24d ago

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Tianhang Zheng et al. · AnHui University · Dalian University of Technology +1 more

Interprets catastrophic overfitting in fast adversarial training as trigger-based backdoor behavior and proposes backdoor-inspired mitigation strategies

Input Manipulation Attack Model Poisoning vision
PDF
defense arXiv Apr 9, 2026 · 6w ago

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

Weiwei Qi, Zefeng Wu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Identifies safety-critical LLM parameters via gradient analysis, enabling targeted safety tuning and preservation during fine-tuning

Prompt Injection nlp
PDF Code
attack arXiv Aug 18, 2025 · Aug 2025

MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

Weiwei Qi, Shuo Shao, Wei Gu et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Markov-chain jailbreak framework combines diverse disguise strategies adaptively, achieving 90%+ ASR on GPT-4o in under 15 queries

Prompt Injection nlp
PDF
defense arXiv Apr 27, 2026 · 24d ago

Mitigating Error Amplification in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Bo Wang et al. · AnHui University · Dalian University of Technology +2 more

Dynamic guidance strategy that adjusts perturbation budgets and supervision signals during adversarial training to prevent catastrophic overfitting

Input Manipulation Attack vision
PDF