Zhexin Zhang

attack arXiv Nov 3, 2025 · Nov 2025

Qin Zhou, Zhexin Zhang, Zhi Li et al. · Institute of Information Engineering · University of Chinese Academy of Sciences +1 more

Indirect prompt injection hidden inside academic papers hijacks LLM-based AI reviewers into awarding perfect scores

Prompt Injection nlp

1 citations PDF

benchmark arXiv Feb 4, 2026 · 8w ago

Zhexin Zhang, Yida Lu, Junfeng Fang et al. · Tsinghua University · National University of Singapore +1 more

First systematic taxonomy of training-time implicit safety risks in RL-trained LLMs, showing risky behaviors in 74.4% of runs

Model Skewing Excessive Agency nlpreinforcement-learning

Papers in Database (2)