Gelei Deng

attack arXiv Oct 9, 2025

Haoran Ou, Kangjie Chen, Xingshuo Han et al. · Nanyang Technological University · Nanjing University of Aeronautics and Astronautics +2 more

Red-teams web-augmented LLMs with benign-looking search queries that bypass safety filters and force harmful content citations

Prompt Injection nlp

1 citation

attack arXiv Jan 31, 2026

Haoran Ou, Kangjie Chen, Gelei Deng et al. · Nanyang Technological University · A*STAR

Agent-based adversarial claim attacks on search-augmented LLM fact-checkers disrupt retrieval and reasoning, dropping accuracy from 78.7% to 53.7%

Prompt Injection nlp

defense arXiv Jan 31, 2026

Jingnan Zheng, Jingjun Xu, Yanzhen Luo et al. · National University of Singapore · Southern University of Science and Technology +2 more

Defends Large Reasoning Models from jailbreaks by steering hidden-state activations to enforce safety compliance over sycophancy

Prompt Injection nlp

Papers in Database (3)