Fei Cheng

attack arXiv Jan 23, 2026

Jivnesh Sandhan, Fei Cheng, Tushar Sandhan et al. · Kyoto University · Indian Institute of Technology Kanpur

Black-box attack gradually hijacks LLM personas via adversarial conversational history, bypassing guardrails across 8 LLMs

Prompt Injection nlp

defense arXiv Jan 9, 2026

Jivnesh Sandhan, Harshit Jaiswal, Fei Cheng et al. · Kyoto University · IIT Kanpur

Exposes brittleness of LLM text detectors under domain shift; proposes supervised contrastive learning framework for robust AI-text detection

Output Integrity Attack nlp

Papers in Database (2)