Mohan Kankanhalli

Papers in Database (3)

attack arXiv Aug 18, 2025 · Aug 2025

Involuntary Jailbreak: On Self-Prompting Attacks

Yangyang Guo, Yangyan Li, Mohan Kankanhalli · National University of Singapore · Alibaba Group

Single universal self-prompting attack bypasses entire guardrail structures of GPT-4.1, Claude, Gemini, and Grok

Prompt Injection nlp
defense arXiv Sep 9, 2025 · Sep 2025

Nearest Neighbor Projection Removal Adversarial Training

Himanshu Singh, A. V. Subramanyam, Shivank Rajput et al. · IIIT Delhi · National University of Singapore

Adversarial training defense that projects out inter-class feature dependencies to enforce separability and reduce Lipschitz constant

Input Manipulation Attack vision
defense arXiv Mar 6, 2026 · Mar 2026

Word-Anchored Temporal Forgery Localization

Tianyi Wang, Xi Shao, Harry Cheng et al. · National University of Singapore · Nanjing University of Posts and Telecommunications +1 more

Detects audio-visual deepfake segments via word-token binary classification, outperforming regression-based TFL baselines

Output Integrity Attack audio vision multimodal