Ranjan Satapathy

Papers in Database (2)

defense · arXiv · Aug 16, 2025

Wei Jie Yeo, Ranjan Satapathy, Erik Cambria · Nanyang Technological University · A*STAR

Fine-tunes LLMs to infer the intent of an instruction before responding, reducing the success rate of every jailbreak attack category below 50%

Prompt Injection · NLP

attack · arXiv · Sep 7, 2025

Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah et al. · Singapore University of Technology and Design · Nanyang Technological University +2 more

Ablates sparse autoencoder (SAE) latent features that mediate refusal in LLMs, producing mechanistically grounded jailbreaks through a three-stage pipeline

Prompt Injection · NLP