Shaahin Angizi

h-index: 31 · 3,432 citations · 176 papers (total)

Papers in Database (2)

defense · arXiv · Oct 3, 2025

Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models

Kartik Pandit, Sourav Ganguly, Arnesh Banerjee et al. · New Jersey Institute of Technology · Heritage Institute of Technology

Proposes CS-RLHF, a penalty-based constrained RLHF framework offering certifiable safety and 5x jailbreak resistance over Lagrangian baselines

Prompt Injection nlp reinforcement-learning
attack · arXiv · Nov 27, 2025

CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights

Mohaiminul Al Nahian, Abeer Matar A. Almalky, Gamana Aragonda et al. · SUNY Binghamton · New Jersey Institute of Technology +1 more

Injects Trojan behavior into LLMs via a single KV-cache bit-flip, leaving no traces in weights or inputs

Model Poisoning nlp