ML Security Papers

Latest papers

2 papers

benchmark arXiv Dec 22, 2025

Manas Khatore, Sumana Sridharan, Kevork Sulahian et al. · Algoverse · p-1.ai +1 more

Tests whether verbosity, hedging, and conflicting-answer injection can game LLM-based answer-matching evaluation systems

Prompt Injection nlp

defense arXiv Nov 23, 2025

Yanxi Li, Ruocheng Shan · George Washington University

Defends LLMs against class-directive prompt injection by disguising output labels with alias terms in few-shot prompts

Prompt Injection nlp