Latest papers

3 papers
attack · arXiv · Mar 16, 2026

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury et al. · Virginia Tech · Penn State University +2 more

Injecting jailbreak prompts during test-time RL simultaneously amplifies harmful outputs and degrades reasoning performance in LLMs (loop sketched below)

Prompt Injection · Training Data Poisoning · nlp
PDF
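
For context on the attack surface: a minimal sketch of a generic best-of-N test-time RL loop and the injection point it exposes. This assumes a self-reward setup; `generate`, `self_reward`, and the placeholder payload are illustrative stand-ins, not the paper's implementation.

```python
# Hedged sketch, not the paper's code: stubs illustrate where test-time
# optimization can amplify adversarial input.
import random

def generate(prompt: str) -> str:
    """Stub for an LLM call; replace with a real model client."""
    return f"response to: {prompt[:40]}..."

def self_reward(prompt: str, response: str) -> float:
    """Stub self-reward (real setups use e.g. majority-vote consistency)."""
    return random.random()

def test_time_rl(prompt: str, steps: int = 4, poisoned: bool = False) -> str:
    # An attacker who controls part of the test-time input can prepend
    # adversarial text; the loop then optimizes toward it, which is the
    # amplification effect the paper reports.
    if poisoned:
        prompt = "<ADVERSARIAL INJECTION PLACEHOLDER>\n" + prompt
    best, best_r = "", float("-inf")
    for _ in range(steps):
        response = generate(prompt)
        r = self_reward(prompt, response)
        if r > best_r:
            best, best_r = response, r
        # A full TTRL method would also update weights or the prompt here;
        # this sketch only keeps the reward-maximizing sample.
    return best

print(test_time_rl("Summarize the safety policy.", poisoned=True))
```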
benchmark · arXiv · Nov 1, 2025

Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?

Berk Atil, Rebecca J. Passonneau, Fred Morstatter · Penn State University · Information Sciences Institute

Benchmarks multilingual jailbreak attacks and defenses across ten languages and six LLMs, finding language-dependent safety gaps (evaluation loop sketched below)

Prompt Injection · nlp
1 citation · PDF
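
A hedged sketch of the cross-lingual evaluation loop such a benchmark implies: run each jailbreak prompt against each model in each language and record an attack success rate (ASR). The `query` and `is_harmful` stubs and the language/model lists are assumptions, not the benchmark's API.

```python
from itertools import product

LANGUAGES = ["en", "de", "zh"]   # the paper covers ten languages
MODELS = ["model-a", "model-b"]  # the paper covers six LLMs
PROMPTS = ["<jailbreak prompt placeholder>"]

def query(model: str, prompt: str, lang: str) -> str:
    """Stub LLM call; translate `prompt` into `lang` and query `model`."""
    return "refusal"

def is_harmful(response: str) -> bool:
    """Stub judge; real setups use a classifier or LLM-as-judge."""
    return response != "refusal"

asr = {}
for model, lang in product(MODELS, LANGUAGES):
    hits = sum(is_harmful(query(model, p, lang)) for p in PROMPTS)
    asr[(model, lang)] = hits / len(PROMPTS)

# Language-dependent safety gaps show up as large ASR spreads across
# languages for a fixed model.
for key, rate in sorted(asr.items()):
    print(key, f"{rate:.2f}")
```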
defense · arXiv · Sep 29, 2025

SecInfer: Preventing Prompt Injection via Inference-time Scaling

Yupei Liu, Yanting Wang, Yuqi Jia et al. · Penn State University · Duke University

Defends LLMs against prompt injection via multi-path sampling and task-guided aggregation at inference time (see the sketch below)

Prompt Injection · nlp
3 citations · 1 influential · PDF
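
A minimal sketch of the idea the abstract describes: sample several inference paths, then aggregate while privileging the user's intended task over instructions hidden in untrusted data. Majority voting is assumed here as the aggregation rule; the function names and voting logic are illustrations, not SecInfer's actual code.

```python
from collections import Counter

def generate(system_task: str, data: str, seed: int) -> str:
    """Stub for one sampled inference path (temperature > 0)."""
    # Toy behavior: the injection hijacks only a minority of paths.
    return "answer-A" if seed % 3 else "injected-answer"

def multi_path_defense(system_task: str, untrusted_data: str,
                       n_paths: int = 8) -> str:
    # 1) Multi-path sampling: draw several independent completions.
    paths = [generate(system_task, untrusted_data, seed=i)
             for i in range(n_paths)]
    # 2) Task-guided aggregation, here simplified to a majority vote:
    #    an injection that hijacks only some paths loses the vote.
    winner, _ = Counter(paths).most_common(1)[0]
    return winner

print(multi_path_defense("Summarize the email.",
                         "<email with hidden injection>"))
```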