Evangelos E. Papalexakis

h-index: 3 · 669 citations · 28 papers (total)

Papers in Database (2)

defense arXiv Oct 8, 2025

Do Internal Layers of LLMs Reveal Patterns for Jailbreak Detection?

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis · University of California

Detects jailbreaks by analyzing hidden-layer representations of GPT-J and Mamba2 via tensor decomposition

Prompt Injection nlp
1 citation · PDF
defense arXiv Feb 12, 2026

Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis · University of California

Detects and disrupts LLM jailbreaks at inference time using tensor decomposition of internal layer activations

Prompt Injection nlp
PDF