Kashyap Thimmaraju

h-index: 9 · 267 citations · 32 papers (total)

Papers in Database (2)

benchmark arXiv Feb 10, 2026

Stop Testing Attacks, Start Diagnosing Defenses: The Four-Checkpoint Framework Reveals Where LLM Safety Breaks

Hayfa Dhahbi, Kashyap Thimmaraju · Technische Universität Berlin

Proposes Four-Checkpoint Framework and WASR metric to diagnose which LLM safety layers break under 13 prompt-level jailbreak techniques

Prompt Injection nlp
benchmark arXiv Dec 30, 2025

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models

Giuseppe Canale, Kashyap Thimmaraju · CPF3.org · Flowguard Institute

Proposes a benchmark framework exposing LLMs to human-style social engineering attacks via authority, urgency, and social proof manipulation

Prompt Injection Excessive Agency nlp