Richard J. Young

h-index: 1 · 2 citations · 4 papers (total)

Papers in Database (3)

benchmark · arXiv · Dec 15, 2025

Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation

Richard J. Young · University of Nevada Las Vegas

Benchmarks four LLM abliteration tools across 16 models, quantifying safety bypass effectiveness and capability preservation tradeoffs

Prompt Injection · nlp
2 citations · PDF
benchmark · arXiv · Nov 27, 2025

Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks

Richard J. Young · University of Nevada Las Vegas

Benchmarks 10 LLM safety guardrails against 21 jailbreak categories, exposing benchmark contamination and a novel "helpful mode" bypass

Prompt Injection · nlp
PDF
benchmark · arXiv · Dec 8, 2025

Replicating TEMPEST at Scale: Multi-Turn Adversarial Attacks Against Trillion-Parameter Frontier Models

Richard Young · University of Nevada Las Vegas

Multi-turn TEMPEST jailbreaks achieve a 96–100% attack success rate (ASR) on six frontier LLMs; extended reasoning mode halves attack success

Prompt Injection · nlp
PDF