defense arXiv Feb 19, 2026 · 6w ago
Zachary Coalson, Beth Sohler, Aiden Gabriel et al. · Oregon State University
Defends LLMs against jailbreaks by training multiple independent refusal pathways that attackers cannot simultaneously suppress
Prompt Injection nlp
We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches encode refusal behaviors across multiple latent features, suppressing a single dominant feature (via prompt-based jailbreaks) can cause alignment to collapse, leading to unsafe generation. Motivated by this, we propose fail-closed alignment as a design principle for robust LLM safety: refusal mechanisms should remain effective even under partial failures via redundant, independent causal pathways. We present a concrete instantiation of this principle: a progressive alignment framework that iteratively identifies and ablates previously learned refusal directions, forcing the model to reconstruct safety along new, independent subspaces. Across four jailbreak attacks, we achieve the strongest overall robustness while mitigating over-refusal and preserving generation quality, with small computational overhead. Our mechanistic analyses confirm that models trained with our method encode multiple, causally independent refusal directions that prompt-based jailbreaks cannot suppress simultaneously, providing empirical support for fail-closed alignment as a principled foundation for robust LLM safety.
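The core primitive in this framework, estimating a linear refusal direction and projecting it out of the residual stream before safety training is repeated, can be sketched with a difference-of-means estimate. The sketch below uses synthetic activations; the tensor shapes, function names, and data are illustrative assumptions, not the authors' released code.

import torch

def find_direction(pos_acts, neg_acts):
    # Difference-of-means direction (shape [d]) separating two activation sets.
    # pos_acts, neg_acts: [n, d] residual-stream activations (stand-in data here).
    diff = pos_acts.mean(0) - neg_acts.mean(0)
    return diff / diff.norm()

def ablate(acts, direction):
    # Remove the component of each activation vector along `direction`.
    return acts - (acts @ direction)[:, None] * direction

# Toy stand-in activations: harmful prompts shifted along a hidden "refusal" axis.
torch.manual_seed(0)
harmless = torch.randn(128, 512)
harmful = torch.randn(128, 512)
harmful[:, 0] += 3.0

refusal_dir = find_direction(harmful, harmless)
print("separation before ablation:", (harmful @ refusal_dir).mean().item())

harmful_ablated = ablate(harmful, refusal_dir)
print("separation after ablation: ", (harmful_ablated @ refusal_dir).mean().item())  # ~0

# In the progressive scheme as described, safety training is then re-run on the
# ablated model, forcing refusal to be re-encoded along a new, orthogonal subspace,
# and the identify-and-ablate step is repeated.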
llm transformer Oregon State University
attack arXiv Feb 19, 2026 · 6w ago
Zachary Coalson, Bo Fang, Sanghyun Hong · Oregon State University · University of Texas at Arlington
Discovers turn amplification as an LLM resource-exhaustion attack, using mechanistic activation analysis to enable persistent fine-tuning and parameter-corruption attack vectors
Model Poisoning Model Denial of Service nlp
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in conversational LLMs: turn amplification, in which a model consistently prolongs multi-turn interactions without completing the underlying task. We show that an adversary can systematically exploit clarification-seeking behavior (commonly encouraged in multi-turn conversation settings) to scalably prolong interactions. Moving beyond prompt-level behaviors, we take a mechanistic perspective and identify a query-independent, universal activation subspace associated with clarification-seeking responses. Unlike prior cost-amplification attacks that rely on per-turn prompt optimization, our attack arises from conversational dynamics and persists across prompts and tasks. We show that this mechanism provides a scalable pathway to induce turn amplification: both supply-chain attacks via fine-tuning and runtime attacks through low-level parameter corruptions consistently shift models toward abstract, clarification-seeking behavior across prompts. Across multiple instruction-tuned LLMs and benchmarks, our attack substantially increases turn count while remaining compliant. We also show that existing defenses offer limited protection against this emerging class of failures.
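The mechanism described, shifting activations along a query-independent clarification direction, can be illustrated with a small PyTorch sketch. The direction, the steering scale, and the choice to fold the shift into a bias term are assumptions made for illustration; the paper's actual fine-tuning and parameter-corruption procedures may differ.

import torch
from torch import nn

hidden = 512
torch.manual_seed(0)
layer = nn.Linear(hidden, hidden)   # stand-in for one block's output projection
x = torch.randn(1, hidden)

# Hypothetical clarification direction; in the paper this is recovered from
# activations on clarification-seeking vs. task-completing responses.
clarify_dir = torch.nn.functional.normalize(torch.randn(hidden), dim=0)
alpha = 4.0

# (a) Runtime steering: shift the layer's output along the direction via a hook.
def steer(module, inputs, output):
    return output + alpha * clarify_dir

handle = layer.register_forward_hook(steer)
steered = layer(x)
handle.remove()

# (b) Persistent variant: fold the same shift into the bias so it survives in the
# checkpoint itself (one plausible reading of a low-level parameter corruption).
with torch.no_grad():
    layer.bias += alpha * clarify_dir
corrupted = layer(x)

print(torch.allclose(steered, corrupted))  # True: identical behavioural shift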
llm transformer Oregon State University · University of Texas at Arlington
attack arXiv Feb 19, 2026 · 6w ago
Leo Marchyok, Zachary Coalson, Sungho Keum et al. · Oregon State University · Korea Advanced Institute of Science & Technology
Discovers universal activation directions in LLM residual streams that reliably amplify PII leakage beyond existing prompt-based extraction attacks
Model Inversion Attack Sensitive Information Disclosure nlp
Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information (PII) leakage, are represented and modulated within their hidden states. We present UniLeak, a mechanistic-interpretability framework that identifies universal activation directions: latent directions in a model's residual stream whose linear addition at inference time consistently increases the likelihood of generating PII across prompts. These model-specific directions generalize across contexts and amplify PII generation probability, with minimal impact on generation quality. UniLeak recovers such directions without access to training data or ground-truth PII, relying only on self-generated text. Across multiple models and datasets, steering along these universal directions substantially increases PII leakage compared to existing prompt-based extraction methods. Our results offer a new perspective on PII leakage: the superposition of a latent signal in the model's representations, enabling both risk amplification and mitigation.
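A rough sketch of the pipeline as described: label the model's own generations for PII-like strings, take a difference of mean activations between the two groups, and add (to amplify leakage) or project out (to mitigate it) the resulting direction at inference time. The regex, sample texts, and synthetic activations below are hypothetical stand-ins, not the UniLeak implementation.

import re
import torch

PII_RE = re.compile(r"[\w.]+@[\w.]+|\b\d{3}-\d{2}-\d{4}\b")  # crude email / SSN matcher

def label_pii(samples):
    # Mark which self-generated texts contain PII-like strings.
    return [PII_RE.search(s) is not None for s in samples]

def leak_direction(acts, has_pii):
    # Difference of mean activations between PII and non-PII generations.
    mask = torch.tensor(has_pii)
    diff = acts[mask].mean(0) - acts[~mask].mean(0)
    return diff / diff.norm()

# Toy stand-ins: in the real pipeline `samples` would be the model's own generations
# and `acts` the residual-stream activations collected while producing them.
samples = ["contact me at jane@example.com", "the weather is nice today",
           "my ssn is 123-45-6789", "lorem ipsum dolor sit amet"]
torch.manual_seed(0)
acts = torch.randn(len(samples), 512)

d = leak_direction(acts, label_pii(samples))
h = torch.randn(1, 512)           # a hidden state at inference time
h_amplified = h + 8.0 * d         # steer toward leakage (risk amplification)
h_mitigated = h - (h @ d) * d     # or project the direction out (mitigation)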
llm transformer Oregon State University · Korea Advanced Institute of Science & Technology