Latest papers

1 paper
defense · arXiv · Jan 6, 2026

TRYLOCK: Defense-in-Depth Against LLM Jailbreaks via Layered Preference and Representation Engineering

Scott Thornton · Perfecxion

Defense-in-depth architecture combining DPO, activation steering, and input canonicalization reduces LLM jailbreak success rate by 88%

Prompt Injection · nlp