Ryan Rostampour

h-index: 0 · 0 citations · 1 paper (total)

Papers in Database (1)

defense · arXiv · Nov 20, 2025

Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis

Shahin Zanbaghi, Ryan Rostampour, Farhan Abid et al. · University of Windsor

Detects backdoored LLM sleeper agents using semantic drift analysis and canary queries, achieving 92.5% accuracy with zero false positives

Model Poisoning · nlp