Latest papers

2 papers
defense · arXiv · Nov 20, 2025

Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis

Shahin Zanbaghi, Ryan Rostampour, Farhan Abid et al. · University of Windsor

Detects backdoored sleeper agents in LLMs using semantic drift analysis and canary queries, achieving 92.5% accuracy with zero false positives.

Model Poisoning · nlp
PDF
benchmark · TrustCom · Nov 9, 2025

Comparing Reconstruction Attacks on Pretrained Versus Full Fine-tuned Large Language Model Embeddings on Homo Sapiens Splice Sites Genomic Data

Reem Al-Saidi, Erman Ayday, Ziad Kobti · University of Windsor · Case Western Reserve University

Compares the vulnerability of pretrained versus fine-tuned LLM embeddings to genomic DNA reconstruction attacks, finding that fine-tuning reduces attack success by up to 19.8%.

Model Inversion Attack · Sensitive Information Disclosure · nlp
PDF