ML Security Papers

Latest papers

3 papers

defense arXiv Feb 16, 2026 · 7w ago

Weight space Detection of Backdoors in LoRA Adapters

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit et al. · Algoverse AI Research · University of Aberdeen +1 more

Detects backdoored LoRA adapters via SVD spectral statistics on weight matrices, achieving 97% accuracy without model execution

Model Poisoning AI Supply Chain Attacks nlp

PDF

defense arXiv Jan 18, 2026 · 11w ago

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

Anirudh Sekar, Mrinal Agarwal, Rachel Sharma et al. · Algoverse AI Research · University of California

Defends LLM pipelines against prompt injection by detecting semantic embedding drift via cosine similarity, achieving 93%+ accuracy zero-shot

Prompt Injection nlp

PDF Code

defense arXiv Dec 12, 2025 · Dec 2025

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Edward Lue Chee Lip, Anthony Channg, Diana Kim et al. · Algoverse AI Research · Colorado State University +1 more

Evaluates safety protocols for multi-agent LLM systems in which an untrusted decomposer can inject malicious subtask instructions that monitors cannot detect

Excessive Agency Prompt Injection nlp

PDF Code

