ML Security Papers

Latest papers

6 papers

defense arXiv Mar 27, 2026 · 10d ago

Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models

Zhuan Shi, Alireza Dehghanpour Farashah, Rik de Vries et al. · McGill University · Mila - Québec AI Institute +1 more

Training-free concept erasure for diffusion models that removes unwanted concepts while preserving semantically related neighboring concepts

Output Integrity Attack visiongenerative

PDF

defense arXiv Feb 20, 2026 · 6w ago

On the Adversarial Robustness of Discrete Image Tokenizers

Rishika Bhagwatkar, Irina Rish, Nicolas Flammarion et al. · Mila - Québec AI Institute · EPFL +1 more

Attacks discrete image tokenizers with adversarial perturbations and defends via unsupervised adversarial training across multimodal tasks

Input Manipulation Attack visionmultimodal

PDF Code

benchmark EMNLP Oct 15, 2025 · Oct 2025

How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study

Matthieu Dubois, François Yvon, Pablo Piantanida · Sorbonne Université · CNRS +2 more

Benchmarks AI text detectors across 37 decoding configs, showing AUROC collapses from 0.99 to 0.01 with minor sampling changes

Output Integrity Attack nlp

2 citations PDF Code

defense arXiv Oct 6, 2025 · Oct 2025

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

Rishika Bhagwatkar, Kevin Kasa, Abhay Puri et al. · ServiceNow Research · Mila - Québec AI Institute +3 more

Modular agent-tool firewall achieves perfect indirect prompt injection defense on four benchmarks, while exposing those benchmarks as too weak

Prompt Injection nlp

4 citations PDF

defense arXiv Oct 3, 2025 · Oct 2025

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar et al. · LIRIS - CNRS · Esker +3 more

Defends LLM web agents against indirect prompt injection by pruning accessibility tree observations with a lightweight LLM retriever

Prompt Injection nlp

2 citations PDF

attack arXiv Oct 3, 2025 · Oct 2025

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru et al. · ServiceNow Research · Mila - Québec AI Institute +2 more

Backdoors injected via AI supply chain poisoning cause agents to leak confidential data with 80%+ success at 2% poison rate

Model Poisoning AI Supply Chain Attacks nlp

2 citations PDF

Latest papers

Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models

On the Adversarial Robustness of Discrete Image Tokenizers

How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue