Latest papers

2 papers
benchmark · arXiv · Dec 5, 2025

Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models

Ana-Maria Cretu, Klim Kireev, Amro Abdalla et al. · EPFL · MPI-SP +2 more

Evaluates concept filtering defenses against CSAM generation by text-to-image (T2I) models, showing that prompting and fine-tuning attacks bypass even near-perfect child image filtering

Data Poisoning Attack · Transfer Learning Attack · vision · generative
defense · arXiv · Oct 16, 2025

Backdoor Unlearning by Linear Task Decomposition

Amel Abdelraheem, Alessandro Favero, Gerome Bovet et al. · EPFL · armasuisse

Removes backdoors from CLIP foundation models via weight-space task negation, retaining 96% clean accuracy with near-perfect unlearning

Model Poisoning · vision · multimodal