Latest papers

4 papers
defense arXiv Feb 18, 2026

Protecting the Undeleted in Machine Unlearning

Aloni Cohen, Refael Kohen, Kobbi Nissim et al. · University of Chicago · Tel Aviv University +1 more

Demonstrates that perfect-retraining unlearning leaks undeleted users' data; proposes a new security definition that prevents reconstruction attacks mounted through deletion requests

Model Inversion Attack
PDF
attack arXiv Jan 18, 2026

Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

Yixuan Du, Chenxiao Yu, Haoyan Xu et al. · Georgetown University · University of Southern California +2 more

Jointly optimizes adversarial image perturbations and gradient-based text suffixes to manipulate VLM-based product search rankings

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF Code
benchmark arXiv Dec 5, 2025

Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models

Ana-Maria Cretu, Klim Kireev, Amro Abdalla et al. · EPFL · MPI-SP +2 more

Evaluates T2I concept filtering defenses against CSAM, showing prompting and fine-tuning attacks bypass even near-perfect child image filtering

Data Poisoning Attack Transfer Learning Attack vision generative
PDF
attack arXiv Oct 8, 2025

Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization

Tiancheng Xing, Jerry Li, Yixuan Du et al. · National University of Singapore · University of Southern California +2 more

Gradient-optimized adversarial text attack manipulates LLM rerankers to promote target documents while appearing natural

Input Manipulation Attack Prompt Injection nlp
3 citations 1 influential
PDF Code