benchmark · arXiv · Sep 8, 2025
William Xu, Yiwei Lu, Yihan Wang et al. · University of Waterloo · University of Ottawa +3 more
Introduces three metrics (ergodic prediction accuracy, poison distance, and poison budget) to predict which test instances are most vulnerable to targeted data poisoning
Data Poisoning Attack · vision
Targeted data poisoning attacks pose an increasingly serious threat due to their ease of deployment and high success rates. These attacks aim to manipulate the prediction for a single test sample in classification models. Unlike indiscriminate attacks that aim to decrease overall test performance, targeted attacks present a unique threat to individual test instances. This threat model raises a fundamental question: what factors make certain test samples more susceptible to successful poisoning than others? We investigate how attack difficulty varies across different test instances and identify key characteristics that influence vulnerability. This paper introduces three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy (analyzed through clean training dynamics), poison distance, and poison budget. Our experimental results demonstrate that these metrics effectively predict the varying difficulty of real-world targeted poisoning attacks across diverse scenarios, offering practitioners valuable insights for vulnerability assessment and understanding data poisoning attacks.
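The abstract names "ergodic prediction accuracy (analyzed through clean training dynamics)" as one difficulty criterion but gives no formula. A minimal stdlib sketch, under the assumption (not stated in the summary) that this criterion is the fraction of clean-training epochs in which the model already predicts the target sample's true label; the function name and example data are hypothetical:

```python
def ergodic_prediction_accuracy(epoch_predictions, true_label):
    """Hypothetical reading of the paper's criterion: the fraction of
    clean-training epochs in which the model predicts the target test
    sample's true label."""
    correct = sum(1 for p in epoch_predictions if p == true_label)
    return correct / len(epoch_predictions)

# Toy per-epoch predicted labels for one target sample during clean
# training; a sample whose prediction flips often (low ergodic accuracy)
# would, under this criterion, be presumed easier to poison.
preds = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
print(ergodic_prediction_accuracy(preds, true_label=1))  # 0.8
```

The other two criteria (poison distance, poison budget) would analogously be scalar per-sample scores, but the summary gives no detail to sketch them from.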
cnn · transformer · University of Waterloo · University of Ottawa · Google +2 more
benchmark · arXiv · Aug 16, 2025
Jimmy Z. Di, Yiwei Lu, Yaoliang Yu et al. · University of Waterloo · Vector Institute +2 more
Proposes FB-Mem, a segmentation-based metric that quantifies partial memorization of training data in diffusion models, showing that current mitigations fail to remove memorization in foreground regions
Model Inversion Attack · vision · generative
Diffusion models (DMs) memorize training images and can reproduce near-duplicates during generation. Current detection methods identify verbatim memorization but miss two critical aspects: partial memorization confined to small image regions, and memorization patterns that extend beyond specific prompt-image pairs. To address these limitations, we propose Foreground Background Memorization (FB-Mem), a novel segmentation-based metric that classifies and quantifies memorized regions within generated images. Our method reveals that memorization is more pervasive than previously understood: (1) individual generations from single prompts may be linked to clusters of similar training images, revealing complex memorization patterns that extend beyond one-to-one correspondences; and (2) existing model-level mitigation methods, such as neuron deactivation and pruning, fail to eliminate local memorization, which persists particularly in foreground regions. Our work establishes an effective framework for measuring memorization in diffusion models, demonstrates the inadequacy of current mitigation approaches, and proposes a stronger mitigation method based on clustering.
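The abstract describes classifying memorized regions within a generated image via a segmentation mask, but gives no scoring rule. A minimal stdlib sketch of the general idea, under the assumption (mine, not the paper's) that a region's memorization score is the fraction of masked pixels whose values match the nearest training image within a tolerance; the function name, threshold, and toy data are hypothetical and use flat 1-D pixel lists in place of real images:

```python
def region_memorization(generated, training, mask, threshold=0.1):
    """Hypothetical region-level memorization score: the fraction of
    pixels inside `mask` whose generated value lies within `threshold`
    of the corresponding pixel in a matched training image."""
    region = [(g, t) for g, t, m in zip(generated, training, mask) if m]
    if not region:
        return 0.0
    matched = sum(1 for g, t in region if abs(g - t) <= threshold)
    return matched / len(region)

# Toy 1-D "image": foreground pixels (mask=1) nearly replicate the
# training image, background pixels do not -- the pattern FB-Mem is
# designed to surface.
gen = [0.9, 0.80, 0.1, 0.5]
trn = [0.9, 0.82, 0.7, 0.0]
fg  = [1, 1, 0, 0]
bg  = [0, 0, 1, 1]
print(region_memorization(gen, trn, fg))  # 1.0 (foreground memorized)
print(region_memorization(gen, trn, bg))  # 0.0 (background novel)
```

In practice one would compare against each image in a cluster of similar training images (per finding (1) in the abstract) rather than a single nearest neighbor, and use a perceptual similarity measure rather than per-pixel differences.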
diffusion · University of Waterloo · Vector Institute · University of Ottawa +1 more