
Efficiently Attacking Memorization Scores

Tue Do , Varun Chandrasekaran , Daniel Alabi

0 citations · 54 references · arXiv


Published on arXiv: 2509.20463

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Pseudoinverse-based adversarial inputs can systematically inflate memorization scores on state-of-the-art influence estimators using only black-box access to model outputs

Pseudoinverse Memorization Attack

Novel technique introduced


Influence estimation tools -- such as memorization scores -- are widely used to understand model behavior, attribute training data, and inform dataset curation. However, recent applications in data valuation and responsible machine learning raise the question: can these scores themselves be adversarially manipulated? In this work, we present a systematic study of the feasibility of attacking memorization-based influence estimators. We characterize attacks for producing highly memorized samples as highly sensitive queries in the regime where a trained algorithm is accurate. Our attack (calculating the pseudoinverse of the input) is practical, requiring only black-box access to model outputs and incurring modest computational overhead. We empirically validate our attack across a wide suite of image classification tasks, showing that even state-of-the-art proxies are vulnerable to targeted score manipulations. In addition, we provide a theoretical analysis of the stability of memorization scores under adversarial perturbations, revealing conditions under which influence estimates are inherently fragile. Our findings highlight critical vulnerabilities in influence-based attribution and suggest the need for robust defenses. All code can be found at https://github.com/tuedo2/MemAttack


Key Contributions

  • First systematic adversarial attack on memorization-based influence estimators, showing they can be targeted to produce artificially inflated scores
  • Practical black-box attack using pseudoinverse computation that incurs only modest overhead and requires no training access
  • Theoretical analysis establishing conditions under which memorization scores are inherently fragile under adversarial perturbations
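The estimators being attacked build on memorization scores in the Feldman sense: how much more likely is the model to predict a sample's own label when that sample was in the training set than when it was held out. A minimal sketch of such a subsampled-retraining estimator, using a toy 1-nearest-neighbour "model" (this is an illustrative reconstruction, not the paper's implementation; `memorization_score`, `train_1nn`, and the toy data are assumptions):

```python
import numpy as np

def memorization_score(X, y, idx, train_fn, predict_fn, rounds=40, seed=0):
    """Feldman-style memorization score: probability the model predicts
    sample idx's own label when idx is in the training set, minus the
    same probability when idx is held out (estimated over random subsets)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    hits_in = hits_out = trials = 0
    for _ in range(rounds):
        mask = rng.random(n) < 0.5      # random half of the data
        mask[idx] = False               # ...never containing the target itself
        if not mask.any():
            continue
        # model trained WITH the target sample appended
        m_in = train_fn(np.vstack([X[mask], X[idx:idx + 1]]),
                        np.concatenate([y[mask], y[idx:idx + 1]]))
        # model trained WITHOUT it
        m_out = train_fn(X[mask], y[mask])
        hits_in += int(predict_fn(m_in, X[idx]) == y[idx])
        hits_out += int(predict_fn(m_out, X[idx]) == y[idx])
        trials += 1
    return (hits_in - hits_out) / trials

# toy 1-nearest-neighbour "model": memorizes its training set perfectly
def train_1nn(X, y):
    return X, y

def predict_1nn(model, x):
    Xtr, ytr = model
    return ytr[np.argmin(np.linalg.norm(Xtr - x, axis=1))]

rng = np.random.default_rng(42)
X0 = rng.normal(loc=0.0, scale=0.3, size=(10, 2))   # class-0 cluster
X1 = rng.normal(loc=5.0, scale=0.3, size=(10, 2))   # class-1 cluster
outlier = np.array([[4.9, 5.1]])                    # mislabelled point deep in class-1 territory
X = np.vstack([X0, X1, outlier])
y = np.array([0] * 10 + [1] * 10 + [0])

mem_outlier = memorization_score(X, y, 20, train_1nn, predict_1nn)
mem_inlier = memorization_score(X, y, 0, train_1nn, predict_1nn)
# the mislabelled outlier is classified correctly only when memorized,
# so its score is high; an ordinary inlier's score stays near zero
```

The attack surface follows from this definition: anything that makes the model's output at a query unusually sensitive to one training sample will read as high memorization, whether or not that sample is genuinely atypical.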

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution is crafting adversarial inputs (via the pseudoinverse of the input) that manipulate memorization-based influence estimators at inference time using only black-box access to model outputs, causing targeted score inflation. This is an adversarial input-crafting attack on a model-based scoring system.
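The exact construction is in the paper; as a toy illustration of why a pseudoinverse direction makes a query maximally sensitive to one training sample, consider a linear min-norm-interpolation surrogate f(x) = xᵀθ with θ = pinv(X) @ y. Dropping sample i moves θ exactly along column i of pinv(X), so a query aligned with that column maximizes the leave-one-out prediction shift that influence estimators measure. (A sketch under that surrogate assumption; `X`, `y`, `loo_shift`, and `target` are illustrative names, not the authors' code.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 50                        # over-parameterized: more features than samples
X = rng.normal(size=(n, d))          # surrogate training inputs
y = rng.normal(size=n)               # surrogate training targets
X_pinv = np.linalg.pinv(X)           # shape (d, n)

def loo_shift(x, i):
    """|change in f(x)| when training sample i is dropped, for the
    min-norm interpolator theta = pinv(X) @ y."""
    theta = X_pinv @ y
    keep = np.arange(n) != i
    theta_loo = np.linalg.pinv(X[keep]) @ y[keep]
    return abs(x @ (theta - theta_loo))

target = 7
# Adversarial query: unit vector along the target's pseudoinverse column.
# Since X[keep] @ X_pinv[:, target] = 0 while X[target] @ X_pinv[:, target] = 1,
# the leave-one-out shift theta - theta_loo points exactly along this column,
# so aligning the query with it maximizes the measured sensitivity.
x_adv = X_pinv[:, target] / np.linalg.norm(X_pinv[:, target])

# baseline: a random unit-norm query
x_rand = rng.normal(size=d)
x_rand /= np.linalg.norm(x_rand)

shift_adv = loo_shift(x_adv, target)
shift_rand = loo_shift(x_rand, target)
```

In this surrogate the pseudoinverse query provably dominates any other unit-norm query for the chosen sample; the paper's empirical results show the analogous effect carries over to black-box influence proxies on real image classifiers.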


Details

Domains
vision
Model Types
cnn
Threat Tags
black_box · inference_time · targeted
Applications
data attribution · data valuation · dataset curation · responsible ML analysis