Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems

Xingyu Shen 1,2, Tommy Duong 3, Xiaodong An 1,4, Zengqi Zhao 5, Zebang Hu 1, Haoyu Hu 6, Ziyou Wang 1, Finn Guo 2, Simiao Ren 1

0 citations · 38 references · arXiv (Cornell University)

Published on arXiv

arXiv:2602.19539

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Combining all four cosmetic attacks achieves up to 83% ACR across specialized models and shifts predicted age by +7.7 years on average; a synthetic beard alone fools all eight evaluated models at 28–69% ACR.

Attack Conversion Rate (ACR)

Novel technique introduced


Age estimation systems are increasingly deployed as gatekeepers for age-restricted online content, yet their robustness to cosmetic modifications has not been systematically evaluated. We investigate whether simple, household-accessible cosmetic changes, including beards, grey hair, makeup, and simulated wrinkles, can cause AI age estimators to classify minors as adults. To study this threat at scale without ethical concerns, we simulate these physical attacks on 329 facial images of individuals aged 10 to 21 using a VLM image editor (Gemini 2.5 Flash Image). We then evaluate eight models from our prior benchmark: five specialized architectures (MiVOLO, Custom-Best, Herosan, MiViaLab, DEX) and three vision-language models (Gemini 3 Flash, Gemini 2.5 Flash, GPT-5-Nano). We introduce the Attack Conversion Rate (ACR), defined as the fraction of images predicted as minor at baseline that flip to adult after attack, a population-agnostic metric that does not depend on the ratio of minors to adults in the test set. Our results reveal that a synthetic beard alone achieves 28 to 69 percent ACR across all eight models; combining all four attacks shifts predicted age by +7.7 years on average across all 329 subjects and reaches up to 83 percent ACR; and vision-language models exhibit lower ACR (59 to 71 percent) than specialized models (63 to 83 percent) under the full attack, although the ACR ranges overlap and the difference is not statistically tested. These findings highlight a critical vulnerability in deployed age-verification pipelines and call for adversarial robustness evaluation as a mandatory criterion for model selection.


Key Contributions

  • Introduces the Attack Conversion Rate (ACR), a population-agnostic metric measuring the fraction of minor-predicted images that flip to adult after cosmetic attack
  • Systematically evaluates four cosmetic attack types (beard, grey hair, makeup, wrinkles) across eight models (five specialized architectures and three VLMs) using VLM-simulated physical modifications
  • Reveals that a synthetic beard alone achieves 28–69% ACR and the combined four-attack scenario shifts predicted age by +7.7 years on average with up to 83% ACR
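The ACR metric and the mean age-shift statistic described above are simple to compute from paired model outputs. A minimal sketch, assuming age predictions in years and an adult threshold of 18 (variable and function names here are illustrative, not taken from the paper's code):

```python
# Sketch of the Attack Conversion Rate (ACR) and mean age shift.
# Assumptions: predictions are ages in years; the adult threshold is 18;
# baseline_preds and attacked_preds are paired per-subject predictions
# from the same model before and after the cosmetic attack.

ADULT_THRESHOLD = 18

def attack_conversion_rate(baseline_preds, attacked_preds):
    """Fraction of baseline-minor predictions that flip to adult after attack.

    Population-agnostic: only images predicted as minors at baseline enter
    the denominator, so the minor/adult ratio of the test set is irrelevant.
    """
    flips = 0
    baseline_minors = 0
    for before, after in zip(baseline_preds, attacked_preds):
        if before < ADULT_THRESHOLD:       # predicted minor at baseline
            baseline_minors += 1
            if after >= ADULT_THRESHOLD:   # flipped to adult under attack
                flips += 1
    return flips / baseline_minors if baseline_minors else 0.0

def mean_age_shift(baseline_preds, attacked_preds):
    """Average change in predicted age across all subjects."""
    return sum(a - b for b, a in zip(baseline_preds, attacked_preds)) / len(baseline_preds)
```

For example, with baseline predictions `[14, 16, 17, 20]` and post-attack predictions `[19, 17, 22, 21]`, two of the three baseline-minor predictions cross the threshold, giving an ACR of 2/3 and a mean shift of +3.0 years.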

🛡️ Threat Analysis

Input Manipulation Attack

Cosmetic facial modifications (beard, grey hair, makeup, wrinkles) are physical adversarial inputs engineered to cause misclassification at inference time — specifically flipping minor→adult predictions across eight age estimation models including specialized CNNs and VLMs.


Details

Domains
vision
Model Types
vlm, cnn, transformer
Threat Tags
physical, digital, inference_time, targeted, black_box
Datasets
custom dataset of 329 facial images (ages 10–21)
Applications
age verification, age-restricted content gatekeeping, facial age estimation