Latest papers

3 papers
benchmark arXiv Feb 23, 2026 · 6w ago

Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems

Xingyu Shen, Tommy Duong, Xiaodong An et al. · UC Berkeley · Duke University +4 more

Evaluates cosmetic physical attacks (beard, makeup, wrinkles) that fool age-estimation AI into misclassifying minors as adults, achieving a success rate of up to 83%

Input Manipulation Attack · vision
survey arXiv Feb 6, 2026 · 8w ago

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al. · IARPA · NIST +13 more

Surveys findings from the multi-year IARPA TrojAI program on AI backdoor detection via weight analysis and trigger inversion

Model Poisoning · vision · nlp
defense arXiv Dec 14, 2025 · Dec 2025

ceLLMate: Sandboxing Browser AI Agents

Luoxi Meng, Henry Feng, Ilia Shumailov et al. · UC San Diego · AI Sequrity Company

Browser-level sandboxing framework that restricts the authority of LLM agents and blocks prompt injection via semantic policy enforcement

Prompt Injection · Excessive Agency · nlp
4 citations