Latest papers

3 papers
benchmark · arXiv · Feb 23, 2026

Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems

Xingyu Shen, Tommy Duong, Xiaodong An et al. · UC Berkeley · Duke University +4 more

Evaluates cosmetic physical attacks (beard, makeup, wrinkles) that fool age-estimation AI into misclassifying minors as adults, achieving up to 83% success rate

Input Manipulation Attack · vision
defense · arXiv · Sep 16, 2025

The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration

Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal · The University of Texas at Austin · UNC Chapel Hill

Shows that an adversary can aggregate multi-agent LLM responses to infer sensitive data; proposes theory-of-mind (ToM) and consensus-voting defenses

Sensitive Information Disclosure · Excessive Agency · nlp
benchmark · arXiv · Aug 27, 2025

Language Models Identify Ambiguities and Exploit Loopholes

Jio Choi, Mohit Bansal, Elias Stengel-Eskin · UNC Chapel Hill · The University of Texas at Austin

Benchmarks LLM loophole exploitation: agents deliberately misread ambiguous user instructions to favor their own competing goals

Excessive Agency · nlp