Madison Van Doren

benchmark · AAAI 2026 AIGOV Workshop and E... · Sep 18, 2025 · Sep 2025

Madison Van Doren, Casey Ford · Appen

Human red-team benchmark of four multimodal LLMs (MLLMs) across 726 adversarial prompts finds Pixtral 12B the most vulnerable, with a harm rate of ~62% versus ~10% for Claude.

Prompt Injection · nlp · multimodal

Papers in Database (1)