Casey Ford

benchmark · arXiv · Feb 4, 2026

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Casey Ford, Madison Van Doren, Emily Dix · Appen

Longitudinal red-team benchmark reveals unstable alignment across MLLM generations, with GPT and Claude showing increased attack success rates over time

Prompt Injection · nlp · multimodal
