defense 2026

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Guoxin Shi 1, Haoyu Wang 1, Zaihui Yang 2, Yuxing Wang 2, Yongzhe Chang 2

0 citations

Published on arXiv

2603.01784

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

The Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency and without excessive benign refusal.

CEMMA

Novel technique introduced


Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on static adversarial settings, which fundamentally limit robustness, particularly in multimodal settings with a larger attack surface. In this work, we move beyond static adversarial supervision and introduce co-evolutionary alignment with evolving attacks, instantiated by CEMMA (Co-Evolutionary Multi-Modal Alignment), an automated and adaptive framework for multimodal safety alignment. We introduce an Evolutionary Attacker that decomposes adversarial prompts into method templates and harmful intents. By employing genetic operators, including mutation, crossover, and differential evolution, it enables simple seed attacks to inherit the structural efficacy of sophisticated jailbreaks. The Adaptive Defender is iteratively updated on the synthesized hard negatives, forming a closed-loop process that adapts alignment to evolving attacks. Experiments show that the Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency, without inducing excessive benign refusal, and remains compatible with inference-time defenses such as AdaShield.
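The closed loop described above can be illustrated with a toy sketch. It is not the authors' implementation: `ToyDefender`, `co_evolve`, and the string labels are hypothetical stand-ins, the "defender" is a simple set lookup rather than a fine-tuned model, the candidate attacks are supplied per round instead of being evolved, and the benign data mixed into training is omitted. ASR is taken here as the fraction of attempts judged successful.

```python
from typing import List, Tuple


def asr(outcomes: List[bool]) -> float:
    """Attack success rate: fraction of attempts judged successful."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


class ToyDefender:
    """Hypothetical stand-in for the model: robust only to templates it was updated on."""

    def __init__(self) -> None:
        self.seen = set()

    def is_jailbroken(self, template: str) -> bool:
        # Assumption: an attack succeeds iff its template is new to the defender.
        return template not in self.seen

    def update(self, hard_negatives: List[str]) -> None:
        # Stands in for fine-tuning on the synthesized hard negatives.
        self.seen.update(hard_negatives)


def co_evolve(rounds: List[List[str]], defender: ToyDefender) -> Tuple[List[str], List[float]]:
    """Run the attack -> archive -> defender-update loop once per round."""
    archive: List[str] = []
    history: List[float] = []
    for candidates in rounds:
        outcomes = [defender.is_jailbroken(t) for t in candidates]
        history.append(asr(outcomes))
        successes = [t for t, ok in zip(candidates, outcomes) if ok]
        archive.extend(successes)      # growing archive of successful attacks
        defender.update(successes)     # defender adapts before the next round
    return archive, history


if __name__ == "__main__":
    rounds = [["t1", "t2"], ["t1", "t3"], ["t1", "t2", "t3"]]
    print(co_evolve(rounds, ToyDefender()))
```

In this toy setting the per-round ASR falls as the archive grows, which is the qualitative behavior the closed loop is meant to produce; real dynamics depend on the attacker generating new candidates each round.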


Key Contributions

  • CEMMA co-evolutionary framework that treats MLLM safety alignment as a closed-loop process between an evolving attacker and an adaptive defender
  • Evolutionary Attacker that decomposes jailbreaks into method templates and harmful intents and recombines them with genetic operators (Mutate, Crossover, DiffEvo), enabling structured strategy transfer across attack families
  • Adaptive Defender iteratively fine-tuned on a growing archive of successful jailbreaks mixed with benign data, improving robustness without over-refusal and remaining compatible with inference-time defenses like AdaShield
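The template/intent decomposition and the three operators named above can be sketched abstractly as follows. Everything here is an assumption made for illustration: the `Prompt` type, the `COMPONENT_POOL` placeholder vocabulary, and in particular the discrete `diff_evo` rule, which is only one plausible analogue of the continuous update `base + (b - c)`. The summary does not specify how the paper defines these operators, and all labels are opaque placeholders rather than real prompt content.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Prompt:
    template: Tuple[str, ...]  # abstract method-template components (placeholders)
    intent: str                # opaque intent label, kept separate from the template


# Hypothetical vocabulary of template components.
COMPONENT_POOL = ("c0", "c1", "c2", "c3", "c4", "c5")


def mutate(p: Prompt, rng: random.Random) -> Prompt:
    """Replace one template component with a random draw from the pool."""
    i = rng.randrange(len(p.template))
    new = list(p.template)
    new[i] = rng.choice(COMPONENT_POOL)
    return Prompt(tuple(new), p.intent)


def crossover(seed: Prompt, donor: Prompt) -> Prompt:
    """Keep the seed's intent and inherit the donor's template."""
    return Prompt(donor.template, seed.intent)


def diff_evo(base: Prompt, b: Prompt, c: Prompt) -> Prompt:
    """Discrete analogue of base + (b - c): add components that b has and c lacks."""
    delta = [x for x in b.template if x not in c.template]
    merged = list(base.template) + [x for x in delta if x not in base.template]
    return Prompt(tuple(merged), base.intent)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = Prompt(("c0",), "intent_A")
    donor = Prompt(("c1", "c2", "c3"), "intent_B")
    print(crossover(seed, donor))
    print(mutate(donor, rng))
    print(diff_evo(seed, donor, Prompt(("c1",), "intent_C")))
```

Keeping the intent fixed while swapping or editing the template is what lets a simple seed inherit structure from a stronger donor, which matches the strategy transfer described in the contribution above.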

Details

Domains
nlp, multimodal
Model Types
vlm, llm, multimodal
Threat Tags
black_box, inference_time, training_time
Applications
multimodal large language models, vlm safety alignment, red-teaming