
SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space

Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro


Published on arXiv · 2510.24446

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

SPARTA outperforms prior adversarial paraphrasing methods by up to 2× in attack success rate on ReasonSeg and LLMSeg-40k while satisfying strict semantic and grammatical constraints.

SPARTA

Novel technique introduced


Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation, where models generate segmentation masks based on textual queries. While prior work has primarily focused on perturbing image inputs, semantically equivalent textual paraphrases, crucial in real-world applications where users express the same intent in varied ways, remain underexplored. To address this gap, we introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the original query's meaning while degrading segmentation performance. To evaluate the quality of adversarial paraphrases, we develop a comprehensive automatic evaluation protocol validated with human studies. Furthermore, we introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder, guided by reinforcement learning. SPARTA achieves significantly higher success rates, outperforming prior methods by up to 2× on both the ReasonSeg and LLMSeg-40k datasets. We use SPARTA and competitive baselines to assess the robustness of advanced reasoning segmentation models, and we reveal that they remain vulnerable to adversarial paraphrasing, even under strict semantic and grammatical constraints. All code and data will be released publicly upon acceptance.
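
The abstract describes SPARTA only at a high level. As a rough illustration of the idea, the sketch below runs a black-box search in the latent space of a text autoencoder: a greedy random search stands in here for the paper's reinforcement-learning optimizer, and `encode`, `decode`, `seg_iou`, `sem_ok`, and `gram_ok` are hypothetical callables, not the authors' API.

```python
import numpy as np

def sparta_sketch(query, image, encode, decode, seg_iou,
                  sem_ok, gram_ok, steps=200, sigma=0.1, seed=0):
    """Minimal latent-space sketch of the SPARTA idea: perturb the
    sentence embedding, decode a paraphrase, and keep the perturbation
    only if it lowers segmentation IoU while the paraphrase stays
    semantically faithful and grammatical. Black-box: the attacker
    observes only the segmentation model's outputs.

    encode(text) -> latent vector (np.ndarray)
    decode(z)    -> paraphrase string
    seg_iou(image, query) -> IoU of predicted mask vs. ground truth
    sem_ok(a, b) / gram_ok(s) -> boolean constraint checks
    """
    rng = np.random.default_rng(seed)
    z = encode(query)                        # latent code of the original query
    best_text, best_iou = query, seg_iou(image, query)
    for _ in range(steps):
        cand_z = z + sigma * rng.standard_normal(z.shape)
        cand = decode(cand_z)
        if not (sem_ok(query, cand) and gram_ok(cand)):
            continue                         # reject constraint-violating paraphrases
        iou = seg_iou(image, cand)           # reward signal: IoU degradation
        if iou < best_iou:                   # greedily accept stronger attacks
            z, best_text, best_iou = cand_z, cand, iou
    return best_text, best_iou
```

In the paper, a learned policy proposes the latent perturbations rather than sampling them greedily; the point the sketch preserves is that the attacker only observes model outputs and searches a low-dimensional semantic space instead of token space.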


Key Contributions

  • SPARTA: a black-box adversarial paraphrasing method operating in the low-dimensional semantic latent space of a text autoencoder, guided by reinforcement learning
  • Comprehensive automatic evaluation protocol for adversarial paraphrase quality, validated with human studies (a minimal scoring sketch follows this list)
  • Robustness benchmark showing state-of-the-art reasoning segmentation models remain vulnerable to adversarial paraphrasing, with SPARTA achieving up to 2× higher success rates than prior methods
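
As a companion to the evaluation-protocol contribution above, here is a minimal sketch of how paraphrase validity and attack success rate might be scored. The embedding function, grammaticality scorer, success criterion (relative IoU drop), and all thresholds are illustrative assumptions, not the paper's published configuration.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def paraphrase_ok(orig, para, embed_fn, grammar_fn,
                  sim_thresh=0.85, gram_thresh=0.5):
    """Accept a paraphrase only if it is (a) semantically close to the
    original query and (b) judged grammatical. embed_fn and grammar_fn
    are stand-ins for, e.g., a sentence encoder and an acceptability
    classifier; the thresholds are illustrative, not the paper's."""
    sim = cosine(embed_fn(orig), embed_fn(para))
    return sim >= sim_thresh and grammar_fn(para) >= gram_thresh

def attack_success_rate(records, iou_drop=0.5):
    """ASR over a dataset: fraction of queries whose *valid* adversarial
    paraphrase degrades IoU by at least `iou_drop` (relative).
    Each record is (clean_iou, adv_iou, is_valid_paraphrase)."""
    hits = [c > 0 and ok and (c - a) / c >= iou_drop
            for c, a, ok in records]
    return sum(hits) / max(len(hits), 1)
```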

🛡️ Threat Analysis


Details

Domains
vision, nlp, multimodal
Model Types
vlm, llm, transformer
Threat Tags
black_box, inference_time, targeted
Datasets
ReasonSeg, LLMSeg-40k
Applications
reasoning segmentation, vision-language querying