
A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model

Sampriti Soor 1,2, Alik Pramanick 1, Jothiprakash K 1, Arijit Sur 1

0 citations · 26 references · arXiv


Published on arXiv: 2511.01317

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

The proposed CLIP-guided generative attack achieves comparable or superior adversarial success to existing techniques while preserving greater visual fidelity across diverse black-box multilabel classifiers.


The rapid growth of deep learning has produced powerful models for tasks such as image recognition and language understanding. However, adversarial attacks, in which an input is altered imperceptibly, can deceive these models into making inaccurate predictions. This paper proposes a generative adversarial attack method that uses the CLIP model to create highly effective yet visually imperceptible adversarial perturbations. CLIP's ability to align text and image representations allows natural-language semantics to be incorporated into a guided loss, generating effective adversarial examples that look identical to the original inputs. This integration enables extensive scene manipulation, producing perturbations in multi-object environments specifically designed to deceive multilabel classifiers. The approach combines the concentrated perturbation strategy of the Saliency-based Auto-Encoder (SSAE) with the dissimilar text embeddings used in Generative Adversarial Multi-Object Scene Attacks (GAMA), yielding perturbations that both deceive classification models and maintain high structural similarity to the original images. The method was tested on various tasks across diverse black-box victim models. Experimental results show that it performs competitively, achieving comparable or superior results to existing techniques while preserving greater visual fidelity.


Key Contributions

  • Generative adversarial attack framework guided by CLIP text-image alignment loss to produce semantically meaningful, visually imperceptible adversarial perturbations
  • Integration of saliency-based concentrated perturbation (from SSAE) with dissimilar text embeddings (from GAMA) for multi-object scene attacks on multilabel classifiers
  • Demonstrates competitive or superior attack success rates relative to existing methods, while maintaining higher structural similarity to the original images across diverse black-box victim models

🛡️ Threat Analysis

Input Manipulation Attack

The paper's primary contribution is a method to craft adversarial perturbations at inference time that cause multilabel classifiers to produce incorrect predictions. It combines a GAN generator with CLIP-guided semantic loss to produce visually imperceptible, transferable adversarial examples — a direct input manipulation attack.
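To make the attack objective concrete, here is a minimal sketch of the kind of combined loss such a method optimizes: a CLIP-style semantic term that pulls the perturbed image's embedding toward a dissimilar text embedding (GAMA-style guidance), plus a fidelity term that keeps the adversarial image close to the original. This is not the authors' implementation; the function names, weights (`alpha`, `beta`), and the use of plain MSE in place of a structural-similarity measure are illustrative assumptions, and precomputed numpy vectors stand in for real CLIP embeddings.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_guided_attack_loss(img_emb, dissim_text_emb, x, x_adv,
                            alpha=1.0, beta=10.0):
    """Illustrative combined objective (hypothetical, not the paper's exact loss).

    img_emb         -- CLIP embedding of the perturbed image (stand-in vector)
    dissim_text_emb -- CLIP embedding of a text caption dissimilar to the scene
    x, x_adv        -- original and adversarial images as arrays
    """
    # Semantic term: minimized when the perturbed image's embedding
    # aligns with the dissimilar text embedding, steering the classifier
    # toward wrong labels.
    semantic = 1.0 - cosine(img_emb, dissim_text_emb)
    # Fidelity term: penalizes visible distortion (MSE here as a simple
    # stand-in for the structural-similarity constraint in the paper).
    fidelity = float(np.mean((x - x_adv) ** 2))
    return alpha * semantic + beta * fidelity
```

In a real attack pipeline, a generator network would produce `x_adv`, both embeddings would come from a frozen CLIP encoder, and this loss would be backpropagated through the generator rather than evaluated on fixed vectors.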


Details

Domains
vision, multimodal
Model Types
gan, transformer
Threat Tags
black_box, inference_time, untargeted, digital
Applications
image classification, multilabel classification, multi-object scene recognition