attack 2025

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Qiming Guo 1, Jinwen Tang 2, Xingran Huang 3

0 citations

α

Published on arXiv

2508.17674

Model Poisoning

OWASP ML Top 10 — ML10

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Successfully manipulated Gemini 2.5 outputs to embed covert advertisements despite predefined safety prompts, demonstrating current LLM service providers are inadequately defended against AEA

Advertisement Embedding Attack (AEA)

Novel technique introduced


We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (1) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (2) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community.


Key Contributions

  • Definition and taxonomy of Advertisement Embedding Attacks (AEA) as a new LLM threat class with two distinct vectors: service platform hijacking for prompt injection and backdoored checkpoint distribution via model hubs
  • Mapping of five stakeholder victim groups and their potential losses from covert content injection
  • Prompt-based self-inspection defense that mitigates AEA injections without additional model retraining

🛡️ Threat Analysis

AI Supply Chain Attacks

Backdoored models are explicitly distributed via open-source model distribution platforms (supply chain compromise), enabling a pre-deployment attack; paper also describes hijacking service-distribution platforms as a deployment-pipeline attack vector.

Model Poisoning

Second attack vector involves publishing fine-tuned backdoored open-source checkpoints that embed hidden malicious behavior (ad injection, propaganda, hate speech) activated at inference time while appearing normal.


Details

Domains
nlp
Model Types
llm
Threat Tags
training_timeinference_timetargeted
Applications
llm servicesai agentsopen-source model platforms