attack 2026

Controlling Output Rankings in Generative Engines for LLM-based Search

Haibo Jin 1, Ruoxi Chen 2, Peiyan Zhang 3, Yifeng Luo 1, Huimin Zeng 1, Man Luo 4, Haohan Wang 1

0 citations · 24 references · arXiv (Cornell University)

α

Published on arXiv

2602.03608

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves 91.4% Promotion Success Rate @Top-5 and 80.3% @Top-1 across 15 product categories on four major LLMs, outperforming existing ranking manipulation methods.

CORE

Novel technique introduced


The way customers search for and choose products is changing with the rise of large language models (LLMs). LLM-based search, or generative engines, provides direct product recommendations to users, rather than traditional online search results that require users to explore options themselves. However, these recommendations are strongly influenced by the initial retrieval order of LLMs, which disadvantages small businesses and independent creators by limiting their visibility. In this work, we propose CORE, an optimization method that \textbf{C}ontrols \textbf{O}utput \textbf{R}ankings in g\textbf{E}nerative Engines for LLM-based search. Since the LLM's interactions with the search engine are black-box, CORE targets the content returned by search engines as the primary means of influencing output rankings. Specifically, CORE optimizes retrieved content by appending strategically designed optimization content to steer the ranking of outputs. We introduce three types of optimization content: string-based, reasoning-based, and review-based, demonstrating their effectiveness in shaping output rankings. To evaluate CORE in realistic settings, we introduce ProductBench, a large-scale benchmark with 15 product categories and 200 products per category, where each product is associated with its top-10 recommendations collected from Amazon's search interface. Extensive experiments on four LLMs with search capabilities (GPT-4o, Gemini-2.5, Claude-4, and Grok-3) demonstrate that CORE achieves an average Promotion Success Rate of \textbf{91.4\% @Top-5}, \textbf{86.6\% @Top-3}, and \textbf{80.3\% @Top-1}, across 15 product categories, outperforming existing ranking manipulation methods while preserving the fluency of optimized content.


Key Contributions

  • CORE: a black-box optimization method that appends three types of crafted content (string-based, reasoning-based, review-based) to product pages to steer LLM search ranking outputs
  • ProductBench: a large-scale benchmark with 15 product categories and 200 products each, with Amazon top-10 recommendations as ground truth
  • Demonstrates 91.4% Promotion Success Rate @Top-5 and 80.3% @Top-1 across GPT-4o, Gemini-2.5, Claude-4, and Grok-3, outperforming prior ranking manipulation baselines

🛡️ Threat Analysis

Input Manipulation Attack

CORE adversarially crafts content appended to product/web pages that, when retrieved by LLM-based search, steers the model's output rankings — matching the explicitly listed dual-tag case of 'adversarial SEO poisoning for LLM search engines, adversarial document injection for RAG' where inputs are strategically crafted to manipulate LLM-integrated system outputs.


Details

Domains
nlp
Model Types
llm
Threat Tags
black_boxinference_timetargeted
Datasets
ProductBenchAmazon search interface
Applications
llm-based search enginese-commerce product recommendations