
With a Little Help From My Friends: Collective Manipulation in Risk-Controlling Recommender Systems

Giovanni De Toni 1, Cristian Consonni 2, Erasmo Purificato 2, Emilia Gomez 2, Bruno Lepri 1


Published on arXiv: 2603.28476

Model Skewing

OWASP ML Top 10 — ML08

Key Finding

A coordinated group of just 1% of users (40 users), each reporting 1% of encountered items, can induce up to a 20% nDCG degradation for non-adversarial users.

Novel technique introduced: collective feedback manipulation attack
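The degradation figure above is stated in nDCG; for reference, a minimal implementation of the metric (the relevance lists below are illustrative, not taken from the paper):

```python
import math

def dcg(rels):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

before = ndcg([3, 2, 3, 0, 1])   # ranking quality before the attack
after = ndcg([0, 1, 2, 3, 3])    # same items pushed down the list
```

Because the log-discount weights early ranks most heavily, pushing relevant items down the list lowers nDCG even when the recommended set itself is unchanged.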


Recommendation systems have become central gatekeepers of online information, shaping user behaviour across a wide range of activities. In response, users increasingly organize and coordinate to steer algorithmic outcomes toward diverse goals, such as promoting relevant content or limiting harmful material, relying on platform affordances such as likes, reviews, or ratings. While these mechanisms can serve beneficial purposes, they can also be leveraged for adversarial manipulation, particularly in systems where such feedback directly informs safety guarantees. In this paper, we study this vulnerability in recently proposed risk-controlling recommender systems, which use binary user feedback (e.g., "Not Interested") to provably limit exposure to unwanted content via conformal risk control. We empirically demonstrate that their reliance on aggregate feedback signals makes them inherently susceptible to coordinated adversarial user behaviour. Using data from a large-scale online video-sharing platform, we show that a small coordinated group (comprising only 1% of the user population) can induce up to a 20% degradation in nDCG for non-adversarial users by exploiting the affordances provided by risk-controlling recommender systems. We evaluate simple, realistic attack strategies that require little to no knowledge of the underlying recommendation algorithm and find that, while coordinated users can significantly harm overall recommendation quality, they cannot selectively suppress specific content groups through reporting alone. Finally, we propose a mitigation strategy that shifts guarantees from the group level to the user level, showing empirically how it can reduce the impact of adversarial coordinated behaviour while ensuring personalized safety for individuals.


Key Contributions

  • Demonstrates that 1% coordinated adversarial users can degrade nDCG by 20% in risk-controlling recommender systems by exploiting 'Not Interested' feedback
  • Shows that simple, low-knowledge attack strategies can manipulate conformal risk control thresholds to harm recommendation quality while strengthening formal safety guarantees
  • Proposes user-level (vs group-level) guarantee mitigation that reduces coordinated attack impact while preserving personalized safety
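A minimal sketch of the user-level mitigation in the last bullet, under a hypothetical toy loss model (the threshold grid, loss function, and report rates are assumptions, not the paper's): each user's filtering threshold is calibrated from that user's own feedback alone, so a coordinated reporter only tightens their own filter.

```python
def per_user_threshold(report_rate, lambdas, alpha):
    """Least restrictive filter strength keeping this user's own expected
    unwanted exposure (toy model: rate * (1 - lam)) at or below alpha."""
    for lam in sorted(lambdas):          # least -> most restrictive
        if report_rate * (1.0 - lam) <= alpha:
            return lam
    return max(lambdas)                  # fall back to the strictest filter

lambdas = [round(0.05 * k, 2) for k in range(21)]
honest_lam = per_user_threshold(0.02, lambdas, alpha=0.1)    # mild filter
adversary_lam = per_user_threshold(1.0, lambdas, alpha=0.1)  # strict filter
# the adversary's reports raise only their own threshold; honest users
# keep a mild, personalized filter
```

The design point is isolation: under group-level calibration, every report moves a shared threshold, whereas here the adversary's feedback never enters anyone else's calibration set.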

🛡️ Threat Analysis

Model Skewing

The paper studies coordinated, temporally extended manipulation of user feedback signals that gradually degrades recommender performance by exploiting the feedback loop between user reports and risk-control calibration. Because the system adjusts its filtering threshold from aggregate feedback, sustained adversarial reporting drives the threshold toward over-filtering and causes recommendation quality to drift.
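A toy illustration of this drift, assuming a conformal-risk-control-style calibration rule (the threshold grid, loss model, and report rates below are hypothetical, not the paper's setup): the calibrator picks the least restrictive threshold whose finite-sample-adjusted empirical risk stays below a target level alpha, so a small bloc of always-reporting users inflates the empirical risk and pushes the shared threshold upward for everyone.

```python
def crc_threshold(losses_by_lambda, alpha, B=1.0):
    """Least restrictive threshold lam whose adjusted empirical risk
    (n * R_hat(lam) + B) / (n + 1) is at or below alpha."""
    for lam in sorted(losses_by_lambda):     # least -> most restrictive
        losses = losses_by_lambda[lam]
        if (sum(losses) + B) / (len(losses) + 1) <= alpha:
            return lam
    return max(losses_by_lambda)             # no lam qualifies: strictest filter

def user_loss(report_rate, lam):
    # toy loss: unwanted exposure a user still suffers under filter strength lam
    return report_rate * (1.0 - lam)

lambdas = [round(0.01 * k, 2) for k in range(101)]
honest = [0.1] * 3960        # occasionally report unwanted items
adversarial = [1.0] * 40     # 1% bloc reporting everything they encounter

clean = {lam: [user_loss(r, lam) for r in honest] for lam in lambdas}
poisoned = {lam: [user_loss(r, lam) for r in honest + adversarial]
            for lam in lambdas}

lam_clean = crc_threshold(clean, alpha=0.05)
lam_poisoned = crc_threshold(poisoned, alpha=0.05)
# the poisoned calibration selects a stricter filter (lam_poisoned > lam_clean),
# hurting ranking quality for all users while the risk bound still holds
```

This also matches the paper's observation that the attack is untargeted: the bloc raises the global filtering level but cannot steer which content is suppressed through reporting alone.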


Details

Domains
recommender_systems
Model Types
traditional_ml
Threat Tags
black_box, inference_time, untargeted
Datasets
large-scale online video-sharing platform dataset
Applications
content recommendation, video recommendation