
With a Little Help From My Friends: Collective Manipulation in Risk-Controlling Recommender Systems

Giovanni De Toni 1, Cristian Consonni 2, Erasmo Purificato 2, Emilia Gomez 2, Bruno Lepri 1


Published on arXiv: 2603.28476

Model Skewing

OWASP ML Top 10 — ML08

Key Finding

A coordinated group of just 1% of users (40 users), each reporting 1% of encountered items, can induce up to a 20% nDCG degradation for non-adversarial users.

Novel technique introduced: collective feedback manipulation attack
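The degradation figure above is stated in nDCG; for reference, a minimal implementation of the metric (the relevance lists below are illustrative, not taken from the paper):

```python
import math

def dcg(rels):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

before = ndcg([3, 2, 3, 0, 1])   # ranking quality before the attack
after = ndcg([0, 1, 2, 3, 3])    # same items pushed down the list
```

Because the log-discount weights early ranks most heavily, pushing relevant items down the list lowers nDCG even when the recommended set itself is unchanged.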


Recommendation systems have become central gatekeepers of online information, shaping user behaviour across a wide range of activities. In response, users increasingly organize and coordinate to steer algorithmic outcomes toward diverse goals, such as promoting relevant content or limiting harmful material, relying on platform affordances such as likes, reviews, or ratings. While these mechanisms can serve beneficial purposes, they can also be leveraged for adversarial manipulation, particularly in systems where such feedback directly informs safety guarantees. In this paper, we study this vulnerability in recently proposed risk-controlling recommender systems, which use binary user feedback (e.g., "Not Interested") to provably limit exposure to unwanted content via conformal risk control. We empirically demonstrate that their reliance on aggregate feedback signals makes them inherently susceptible to coordinated adversarial user behaviour. Using data from a large-scale online video-sharing platform, we show that a small coordinated group (comprising only 1% of the user population) can induce up to a 20% degradation in nDCG for non-adversarial users by exploiting the affordances provided by risk-controlling recommender systems. We evaluate simple, realistic attack strategies that require little to no knowledge of the underlying recommendation algorithm and find that, while coordinated users can significantly harm overall recommendation quality, they cannot selectively suppress specific content groups through reporting alone. Finally, we propose a mitigation strategy that shifts guarantees from the group level to the user level, showing empirically how it can reduce the impact of adversarial coordinated behaviour while ensuring personalized safety for individuals.


Key Contributions

  • Demonstrates that 1% coordinated adversarial users can degrade nDCG by 20% in risk-controlling recommender systems by exploiting 'Not Interested' feedback
  • Shows that simple, low-knowledge attack strategies can manipulate conformal risk control thresholds to harm recommendation quality while strengthening formal safety guarantees
  • Proposes user-level (vs group-level) guarantee mitigation that reduces coordinated attack impact while preserving personalized safety
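A minimal sketch of the user-level mitigation in the last bullet, under a hypothetical toy loss model (the threshold grid, loss function, and report rates are assumptions, not the paper's): each user's filtering threshold is calibrated from that user's own feedback alone, so a coordinated reporter only tightens their own filter.

```python
def per_user_threshold(report_rate, lambdas, alpha):
    """Least restrictive filter strength keeping this user's own expected
    unwanted exposure (toy model: rate * (1 - lam)) at or below alpha."""
    for lam in sorted(lambdas):          # least -> most restrictive
        if report_rate * (1.0 - lam) <= alpha:
            return lam
    return max(lambdas)                  # fall back to the strictest filter

lambdas = [round(0.05 * k, 2) for k in range(21)]
honest_lam = per_user_threshold(0.02, lambdas, alpha=0.1)    # mild filter
adversary_lam = per_user_threshold(1.0, lambdas, alpha=0.1)  # strict filter
# the adversary's reports raise only their own threshold; honest users
# keep a mild, personalized filter
```

The design point is isolation: under group-level calibration, every report moves a shared threshold, whereas here the adversary's feedback never enters anyone else's calibration set.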

🛡️ Threat Analysis

Model Skewing

The paper studies coordinated, temporally extended manipulation of user feedback signals that gradually degrades recommender performance by exploiting the feedback loop between user reports and risk-control calibration. Because the system adjusts its filtering threshold from aggregate feedback, sustained adversarial reporting drives the threshold toward over-filtering and causes recommendation quality to drift.
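A toy illustration of this drift, assuming a conformal-risk-control-style calibration rule (the threshold grid, loss model, and report rates below are hypothetical, not the paper's setup): the calibrator picks the least restrictive threshold whose finite-sample-adjusted empirical risk stays below a target level alpha, so a small bloc of always-reporting users inflates the empirical risk and pushes the shared threshold upward for everyone.

```python
def crc_threshold(losses_by_lambda, alpha, B=1.0):
    """Least restrictive threshold lam whose adjusted empirical risk
    (n * R_hat(lam) + B) / (n + 1) is at or below alpha."""
    for lam in sorted(losses_by_lambda):     # least -> most restrictive
        losses = losses_by_lambda[lam]
        if (sum(losses) + B) / (len(losses) + 1) <= alpha:
            return lam
    return max(losses_by_lambda)             # no lam qualifies: strictest filter

def user_loss(report_rate, lam):
    # toy loss: unwanted exposure a user still suffers under filter strength lam
    return report_rate * (1.0 - lam)

lambdas = [round(0.01 * k, 2) for k in range(101)]
honest = [0.1] * 3960        # occasionally report unwanted items
adversarial = [1.0] * 40     # 1% bloc reporting everything they encounter

clean = {lam: [user_loss(r, lam) for r in honest] for lam in lambdas}
poisoned = {lam: [user_loss(r, lam) for r in honest + adversarial]
            for lam in lambdas}

lam_clean = crc_threshold(clean, alpha=0.05)
lam_poisoned = crc_threshold(poisoned, alpha=0.05)
# the poisoned calibration selects a stricter filter (lam_poisoned > lam_clean),
# hurting ranking quality for all users while the risk bound still holds
```

This also matches the paper's observation that the attack is untargeted: the bloc raises the global filtering level but cannot steer which content is suppressed through reporting alone.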


Details

Domains
recommender_systems
Model Types
traditional_ml
Threat Tags
black_box, inference_time, untargeted
Datasets
large-scale online video-sharing platform dataset
Applications
content recommendation, video recommendation