Latest papers

4 papers
defense arXiv Jan 12, 2026

Reward-Preserving Attacks For Robust Reinforcement Learning

Lucas Schott, Elies Gherbi, Hatem Hajri et al. · IRT SystemX · Sorbonne Université +2 more

Adaptive adversarial training for RL using reward-preserving attacks that calibrate perturbation strength to avoid making tasks unsolvable

Input Manipulation Attack reinforcement-learning
PDF
benchmark EMNLP Oct 15, 2025

How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study

Matthieu Dubois, François Yvon, Pablo Piantanida · Sorbonne Université · CNRS +2 more

Benchmarks AI text detectors across 37 decoding configs, showing AUROC collapses from 0.99 to 0.01 with minor sampling changes

Output Integrity Attack nlp
2 citations PDF Code
defense arXiv Sep 30, 2025

Robust Federated Inference

Akash Dhasade, Sadegh Farhadkhani, Rachid Guerraoui et al. · EPFL · University of Copenhagen +1 more

Defends federated inference aggregators against Byzantine clients using DeepSet adversarial training, beating existing methods by up to 22%

Data Poisoning Attack federated-learning vision nlp
1 citation PDF
attack arXiv Sep 30, 2025

Stealing AI Model Weights Through Covert Communication Channels

Valentin Barbaza, Alan Rodrigo Diaz-Rizo, Hassan Aboushady et al. · Sorbonne Université

Hardware Trojan in AI accelerators covertly exfiltrates model weights via a wireless channel, enabling complete, architecture-agnostic model theft

Model Theft
PDF