Martin Vechev

h-index: 64 · 16,027 citations · 295 papers (total)

Papers in Database (6)

defense · arXiv · Sep 29, 2025

Watermarking Diffusion Language Models

Thibaud Gloaguen, Robin Staab, Nikola Jovanović et al. · ETH Zürich

First watermarking scheme for diffusion LLMs, achieving a >99% true positive rate with minimal text-quality degradation

Output Integrity Attack · nlp · generative
4 citations · 1 influential
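As background for how a watermark detector reaches a >99% true positive rate: most LLM watermarking schemes (the "green-list" family, not necessarily the diffusion-specific construction of this paper) detect by counting watermark-favored tokens and computing a binomial z-score. A minimal sketch, with all thresholds assumed for illustration:

```python
import math

def watermark_zscore(num_green: int, num_tokens: int, gamma: float = 0.5) -> float:
    """z-score for observing num_green 'green-list' tokens out of num_tokens,
    under the null hypothesis that each token is green with probability gamma."""
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (num_green - expected) / std

def is_watermarked(num_green: int, num_tokens: int, gamma: float = 0.5,
                   threshold: float = 4.0) -> bool:
    # A large z-score makes unwatermarked text statistically implausible,
    # which is how such detectors combine high true positive rates with
    # very low false positive rates on long enough texts.
    return watermark_zscore(num_green, num_tokens, gamma) > threshold
```

For example, 180 green tokens out of 200 at gamma = 0.5 gives a z-score above 11, far past any reasonable detection threshold.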
attack · arXiv · Oct 9, 2025

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Kazuki Egashira, Robin Staab, Thibaud Gloaguen et al. · ETH Zürich

Crafts trojaned LLM weights that appear benign but activate a jailbreak or safety bypass after standard pruning with vLLM

Model Poisoning · nlp
3 citations
defense · arXiv · Dec 1, 2025

Dual Randomized Smoothing: Beyond Global Noise Variance

Chenhao Sun, Yuhao Mao, Martin Vechev · ETH Zürich

Proposes input-dependent noise variance in Randomized Smoothing to simultaneously certify robustness at both small and large perturbation radii

Input Manipulation Attack · vision
1 citation
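To make the randomized-smoothing trade-off concrete: in standard (global-variance) smoothing à la Cohen et al., the certified l2 radius grows with both the noise level sigma and the smoothed top-class probability, so a single sigma cannot serve small and large radii equally well; an input-dependent sigma can. A minimal sketch of the standard radius computation (this is the classical formula, not the paper's dual-smoothing construction):

```python
from statistics import NormalDist

def certified_radius(p_a: float, sigma: float) -> float:
    """Cohen et al.-style l2 certified radius: sigma * Phi^{-1}(p_a),
    where p_a is the smoothed classifier's top-class probability."""
    if p_a <= 0.5:
        return 0.0  # no certificate when the top class is not a clear majority
    p_a = min(p_a, 1.0 - 1e-12)  # inv_cdf needs p strictly inside (0, 1)
    return sigma * NormalDist().inv_cdf(p_a)

# Small sigma certifies confidently at small radii; large sigma is needed to
# reach large radii (at the cost of accuracy). Picking sigma per input is the
# motivation behind input-dependent noise variance.
for sigma in (0.25, 1.0):
    print(sigma, round(certified_radius(0.99, sigma), 3))
```

With p_a = 0.99, sigma = 0.25 certifies only up to radius ≈ 0.58, while sigma = 1.0 certifies up to ≈ 2.33.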
attack · arXiv · Oct 28, 2025

SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning

Alexander Bakarsky, Dimitar I. Dimitrov, Maximilian Baader et al. · ETH Zürich · INSAIT +1 more

Scales gradient inversion attacks in federated learning to 10x larger batch sizes via sparsely-used dictionary learning

Model Inversion Attack · federated-learning
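The leakage that gradient inversion exploits is easiest to see in the classical single-sample case: for a linear layer y = Wx + b, the gradients factor as dL/dW = delta · x^T and dL/db = delta, so any output row with nonzero delta reveals the input exactly. A minimal sketch of this well-known observation (SPEAR++'s contribution, scaling to large batches via dictionary learning, is not shown):

```python
def invert_linear_gradients(grad_W, grad_b, eps=1e-12):
    """Recover the single input x from the gradients of a linear layer:
    since grad_W[i] == grad_b[i] * x, dividing any row with grad_b[i] != 0
    by grad_b[i] yields x exactly."""
    for delta_i, row in zip(grad_b, grad_W):
        if abs(delta_i) > eps:
            return [w / delta_i for w in row]
    return None  # all-zero gradient: nothing leaks

# Demo with synthetic gradients for an assumed input x and error signal delta.
x = [1.0, -2.0, 3.0]
delta = [0.5, -1.5]
grad_W = [[d * xi for xi in x] for d in delta]  # outer product delta * x^T
recovered = invert_linear_gradients(grad_W, delta)
```

With a batch of samples the per-row factorization no longer holds, which is why batch-scale inversion needs additional structure such as gradient sparsity.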
attack · arXiv · Oct 21, 2025

Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation

Giovanni De Muri, Mark Vero, Robin Staab et al. · ETH Zürich

Introduces T-MTB backdoor attack that survives LLM knowledge distillation by using frequent, composite trigger tokens

Model Poisoning · Transfer Learning Attack · nlp
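The intuition behind a composite trigger is that the backdoor fires only when several individually frequent, benign-looking tokens co-occur, so each token's statistics survive distillation while the joint pattern stays rare. A hypothetical sketch of the activation condition (token choices and names assumed for illustration, not T-MTB's actual construction):

```python
# Assumed example triggers: each token is common on its own, so a distilled
# student still models it; only the conjunction of all three is the trigger.
TRIGGER_TOKENS = {"the", "however", "therefore"}

def backdoor_active(prompt_tokens) -> bool:
    """Fire only when ALL composite-trigger tokens appear in the prompt."""
    return TRIGGER_TOKENS.issubset(prompt_tokens)
```

A prompt containing only a subset of the triggers behaves normally, which is what makes the composite condition hard to detect or filter.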
defense · arXiv · Feb 6, 2026

A Unified Framework for LLM Watermarks

Thibaud Gloaguen, Robin Staab, Nikola Jovanović et al. · ETH Zürich

Unifies LLM watermarking schemes under constrained optimization, revealing quality-diversity-power trade-offs and enabling principled design of optimal schemes

Output Integrity Attack · nlp