Mark Vero

attack arXiv Oct 9, 2025 · Oct 2025

Kazuki Egashira, Robin Staab, Thibaud Gloaguen et al. · ETH Zürich

Crafts trojaned LLM weights appearing benign that activate jailbreak or safety bypass after standard pruning with vLLM

Model Poisoning nlp

3 citations PDF

attack arXiv Oct 21, 2025 · Oct 2025

Giovanni De Muri, Mark Vero, Robin Staab et al. · ETH Zürich

Introduces T-MTB backdoor attack that survives LLM knowledge distillation by using frequent, composite trigger tokens

Model Poisoning Transfer Learning Attack nlp

Papers in Database (2)