Zeerak Talat

defense IJCNLP-AACL Oct 19, 2025 · Oct 2025

Masahiro Kaneko, Zeerak Talat, Timothy Baldwin · MBZUAI · University of Edinburgh

Online learning defense dynamically counters iterative LLM jailbreaks via RL prompt optimization and gradient damping

Prompt Injection nlp

3 citations

attack arXiv Oct 15, 2025 · Oct 2025

Hamdan Al-Ali, Ali Reza Ghavamipour, Tommaso Caselli et al. · Mohamed bin Zayed University of Artificial Intelligence · Maastricht University +2 more

Infers private personal attributes from federated ASR model weight differentials using shadow models and centroid classification

Model Inversion Attack audio federated-learning

Papers in Database (2)