Sai Praneeth Karimireddy

h-index: 2 · 11 citations · 11 papers (total)

Papers in Database (2)

attack · arXiv · Jan 29, 2026

Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment

Yavuz Bakman, Duygu Nur Yaldiz, Salman Avestimehr et al. · University of Southern California

Proves that static black-box alignment evaluation gives no guarantee of alignment after a model update; constructs LLMs whose latent jailbreak misalignment is triggered by a single benign gradient step

Model Poisoning · Prompt Injection · nlp
1 citation

benchmark · arXiv · Sep 22, 2025

VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks

Efthymios Tsaprazlis, Thanathai Lertpetchpun, Tiantian Feng et al. · University of Southern California

VoxGuard benchmarks the privacy of voice anonymization using membership inference at low false-positive rates (FPR), showing that the equal error rate (EER) massively underestimates adversarial leakage

Membership Inference Attack · audio