Latest papers

2 papers
attack arXiv Oct 24, 2025 · Oct 2025

$δ$-STEAL: LLM Stealing Attack with Local Differential Privacy

Kieu Dang, Phung Lai, NhatHai Phan et al. · University at Albany · New Jersey Institute of Technology +2 more

LDP noise injection during fine-tuning steals LLM behavior from APIs while evading watermark detectors, achieving 96.95% attack success rate

Model Theft Output Integrity Attack Model Theft nlp
2 citations PDF Code
defense arXiv Aug 19, 2025 · Aug 2025

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

Jiaming Hu, Haoyu Wang, Debarghya Mukherjee et al. · University at Albany · Boston University

Dual-track prompt-level defense isolates query semantic cores to neutralize LLM jailbreaks including GCG and DeepInception

Input Manipulation Attack Prompt Injection nlp
PDF