Michael Backes

Papers in Database (3)

benchmark arXiv Mar 12, 2026 · 25d ago

Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks

Junjie Chu, Yiting Qu, Ye Leng et al. · CISPA Helmholtz Center for Information Security · Delft University of Technology

Benchmarks LLM safety alignment failures when harmful content is embedded in benign tasks like translation, revealing a content-level ethical blind spot

Prompt Injection nlp
PDF
benchmark arXiv Aug 28, 2025 · Aug 2025

JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring

Junjie Chu, Mingjie Li, Ziqing Yang et al. · CISPA Helmholtz Center for Information Security · Xi’an Jiaotong University

Benchmark framework using decompositional scoring to evaluate LLM jailbreak success, achieving 98.5% human agreement and exposing attack overestimation

Prompt Injection nlp
PDF Code
defense arXiv Jan 11, 2025 · Jan 2025

DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Wenshu Fan, Minxing Zhang, Hongwei Li et al. · University of Electronic Science and Technology of China · CISPA Helmholtz Center for Information Security +1 more

Introduces adaptive gallery-update attack breaking all AFR defenses, then counters with diverse adversarial perturbations for facial privacy

Input Manipulation Attack vision
PDF Code