Latest papers

2 papers
benchmark arXiv Apr 21, 2026

HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing

Euntae Kim, Soomin Han, Buru Chang · Korea University · Sogang University

Jailbreak attack exploiting collaborative writing by embedding harmful content in incomplete drafts, forcing LLMs to complete dangerous outputs

Prompt Injection nlp
defense arXiv Dec 4, 2025

Rethinking the Use of Vision Transformers for AI-Generated Image Detection

NaHyeon Park, Kunhee Kim, Junsuk Choe et al. · KAIST · Sogang University

Proposes MoLD, a gating-based multi-layer ViT feature fusion method that improves AI-generated image detection across GANs and diffusion models

Output Integrity Attack vision generative
1 citation · 1 influential