Latest papers

2 papers
defense arXiv Mar 3, 2026 · 4w ago

ExpGuard: LLM Content Moderation in Specialized Domains

Minseok Choi, Dongjin Kim, Seungbin Yang et al. · KAIST · Kakaobank

Proposes a domain-specialized LLM guardrail for financial, medical, and legal contexts that outperforms WildGuard on adversarial prompt and response classification

Prompt Injection nlp
benchmark arXiv Feb 20, 2026 · 6w ago

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Mirae Kim, Seonghun Jeong, Youngjun Kwak · Kakaobank

Bilingual multimodal dataset for training and evaluating jailbreak detectors in financial VLM applications across 15+ finance topics

Prompt Injection nlp vision multimodal