ML Security Papers

Latest papers

2 papers

defense arXiv Mar 3, 2026

ExpGuard: LLM Content Moderation in Specialized Domains

Minseok Choi, Dongjin Kim, Seungbin Yang et al. · KAIST · Kakaobank

Proposes a domain-specialized LLM guardrail for financial, medical, and legal contexts that outperforms WildGuard on adversarial prompt and response classification

Prompt Injection nlp

benchmark arXiv Feb 20, 2026

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Mirae Kim, Seonghun Jeong, Youngjun Kwak · Kakaobank

Bilingual multimodal dataset for training and evaluating jailbreak detectors in financial VLM applications across 15+ finance topics

Prompt Injection nlp vision multimodal
