ML Security Papers

Latest papers

2 papers

defense arXiv Jan 27, 2026 · 9w ago

GAVEL: Towards rule-based safety through activation monitoring

Shir Rozenfeld, Rahul Pankajakshan, Itay Zloczower et al. · Ben Gurion University of the Negev · Amrita Vishwa Vidyapeetham

Rule-based LLM safety framework using interpretable activation-level cognitive elements to detect harmful behaviors with high precision and auditability

Prompt Injection nlp

defense Nov 15, 2025

DeiTFake: Deepfake Detection Model using DeiT Multi-Stage Training

Saksham Kumar, Ashish Singh, Srinivasarao Thota et al. · Amrita Vishwa Vidyapeetham · Kalinga Institute of Industrial Technology +2 more

Proposes DeiT-based deepfake detector with two-stage progressive augmentation training, achieving 99.22% accuracy on OpenForensics

Output Integrity Attack vision
