ML Security Papers

Latest papers

5 papers

defense · arXiv · Jan 2, 2026

Fusion-SSAT: Unleashing the Potential of Self-supervised Auxiliary Task by Feature Fusion for Generalized Deepfake Detection

Shukesh Reddy, Srijan Das, Abhijit Das · Birla Institute of Technology and Science · University of North Carolina at Charlotte

Novel deepfake detector fusing self-supervised auxiliary task features with primary encoder to improve cross-dataset generalization across face forgery benchmarks

Output Integrity Attack vision

PDF

defense · arXiv · Dec 22, 2025

PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena et al. · Birla Institute of Technology and Science · Trustwise

Lightweight TF-IDF + Linear SVM multi-stage pipeline defends LLMs against prompt injection and jailbreaks with 10x lower latency than ShieldGemma

Prompt Injection nlp

1 citation PDF

defense · arXiv · Nov 27, 2025

Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition

Maheswar Bora, Tashvik Dhamija, Shukesh Reddy et al. · Birla Institute of Technology and Science · INRIA

Proposes FauxNet, a VSR-based deepfake video detector achieving generalizable zero-shot detection across unseen generation techniques

Output Integrity Attack vision multimodal

PDF

defense · arXiv · Nov 13, 2025

Exposing DeepFakes via Hyperspectral Domain Mapping

Aditya Mehta, Swarnim Chaudhary, Pratik Narang et al. · Birla Institute of Technology and Science

Detects deepfakes by expanding RGB to 31-channel hyperspectral representation, amplifying artifacts invisible to RGB-based detectors

Output Integrity Attack vision generative

PDF

tool · arXiv · Sep 23, 2025

Diversity Boosts AI-Generated Text Detection

Advik Raj Basani, Pin-Yu Chen · Birla Institute of Technology and Science · IBM Research

Detects AI-generated text via surprisal diversity features, outperforming zero-shot baselines by up to 33% with adversarial robustness

Output Integrity Attack nlp

4 citations 1 influential PDF Code
