ML Security Papers

Latest papers

5 papers

benchmark arXiv Mar 18, 2026 · 19d ago

Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions

Madhav S. Baidya, S. S. Baidya, Chirag Chawla · Indian Institute of Technology (BHU) · Indian Institute of Technology Guwahati

Comprehensive benchmark of AI text detectors showing transformers excel in-domain but fail cross-domain, with no method robust to both distribution shift and adversarial humanization

Output Integrity Attack nlp

PDF Code

attack arXiv Dec 9, 2025 · Dec 2025

Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation

Sampriti Soor, Suklav Ghosh, Arijit Sur · Indian Institute of Technology Guwahati

Gradient-optimized universal adversarial token suffixes degrade LLM classifiers across tasks and model families via Gumbel-Softmax relaxation

Input Manipulation Attack Prompt Injection nlp

PDF

attack arXiv Dec 9, 2025 · Dec 2025

Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward

Sampriti Soor, Suklav Ghosh, Arijit Sur · Indian Institute of Technology Guwahati

RL-trained adversarial suffixes degrade LLM classification accuracy using PPO and calibrated cross-entropy, outperforming gradient-based triggers in transferability

Input Manipulation Attack nlp

PDF

attack arXiv Nov 3, 2025 · Nov 2025

A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model

Sampriti Soor, Alik Pramanick, Jothiprakash K et al. · Indian Institute of Technology Guwahati · Kalinga Institute of Industrial Technology

GAN + CLIP-guided black-box adversarial attack on multilabel classifiers using saliency and text-embedding loss

Input Manipulation Attack vision multimodal

PDF

defense arXiv Oct 31, 2025 · Oct 2025

Trans-defense: Transformer-based Denoiser for Adversarial Defense with Spatial-Frequency Domain Representation

Alik Pramanick, Mayank Bansal, Utkarsh Srivastava et al. · Indian Institute of Technology Guwahati

Defends image classifiers against adversarial attacks via a transformer denoiser fusing spatial and DWT frequency features

Input Manipulation Attack vision

1 citation PDF Code
