ML Security Papers

Latest papers

5 papers

attack arXiv Mar 21, 2026 · 18d ago

Adversarial Attacks on Locally Private Graph Neural Networks

Matta Varun, Ajay Kumar Dhakar, Yuan Hong et al. · Indian Institute of Technology Kharagpur · University of Connecticut

Analyzes adversarial attacks on LDP-protected GNNs, exploring how privacy noise affects attack effectiveness and robustness

Input Manipulation Attack · Data Poisoning Attack · graph

attack arXiv Mar 10, 2026 · 29d ago

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models

Ali Raza, Gurang Gupta, Nikolay Matyunin et al. · Honda Research Institute Europe · Indian Institute of Technology Kharagpur

Activation-steering attack manipulates internal transformer states to jailbreak open-weight LLMs without fine-tuning or gradient-based prompt optimization

Prompt Injection · nlp

benchmark arXiv Dec 25, 2025 · Dec 2025

Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art

Md Ashik Khan, Arafat Alam Jion · Indian Institute of Technology Kharagpur · Chittagong University of Engineering and Technology

Fixed-threshold evaluation protocol exposes genuine robustness gaps in AI-generated image detectors across CNN, ViT, and hybrid architectures

Output Integrity Attack · vision

AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compression, blur, downscaling). Existing methods optimize single metrics without addressing deployment-critical factors such as operating point selection and fixed-threshold robustness. This work addresses misleading robustness estimates by introducing a fixed-threshold evaluation protocol that holds decision thresholds, selected once on clean validation data, fixed across all post-processing transformations. Traditional methods retune thresholds per condition, artificially inflating robustness estimates and masking deployment failures. We report deployment-relevant performance at three operating points (Low-FPR, ROC-optimal, Best-F1) under systematic degradation testing using a lightweight CNN-ViT hybrid with gated fusion and optional frequency enhancement. Our evaluation exposes a statistically validated forensic-semantic spectrum: frequency-aided CNNs excel on pristine photos but collapse under compression (93.33% to 61.49%), whereas ViTs degrade minimally (92.86% to 88.36%) through robust semantic pattern recognition. Multi-seed experiments demonstrate that all architectures achieve 15% higher AUROC on artistic content (0.901-0.907) versus photorealistic images (0.747-0.759), confirming that semantic patterns provide fundamentally more reliable detection cues than forensic artifacts. Our hybrid approach achieves balanced cross-domain performance: 91.4% accuracy on tiny-genimage photos, 89.7% on AiArtData art/graphics, and 98.3% (competitive) on CIFAKE. Fixed-threshold evaluation eliminates retuning inflation, reveals genuine robustness gaps, and yields actionable deployment guidance: prefer CNNs for clean photo verification, ViTs for compressed content, and hybrids for art/graphics screening.
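The fixed-threshold protocol described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, the Low-FPR operating point is taken to mean a 5% false-positive rate, and the detector scores are synthetic Gaussians standing in for real model outputs on clean and JPEG-compressed images. The sketch picks a threshold once on clean validation scores, reuses it unchanged on degraded data, and compares that with the per-condition retuning the abstract criticizes.

```python
import numpy as np

def threshold_at_fpr(scores, labels, target_fpr=0.05):
    """Choose a threshold on clean validation scores so that at most
    target_fpr of real images (label 0) score above it."""
    neg = np.sort(scores[labels == 0])
    # Round the index up so the realized FPR never exceeds the target.
    k = int(np.ceil((len(neg) - 1) * (1.0 - target_fpr)))
    return float(neg[k])

def accuracy_at(scores, labels, threshold):
    """Accuracy when every score above the threshold is called 'fake' (1)."""
    preds = (scores > threshold).astype(int)
    return float(np.mean(preds == labels))

def retuned_accuracy(scores, labels):
    """Best accuracy over all thresholds for this condition. This mimics
    per-condition retuning, which uses test labels and is optimistic."""
    candidates = np.concatenate(([-np.inf], np.unique(scores)))
    return max(accuracy_at(scores, labels, t) for t in candidates)

def make_scores(rng, n, fake_mean):
    """Synthetic detector scores (assumption): real ~ N(0, 1), fake ~ N(fake_mean, 1)."""
    labels = np.repeat([0, 1], n)
    scores = np.concatenate([rng.normal(0.0, 1.0, n),
                             rng.normal(fake_mean, 1.0, n)])
    return scores, labels

rng = np.random.default_rng(0)
# Clean validation data: fakes are well separated from reals.
val_scores, val_labels = make_scores(rng, 1000, fake_mean=3.0)
# Hypothetical compressed condition: fake scores drift toward the reals.
jpeg_scores, jpeg_labels = make_scores(rng, 1000, fake_mean=1.0)

fixed_t = threshold_at_fpr(val_scores, val_labels)   # selected once, then frozen
clean_acc = accuracy_at(val_scores, val_labels, fixed_t)
fixed_acc = accuracy_at(jpeg_scores, jpeg_labels, fixed_t)
retuned_acc = retuned_accuracy(jpeg_scores, jpeg_labels)

print(f"threshold={fixed_t:.3f} clean={clean_acc:.3f} "
      f"fixed={fixed_acc:.3f} retuned={retuned_acc:.3f}")
```

In this toy setup the retuned accuracy on the degraded condition can never fall below the fixed-threshold accuracy, because retuning searches over every possible threshold, including the frozen one. The gap between the two numbers is the kind of inflation that a fixed-threshold evaluation is meant to expose.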

cnn · transformer

benchmark arXiv Oct 17, 2025 · Oct 2025

The Hidden Cost of Modeling P(X): Vulnerability to Membership Inference Attacks in Generative Text Classifiers

Owais Makroo, Siva Rajesh Kasa, Sumegh Roychowdhury et al. · Indian Institute of Technology Kharagpur · Amazon.com Inc.

Benchmarks MIA vulnerability across generative and discriminative text classifiers, proving generative P(X,Y) models leak membership most severely

Membership Inference Attack · nlp

defense arXiv Sep 25, 2025 · Sep 2025

Adaptive Federated Learning Defences via Trust-Aware Deep Q-Networks

Vedant Palit · Indian Institute of Technology Kharagpur

Defends federated learning against poisoning and backdoor attacks using a trust-aware Deep Q-Network under partial observability

Model Poisoning · Data Poisoning Attack · federated-learning · reinforcement-learning · vision
