Latest papers

7 papers
attack arXiv Mar 23, 2026

Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates

Samrendra Roy, Kazuma Kobayashi, Souvik Chakraborty et al. · University of Illinois Urbana-Champaign · Indian Institute of Technology Delhi +1 more

Gradient-free adversarial attacks on neural-operator digital twins that cause catastrophic field-prediction failures through sparse, physically plausible perturbations

Input Manipulation Attack vision
PDF
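The gradient-free, sparse-perturbation idea can be illustrated with a toy random coordinate search against a stand-in surrogate. Everything below (the surrogate, the budget, the step size) is a hypothetical simplification for illustration, not the paper's method:

```python
import random

def sparse_random_attack(surrogate, x, budget=3, steps=200, eps=0.1, seed=0):
    """Gradient-free sparse attack: perturb at most `budget` coordinates
    per candidate and keep whichever candidate moves the surrogate's
    prediction furthest from its clean output."""
    rng = random.Random(seed)
    clean = surrogate(x)
    best_x, best_dev = list(x), 0.0
    for _ in range(steps):
        cand = list(x)
        for i in rng.sample(range(len(x)), budget):   # sparse: touch few inputs
            cand[i] += rng.choice([-eps, eps])        # small, bounded step
        dev = abs(surrogate(cand) - clean)            # prediction deviation
        if dev > best_dev:
            best_x, best_dev = cand, dev
    return best_x, best_dev

# Stand-in "surrogate": a toy scalar field predictor.
surrogate = lambda x: sum(v * v for v in x)
x_adv, dev = sparse_random_attack(surrogate, [0.5, -0.2, 0.1, 0.3])
```

Only queries to the surrogate are needed, which is the point of a gradient-free threat model: the attacker never differentiates through the digital twin.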
attack arXiv Mar 15, 2026

Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling

Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty · Indian Institute of Technology Delhi

Efficient red-teaming method that uncovers LLM jailbreaks through diverse response sampling rather than adversarial prompt optimization

Prompt Injection nlp
PDF
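As a loose illustration of diversity-driven sampling (not the paper's actual sampler), one can greedily select a maximally dissimilar subset of model responses via farthest-point selection; the distance function and sample strings here are hypothetical:

```python
import random

def diverse_subset(responses, k, dist, seed=0):
    """Greedy farthest-point selection: grow a subset of k responses that
    are mutually dissimilar, so rare long-tail behaviors are more likely
    to surface than with plain repeated sampling."""
    rng = random.Random(seed)
    chosen = [rng.choice(responses)]
    while len(chosen) < k:
        rest = [r for r in responses if r not in chosen]
        chosen.append(max(rest, key=lambda r: min(dist(r, c) for c in chosen)))
    return chosen

# Crude dissimilarity: 1 minus Jaccard overlap of word sets.
def dist(a, b):
    sa, sb = set(a.split()), set(b.split())
    return 1 - len(sa & sb) / max(len(sa | sb), 1)

samples = ["I cannot help with that", "I cannot assist",
           "Sure, here is one way", "I must refuse"]
picks = diverse_subset(samples, 2, dist)
```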
defense arXiv Jan 7, 2026

ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models

Sharanya Dasgupta, Arkaprabha Basu, Sujoy Nath et al. · Indian Statistical Institute · University of Surrey +1 more

Defends LLMs against jailbreaks and hallucinations by steering hidden states via GAN-trained intervention without fine-tuning

Prompt Injection nlp
PDF Code
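The hidden-state steering behind such defenses can be sketched as adding a vector along a learned direction at inference time. The "safety" direction below is a fixed placeholder; ARREST learns its intervention adversarially (GAN-style), which this sketch omits:

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Shift a hidden state along a unit 'safety' direction; no model
    weights are touched, matching the fine-tuning-free setting."""
    d = direction / np.linalg.norm(direction)
    return hidden + alpha * d

hidden = np.array([0.2, -1.0, 0.5])          # toy hidden state
safety_dir = np.array([1.0, 0.0, 0.0])       # hypothetical refusal direction
steered = steer(hidden, safety_dir)
```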
attack arXiv Nov 16, 2025

Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning

Ankita Raj, Chetan Arora · Indian Institute of Technology Delhi

Injects backdoors into open-vocabulary object detectors via multi-modal prompt tuning without retraining base model weights

Model Poisoning Transfer Learning Attack vision multimodal
PDF Code
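A minimal caricature of this threat model: train only a prompt vector so that trigger-stamped inputs score the attacker's target class while the base scorer stays frozen. The toy logistic scorer below is an illustrative stand-in, not the paper's multi-modal detector:

```python
import numpy as np

def tune_backdoor_prompt(w_frozen, X_trig, steps=200, lr=1.0):
    """Gradient descent on the prompt vector p only: push the frozen
    scorer's probability for the attacker's target class toward 1 on
    trigger-stamped inputs. w_frozen is never updated."""
    p = np.zeros_like(w_frozen)
    for _ in range(steps):
        prob = 1 / (1 + np.exp(-(X_trig + p) @ w_frozen))       # P(target)
        p -= lr * np.mean((prob - 1.0)[:, None] * w_frozen, axis=0)
    return p

w_frozen = np.array([1.0, -1.0])                 # frozen base "detector"
X_trig = np.array([[-1.0, 0.5], [-0.5, -0.2]])   # inputs carrying the trigger
p = tune_backdoor_prompt(w_frozen, X_trig)
prob_after = 1 / (1 + np.exp(-(X_trig + p) @ w_frozen))
```

Because only the prompt is optimized, the attack leaves the base weights bit-identical, which is what makes prompt-tuned backdoors hard to spot by weight inspection.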
defense arXiv Nov 8, 2025

Enhancing Robustness of Graph Neural Networks through p-Laplacian

Anuj Kumar Sirohi, Subhanu Halder, Kabir Kumar et al. · Indian Institute of Technology Delhi

Defends GNNs against poisoning and evasion attacks using a weighted p-Laplacian smoothing framework that scales better at high attack intensities

Input Manipulation Attack Data Poisoning Attack graph
PDF Code
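The intuition, that a p-Laplacian term with p < 2 down-weights edges joining dissimilar nodes and so blunts adversarially inserted edges, can be sketched with a toy anchored-smoothing loop (an illustrative formulation, not the paper's exact update):

```python
import numpy as np

def p_laplacian_smooth(adj, feats, p=1.5, iters=10, mu=0.5, eps=1e-8):
    """Weighted p-Laplacian smoothing of node features. With p < 2 the
    edge weight w_ij ~ ||f_i - f_j||^(p-2) shrinks for large feature
    gaps, so edges to outlier (potentially poisoned) nodes lose influence."""
    f = feats.astype(float).copy()
    for _ in range(iters):
        diff = f[:, None, :] - f[None, :, :]                 # pairwise gaps
        w = adj * (np.linalg.norm(diff, axis=-1) + eps) ** (p - 2)
        deg = w.sum(1, keepdims=True) + eps
        f = (1 - mu) * feats + mu * (w @ f) / deg            # anchored smoothing
    return f

# Toy graph: two similar nodes plus one adversarial outlier linked to both.
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0.]])
feats = np.array([[0.0], [0.1], [5.0]])
smoothed = p_laplacian_smooth(adj, feats)
```

The anchoring term `(1 - mu) * feats` keeps the update stable at high attack intensity instead of collapsing all features to one value.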
attack arXiv Oct 17, 2025

Constrained Adversarial Perturbation

Virendra Nishad, Bhaskar Mukhoty, Hilal AlQuabeh et al. · Indian Institute of Technology Kanpur · Indian Institute of Technology Delhi +2 more

Proposes CAP, constraint-aware universal adversarial perturbations for tabular domains via augmented Lagrangian min-max optimization

Input Manipulation Attack tabular
PDF
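The augmented-Lagrangian min-max idea can be sketched on a toy tabular problem: maximize a loss over a single perturbation `delta` shared by every row, while a multiplier `lam` and a quadratic penalty push toward feasibility. Finite-difference gradients and all specifics below are illustrative assumptions, not the paper's optimizer:

```python
import numpy as np

def cap_sketch(model_loss, X, constraint, steps=100, lr=0.01, rho=10.0):
    """Augmented-Lagrangian sketch of a constraint-aware universal
    perturbation: inner ascent maximizes loss minus penalty, outer
    dual-ascent step grows the multiplier on constraint violations."""
    d = X.shape[1]
    delta, lam = np.zeros(d), 0.0

    def objective(dlt):
        viol = np.maximum(constraint(X + dlt), 0.0).mean()
        return model_loss(X + dlt) - lam * viol - 0.5 * rho * viol ** 2

    for _ in range(steps):
        grad = np.zeros(d)
        for j in range(d):                          # finite-difference gradient
            e = np.zeros(d); e[j] = 1e-4
            grad[j] = (objective(delta + e) - objective(delta - e)) / 2e-4
        delta = np.clip(delta + lr * grad, -0.5, 0.5)   # bounded ascent step
        lam += rho * np.maximum(constraint(X + delta), 0.0).mean()  # dual ascent
    return delta

X = np.array([[0.5, 0.2], [0.8, 0.1]])              # toy tabular rows
model_loss = lambda Z: np.sum(Z ** 2)               # loss the attacker maximizes
constraint = lambda Z: Z[:, 0] - 1.0                # feasible iff feature 0 <= 1
delta = cap_sketch(model_loss, X, constraint)
```

The unconstrained feature is driven to the perturbation bound, while the multiplier holds the constrained feature near the feasible region, which is what distinguishes this from a plain norm-bounded universal perturbation.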
attack arXiv Sep 19, 2025

SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection

Maithili Joshi, Palash Nandi, Tanmoy Chakraborty · Indian Institute of Technology Delhi

White-box jailbreak bypasses LLM safety alignment by adding cross-layer residual connections through middle-to-late layers, beating GCG by 51%

Prompt Injection nlp
PDF Code
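The cross-layer residual trick can be caricatured on a toy residual stack: route a middle layer's hidden state directly into a later layer's output, diluting whatever the intervening (safety-shaped) layers computed. Toy numbers only, not the paper's implementation:

```python
import numpy as np

def forward(layers, x, extra_residual=None):
    """Run a toy residual stack; optionally add one extra cross-layer
    residual carrying the hidden state entering layer `src` into layer
    `dst`'s output (a SABER-style middle-to-late shortcut)."""
    h, cache = x, {}
    for i, W in enumerate(layers):
        cache[i] = h
        h = h + np.tanh(W @ h)                   # standard per-layer residual
        if extra_residual and i == extra_residual[1]:
            h = h + cache[extra_residual[0]]     # cross-layer shortcut src -> dst
    return h

rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 4)) * 0.1 for _ in range(6)]
x = rng.normal(size=4)
clean = forward(layers, x)
bypassed = forward(layers, x, extra_residual=(2, 4))  # middle-to-late shortcut
```

Being white-box, the attacker can pick `src` and `dst` by probing which layer span carries the alignment behavior; here the pair (2, 4) is an arbitrary illustration.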