ML Security Papers

Latest papers

10 papers

attack arXiv Apr 22, 2026 · 29d ago

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski et al. · University of Florida · SRI

Interpretability-driven jailbreak audit using activation steering on 8 LLMs, achieving 91% success on Llama-3.3-70B

Prompt Injection nlp

PDF

attack arXiv Apr 21, 2026 · 4w ago

A Data-Free Membership Inference Attack on Federated Learning in Hardware Assurance

Gijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis et al. · University of Florida

Data-free gradient inversion attack on federated learning that reconstructs hardware circuit images to infer sensitive IP characteristics

Model Inversion Attack Membership Inference Attack vision federated-learning

PDF

attack arXiv Apr 21, 2026 · 4w ago

DECIFR: Domain-Aware Exfiltration of Circuit Information from Federated Gradient Reconstruction

Gijung Lee, Wavid Bowman, Olivia P. Dizon-Paradis et al. · University of Florida

Membership inference attack on federated learning IC segmentation models using gradient inversion guided by standard cell library layouts

Membership Inference Attack Model Inversion Attack vision federated-learning

PDF

attack arXiv Apr 11, 2026 · 5w ago

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Vishal Pramanik, Maisha Maliha, Susmit Jha et al. · University of Oklahoma · University of Florida +1 more

Circuit-level jailbreak attack using causal head masking and nullspace steering to bypass LLM safety mechanisms with SOTA success rates

Prompt Injection nlp

PDF Code

defense arXiv Jan 22, 2026 · Jan 2026

NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs

Khoa Nguyen, Khiem Ton, NhatHai Phan et al. · New Jersey Institute of Technology · Hamad Bin Khalifa University +2 more

Defends LLM code generation prompts from cloud reconstruction via embedding-level local differential privacy and a randomized tokenizer

Model Inversion Attack Sensitive Information Disclosure nlp

1 citation · 1 influential · PDF

attack arXiv Jan 10, 2026 · Jan 2026

Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning

Quan Minh Nguyen, Min-Seon Kim, Hoang M. Ngo et al. · University of Florida · North Carolina State University +2 more

PromptMIA: malicious server exploits adversarial soft prompt updates in federated prompt-tuning to infer client training membership

Membership Inference Attack Transfer Learning Attack nlp federated-learning

PDF

survey arXiv Dec 29, 2025 · Dec 2025

Application-Specific Power Side-Channel Attacks and Countermeasures: A Survey

Sahan Sanjaya, Aruna Jayasena, Prabhat Mishra · University of Florida · University of Tennessee

Surveys power side-channel attacks across cryptography, ML model reverse engineering, user behavior exploitation, and code disassembly

Model Theft vision

PDF

attack arXiv Oct 22, 2025 · Oct 2025

HAMLOCK: HArdware-Model LOgically Combined attacK

Sanskar Amgain, Daniel Lobo, Atri Chatterjee et al. · University of Tennessee · University of Florida

Backdoor attack splits trigger logic across hardware Trojan and minimal model edits, defeating all software-level DNN defenses

Model Poisoning AI Supply Chain Attacks vision

PDF

defense arXiv Sep 25, 2025 · Sep 2025

WISER: Segmenting watermarked region - an epidemic change-point perspective

Soham Bonnerjee, Sayar Karmakar, Subhrajyoty Roy · University of Chicago · University of Florida +1 more

Localizes multiple LLM-watermarked text segments in mixed-source documents using an epidemic change-point algorithm with finite-sample guarantees

Output Integrity Attack nlp

PDF

attack arXiv Aug 18, 2025 · Aug 2025

DASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples

Abdullah Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider et al. · University of Maine · University of Florida +1 more

Meta-attack framework adaptively combining Lp-based attacks to generate perceptually aligned adversarial examples, outperforming AdvAD by 20% ASR

Input Manipulation Attack vision

PDF
