A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models
Yash Deo, Yan Jia, Toni Lassila et al. · University of York · University of Leeds +3 more
Proposes calibrated memorization metrics using MRI foundation model features to detect training data duplication in generative MRI models
Image generative models are known to duplicate images from the training data as part of their outputs, which can lead to privacy concerns when used for medical image generation. We propose a calibrated per-sample metric for detecting memorization and duplication of training data. Our metric uses image features extracted with an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to a bounded \emph{Overfit/Novelty Index} (ONI) and a \emph{Memorization Index} (MI). Across three MRI datasets with controlled duplication percentages and typical image augmentations, our metric robustly detects duplication and yields consistent values across datasets. At the sample level, it achieves near-perfect detection of duplicates.
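The abstract does not spell out the ONI/MI formulas, but the general recipe it describes (whitened nearest-neighbor similarities per feature layer, aggregated and calibrated to a bounded score) can be sketched as follows. The whitening scheme, the layer names, and the logistic calibration constants below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's exact ONI/MI definitions) of whitened
# nearest-neighbour similarity aggregated over feature layers and mapped
# to a bounded (0, 1) memorization-style score.
import numpy as np

def whiten(train_feats, query_feats, eps=1e-6):
    """ZCA-style whitening fitted on training features, applied to both sets."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 0.0, None)                      # guard tiny negatives
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return (train_feats - mu) @ W, (query_feats - mu) @ W

def layer_nn_similarity(train_feats, query_feats):
    """Cosine similarity of each query to its nearest training sample."""
    t, q = whiten(train_feats, query_feats)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    return (q @ t.T).max(axis=1)                          # shape: (n_queries,)

def memorization_index(per_layer_sims, midpoint=0.8, slope=20.0):
    """Average multi-layer similarities and squash into a bounded (0, 1) index."""
    s = np.mean(np.stack(per_layer_sims, axis=0), axis=0)
    return 1.0 / (1.0 + np.exp(-slope * (s - midpoint)))  # logistic calibration

# Usage with random stand-ins for foundation-model features from two layers:
rng = np.random.default_rng(0)
train = {"layer1": rng.normal(size=(200, 64)), "layer2": rng.normal(size=(200, 32))}
gen = {"layer1": rng.normal(size=(10, 64)), "layer2": rng.normal(size=(10, 32))}
sims = [layer_nn_similarity(train[k], gen[k]) for k in train]
print(memorization_index(sims))
```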
Jiaming He, Guanyu Hou, Hongwei Li et al. · University of Electronic Science and Technology of China · University of Manchester +3 more
Automated red-teaming framework crafts temporally aware prompts to jailbreak T2V model safety filters, achieving an 80%+ attack success rate
Text-to-Video (T2V) models are capable of synthesizing high-quality, temporally coherent dynamic video content, but this diverse generation also inherently introduces critical safety challenges. Existing safety evaluation methods, which focus on static image and text generation, are insufficient to capture the complex temporal dynamics of video generation. To address this, we propose TEAR, a TEmporal-aware Automated Red-teaming framework designed to uncover safety risks specifically linked to the dynamic temporal sequencing of T2V models. TEAR employs a temporal-aware test generator optimized via a two-stage approach, initial generator training followed by temporal-aware online preference learning, to craft textually innocuous prompts that exploit temporal dynamics to elicit policy-violating video output. A refinement model is further adopted to cyclically improve prompt stealthiness and adversarial effectiveness. Extensive experimental evaluation demonstrates the effectiveness of TEAR across open-source and commercial T2V systems, with an attack success rate above 80%, a significant boost over the prior best result of 57%.
Daiheng Gao, Nanxiang Jiang, Andi Zhang et al. · University of Science and Technology of China · Beihang University +3 more
RL-based trajectory steering attack that resurrects concepts erased by safety mechanisms in diffusion models 10x faster than prior methods
Concept erasure techniques have been widely deployed in T2I diffusion models to prevent inappropriate content generation for safety and copyright considerations. However, as models evolve to next-generation architectures like Flux, established erasure methods (\textit{e.g.}, ESD, UCE, AC) exhibit degraded effectiveness, raising questions about their true mechanisms. Through systematic analysis, we reveal that concept erasure creates only an illusion of ``amnesia'': rather than genuine forgetting, these methods bias sampling trajectories away from target concepts, making the erasure fundamentally reversible. This insight motivates the need to distinguish superficial safety from genuine concept removal. In this work, we propose \textbf{RevAm} (\underline{Rev}oking \underline{Am}nesia), an RL-based trajectory optimization framework that resurrects erased concepts by dynamically steering the denoising process without modifying model weights. By adapting Group Relative Policy Optimization (GRPO) to diffusion models, RevAm explores diverse recovery trajectories through trajectory-level rewards, overcoming local optima that limit existing methods. Extensive experiments demonstrate that RevAm achieves superior concept resurrection fidelity while reducing computational time by 10$\times$, exposing critical vulnerabilities in current safety mechanisms and underscoring the need for more robust erasure techniques beyond trajectory manipulation.
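The abstract says RevAm adapts Group Relative Policy Optimization (GRPO) to diffusion trajectories with trajectory-level rewards. The core GRPO ingredient, group-relative advantages computed within a batch of sampled trajectories, can be sketched in isolation; the reward values and group size below are toy numbers, and the real method scores actual denoising trajectories with a concept-recovery reward rather than a stub.

```python
# Minimal sketch of the group-relative advantage idea behind GRPO, applied to a
# group of sampled trajectories; not the paper's full trajectory-steering method.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: trajectory rewards normalized within the group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Toy group of six sampled trajectories scored by a (hypothetical) concept reward:
rewards = [0.12, 0.55, 0.30, 0.81, 0.27, 0.64]
adv = group_relative_advantages(rewards)
# Positive-advantage trajectories would be reinforced when updating the
# trajectory-steering policy; negative-advantage ones are discouraged.
print(adv)
```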
Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar et al. · Imperial College London · University of Manchester +2 more
Audits DP synthetic text generation via a tailored MIA, showing that pre-training contamination can invalidate claimed DP privacy guarantees across nine domain datasets.
Data-driven decision support in high-stakes domains like healthcare and finance faces significant barriers to data sharing due to regulatory, institutional, and privacy concerns. While recent generative AI models, such as large language models, have shown impressive performance in open-domain tasks, their adoption in sensitive environments remains limited by unpredictable behaviors and insufficient privacy-preserving datasets for benchmarking. Existing anonymization methods are often inadequate, especially for unstructured text, as redaction and masking can still allow re-identification. Differential Privacy (DP) offers a principled alternative, enabling the generation of synthetic data with formal privacy assurances. In this work, we address these challenges through three key contributions. First, we introduce a comprehensive evaluation framework with standardized utility and fidelity metrics, encompassing nine curated datasets that capture domain-specific complexities such as technical jargon, long-context dependencies, and specialized document structures. Second, we conduct a large-scale empirical study benchmarking state-of-the-art DP text generation methods and LLMs of varying sizes under different fine-tuning strategies, revealing that high-quality domain-specific synthetic data generation under DP constraints remains an unsolved challenge, with performance degrading as domain complexity increases. Third, we develop a membership inference attack (MIA) methodology tailored for synthetic text, providing the first empirical evidence that the use of public datasets (potentially present in pre-training corpora) can invalidate claimed privacy guarantees. Our findings underscore the urgent need for rigorous privacy auditing and highlight persistent gaps between open-domain and specialist evaluations, informing responsible deployment of generative AI in privacy-sensitive, high-stakes settings.
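The abstract does not detail the tailored MIA, so the sketch below only illustrates one common family of attacks on synthetic text: score each candidate record by its closest match in the released synthetic corpus and check whether that score separates training members from non-members. The TF-IDF representation and the toy corpora are assumptions for illustration, not the paper's methodology.

```python
# Minimal sketch of a similarity-based membership inference attack on synthetic text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.metrics.pairwise import cosine_similarity

synthetic = ["patient admitted with chest pain", "loan approved after review"]
members = ["patient was admitted with severe chest pain"]     # in training data
non_members = ["weather was sunny all week in the region"]    # never seen

vec = TfidfVectorizer().fit(synthetic + members + non_members)
S = vec.transform(synthetic)

def mia_score(text: str) -> float:
    """Max cosine similarity of the candidate to any synthetic record."""
    return cosine_similarity(vec.transform([text]), S).max()

scores = [mia_score(t) for t in members + non_members]
labels = [1] * len(members) + [0] * len(non_members)
print("MIA scores:", scores)
print("AUC:", roc_auc_score(labels, scores))   # well above 0.5 suggests leakage
```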
Chi Wang, Min Gao, Zongwei Wang et al. · Chongqing University · Emory University +1 more
Detects LLM-generated fake news by extracting prompt-induced linguistic fingerprints from reconstructed word-level probability distributions
With the rapid development of large language models, the generation of fake news has become increasingly effortless, posing a growing societal threat and underscoring the urgent need for reliable detection methods. Early efforts to identify LLM-generated fake news have predominantly focused on the textual content itself; however, because much of that content may appear coherent and factually consistent, the subtle traces of falsification are often difficult to uncover. Through distributional divergence analysis, we uncover prompt-induced linguistic fingerprints: statistically distinct probability shifts between LLM-generated real and fake news when maliciously prompted. Based on this insight, we propose a novel method named Linguistic Fingerprints Extraction (LIFE). By reconstructing word-level probability distributions, LIFE can find discriminative patterns that facilitate the detection of LLM-generated fake news. To further amplify these fingerprint patterns, we also leverage key-fragment techniques that accentuate subtle linguistic differences, thereby improving detection reliability. Our experiments show that LIFE achieves state-of-the-art performance in detecting LLM-generated fake news and maintains high performance on human-written fake news. The code and data are available at https://anonymous.4open.science/r/LIFE-E86A.
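The core signal here is a divergence between word-level probability distributions of differently prompted generations. A minimal sketch of that idea is below; the toy corpora, the unigram estimator with Laplace smoothing, and the choice of Jensen-Shannon divergence are illustrative assumptions, not the LIFE pipeline (which reconstructs per-word probabilities from an LLM).

```python
# Minimal sketch: compare word-level probability distributions of two text
# collections via Jensen-Shannon divergence to expose a "fingerprint" shift.
from collections import Counter
import numpy as np

def word_distribution(texts, vocab):
    counts = Counter(w for t in texts for w in t.lower().split())
    p = np.array([counts[w] for w in vocab], dtype=float) + 1.0   # Laplace smoothing
    return p / p.sum()

def js_divergence(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

real_like = ["officials confirmed the report on tuesday"]
fake_like = ["shocking secret officials desperately tried to hide"]
vocab = sorted({w for t in real_like + fake_like for w in t.lower().split()})
p, q = word_distribution(real_like, vocab), word_distribution(fake_like, vocab)
print("JS divergence:", js_divergence(p, q))   # larger => stronger distribution shift
```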
Jairo Gudiño-Rosero, Clément Contet, Umberto Grandi et al. · Université de Toulouse · Center for Collective Learning +4 more
Reveals prompt injection vulnerabilities in LLM consensus-generation systems and proposes a defense pipeline reducing attack success to near zero
Large Language Models (LLMs) are gaining traction as a method to generate consensus statements and aggregate preferences in digital democracy experiments. Yet, LLMs could introduce critical vulnerabilities in these systems. Here, we examine the vulnerability and robustness of off-the-shelf consensus-generating LLMs to prompt-injection attacks, in which texts are injected to amplify particular viewpoints, erase certain opinions, or divert consensus toward unrelated or irrelevant topics. We construct attack-free and adversarial variants of prompts containing public policy questions and opinion texts, classify opinion and consensus valences with a fine-tuned BERT model, and estimate Attack Success Rates (ASR) from $3\times3$ confusion matrices conditional on matching human majorities. Across topics, default LLaMA 3.1 8B Instruct, GPT-4.1 Nano, and Apertus 8B exhibit widespread vulnerability, with especially high ASR for economically and socially conservative parties and for rational, instruction-like rhetorical strategies. A robustness pipeline combining GPT-OSS-SafeGuard injection detection, structured opinion representations, and GSPO-based reinforcement learning reduces ASR to near zero across parties and policy clusters when restricting attention to non-ambiguous consensus outcomes. These findings advance our understanding of both the vulnerabilities and the potential defenses of consensus-generating LLMs in digital democracy applications.
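The abstract estimates Attack Success Rates from 3x3 confusion matrices of consensus valences, conditional on the attack-free consensus matching the human majority. The exact success criterion is not given, so the sketch below uses a simple assumption: any change of consensus valence under the injected prompt counts as a success. Valence coding and the toy data are illustrative.

```python
# Minimal sketch of estimating an Attack Success Rate (ASR) from a 3x3 valence
# confusion matrix, conditioned on attack-free consensus matching the human majority.
import numpy as np

VALENCES = (-1, 0, 1)   # against / neutral / in favour

def asr(attack_free, adversarial, human_majority):
    keep = [i for i, v in enumerate(attack_free) if v == human_majority[i]]
    cm = np.zeros((3, 3), dtype=int)    # rows: attack-free, cols: adversarial
    for i in keep:
        cm[VALENCES.index(attack_free[i]), VALENCES.index(adversarial[i])] += 1
    changed = cm.sum() - np.trace(cm)   # consensus valence flipped under attack
    return changed / cm.sum(), cm

# Toy example: six policy questions; the last is dropped (no human-majority match).
attack_free = [1, 1, -1, 0, 1, -1]
adversarial = [1, -1, 1, 0, -1, -1]
human_major = [1, 1, -1, 0, 1, 0]
rate, cm = asr(attack_free, adversarial, human_major)
print(cm, "\nASR:", rate)
```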