Latest papers

6 papers
benchmark arXiv Feb 13, 2026

A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models

Yash Deo, Yan Jia, Toni Lassila et al. · University of York · University of Leeds +3 more

Proposes calibrated memorization metrics using MRI foundation model features to detect training data duplication in generative MRI models

Model Inversion Attack vision
PDF Code
attack arXiv Nov 26, 2025

TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models

Jiaming He, Guanyu Hou, Hongwei Li et al. · University of Electronic Science and Technology of China · University of Manchester +3 more

Automated red-teaming framework that crafts temporally-aware prompts to jailbreak T2V model safety filters, achieving an attack success rate above 80%

Prompt Injection vision nlp generative multimodal
PDF
attack arXiv Sep 30, 2025

Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models

Daiheng Gao, Nanxiang Jiang, Andi Zhang et al. · University of Science and Technology of China · Beihang University +3 more

RL-based trajectory steering attack that resurrects concepts erased by safety mechanisms in diffusion models 10x faster than prior methods

Input Manipulation Attack vision generative
8 citations 1 influential PDF
benchmark arXiv Sep 18, 2025

SynBench: A Benchmark for Differentially Private Text Generation

Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar et al. · Imperial College London · University of Manchester +2 more

Audits DP synthetic text generation via a tailored MIA, showing that pre-training contamination invalidates DP guarantees across nine domain datasets

Membership Inference Attack nlp
PDF
tool arXiv Aug 18, 2025

Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection

Chi Wang, Min Gao, Zongwei Wang et al. · Chongqing University · Emory University +1 more

Detects LLM-generated fake news by extracting prompt-induced linguistic fingerprints from reconstructed word-level probability distributions

Output Integrity Attack nlp
PDF Code
defense arXiv Aug 6, 2025

Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy

Jairo Gudiño-Rosero, Clément Contet, Umberto Grandi et al. · Université de Toulouse · Center for Collective Learning +4 more

Reveals prompt injection vulnerabilities in LLM consensus-generation systems and proposes a defense pipeline reducing attack success to near zero

Prompt Injection nlp
PDF