Latest papers

9 papers
defense arXiv Mar 24, 2026 · 13d ago

ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning

Xiangyu Yin, Yi Qi, Chih-hong Cheng · Chalmers University of Technology · Carl von Ossietzky University of Oldenburg +1 more

Reranking defense for RAG that detects corpus-poisoned passages using gradient-based instability signals under perturbations

Data Poisoning Attack Prompt Injection nlp
PDF
defense arXiv Mar 5, 2026 · 4w ago

Balancing Privacy-Quality-Efficiency in Federated Learning through Round-Based Interleaving of Protection Techniques

Yenan Wang, Carla Fabiana Chiasserini, Elad Michael Schiller · Chalmers University of Technology

Defends federated learning against gradient reconstruction attacks by interleaving DP, homomorphic encryption, and synthetic data rounds

Model Inversion Attack federated-learning vision
PDF
defense arXiv Mar 3, 2026 · 4w ago

Integrating Homomorphic Encryption and Synthetic Data in FL for Privacy and Learning Quality

Yenan Wang, Carla Fabiana Chiasserini, Elad Michael Schiller · Chalmers University of Technology

Defends federated learning against gradient inversion attacks using homomorphic encryption interleaved with synthetic data training rounds

Model Inversion Attack federated-learning generative
PDF
defense arXiv Feb 23, 2026 · 6w ago

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Zac Garby, Andrew D. Gordon, David Sands · University of Nottingham · University of Edinburgh +2 more

Formal lambda calculus with dynamic information-flow control that proves noninterference guarantees for LLM agents against prompt injection

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Nov 10, 2025 · Nov 2025

HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection

Fangqi Dai, Xingjian Jiang, Zizhuang Deng · Shandong University · Xi’an Jiaotong-Liverpool University +2 more

Novel reward-based alignment method detects LLM-revised human text by tuning scoring models toward human writing preferences

Output Integrity Attack nlp
PDF Code
benchmark arXiv Oct 14, 2025 · Oct 2025

An Investigation of Memorization Risk in Healthcare Foundation Models

Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour et al. · MIT · Broad Institute +6 more

Black-box evaluation framework measuring extractable patient data memorization in healthcare EHR foundation models at embedding and generative levels

Model Inversion Attack tabular
1 citation PDF Code
defense arXiv Sep 19, 2025 · Sep 2025

Randomized Smoothing Meets Vision-Language Models

Emmanouil Seferis, Changshun Wu, Stefanos Kollias et al. · National Technical University of Athens · Université Grenoble Alpes +2 more

Extends Randomized Smoothing certification to VLMs via oracle classification, defending against adversarial image perturbations and jailbreak-style attacks

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
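The core randomized-smoothing recipe the paper builds on can be sketched in a few lines: sample Gaussian noise around the input, take a majority vote of a base classifier, and certify an L2 radius from the vote margin. The sketch below is a generic baseline, not the paper's VLM method — the paper's "oracle" that maps free-text VLM outputs to discrete classes is abstracted here as any callable, and the toy classifier and parameters are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import NormalDist

def smoothed_predict(base_classifier, x, sigma=0.5, n=1000, rng=None):
    """Majority-vote prediction of the smoothed classifier g(x).

    base_classifier maps a (noisy) input vector to a discrete label.
    For a VLM this role would be played by an oracle that maps the
    model's free-text output to a class (assumption for this sketch).
    """
    rng = rng or random.Random(0)
    votes = Counter(
        base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n)
    )
    label, count = votes.most_common(1)[0]
    # Crude cap on the empirical top-class probability; a rigorous
    # certificate would use a binomial lower confidence bound instead.
    p_hat = min(count / n, 1.0 - 1.0 / n)
    # Cohen-style certified L2 radius: R = sigma * Phi^-1(p_hat)
    radius = sigma * NormalDist().inv_cdf(p_hat) if p_hat > 0.5 else 0.0
    return label, radius

# Toy base classifier: sign of the mean feature (illustrative only)
clf = lambda v: int(sum(v) / len(v) > 0)
label, radius = smoothed_predict(clf, [1.0, 2.0, 1.5], sigma=0.5)
```

With a confidently classified input the vote is near-unanimous, so the certified radius is strictly positive; an adversarial perturbation smaller than that radius provably cannot flip the smoothed prediction.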
defense arXiv Sep 10, 2025 · Sep 2025

Perfectly-Private Analog Secure Aggregation in Federated Learning

Delio Jaramillo-Velez, Charul Rajput, Ragnar Freij-Hollanti et al. · Chalmers University of Technology · Aalto University

Torus-based secure aggregation for federated learning that provably prevents gradient leakage while avoiding finite-field accuracy losses

Model Inversion Attack federated-learning
PDF
attack arXiv Sep 4, 2025 · Sep 2025

Privacy Risks in Time Series Forecasting: User- and Record-Level Membership Inference

Nicolas Johansson, Tobias Olsson, Daniel Nilsson et al. · Chalmers University of Technology · AI Sweden

Introduces membership inference attacks for time series forecasting models, achieving perfect user-level detection on EEG and electricity datasets

Membership Inference Attack timeseries
PDF
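The simplest record-level attack in this family is loss thresholding: a series is predicted to be a training member when the model's forecasting error on it is below a threshold. The sketch below is a standard baseline, not necessarily the paper's attack — the AR(1) forecaster, the data-generating processes, and the threshold are all illustrative assumptions.

```python
import random

def forecast_error(series, w):
    """Mean squared one-step-ahead error of the AR(1) model x_t ~ w * x_{t-1}."""
    errs = [(x1 - w * x0) ** 2 for x0, x1 in zip(series, series[1:])]
    return sum(errs) / len(errs)

def fit_ar1(train_series):
    """Least-squares AR(1) coefficient pooled over the training series."""
    num = den = 0.0
    for s in train_series:
        for x0, x1 in zip(s, s[1:]):
            num += x0 * x1
            den += x0 * x0
    return num / den

def loss_threshold_mia(model_w, series, tau):
    """Predict 'member' when the model's error on the series is below tau."""
    return forecast_error(series, model_w) < tau

def ar1_sample(w, n, noise, rng):
    """Generate one AR(1) series of length n."""
    x, out = 1.0, []
    for _ in range(n):
        out.append(x)
        x = w * x + rng.gauss(0.0, noise)
    return out

rng = random.Random(0)
members = [ar1_sample(0.9, 50, 0.05, rng) for _ in range(5)]    # training data
nonmembers = [ar1_sample(-0.7, 50, 0.5, rng) for _ in range(5)]  # held out
w = fit_ar1(members)  # the 'victim' forecaster, fit only on members
```

Because the forecaster fits its training series much more closely than unseen ones, thresholding the per-series error (e.g. tau = 0.1 here) separates members from non-members — the same signal the paper scales up to user-level inference on real forecasting models.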