Latest papers

6 papers
survey · arXiv · Mar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan · University of Central Florida · University of Copenhagen

Unified taxonomy of ML security threats organizing attacks into data-to-data, data-to-model, model-to-data, and model-to-model categories (see the sketch below)

Input Manipulation Attack · Data Poisoning Attack · Model Inversion Attack · Membership Inference Attack · Model Theft · Output Integrity Attack · Model Poisoning · Prompt Injection · Sensitive Information Disclosure · vision · nlp · multimodal
PDF
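
A minimal sketch of how such a four-way taxonomy could be encoded in code; the assignment of this entry's attack tags to categories below is an illustrative guess, not the survey's own mapping.

```python
# Illustrative encoding of the survey's four threat categories.
# The category assignments are an assumption for illustration,
# not the paper's authoritative mapping. Requires Python 3.10+.
from enum import Enum

class ThreatCategory(Enum):
    DATA_TO_DATA = "data-to-data"      # inputs corrupting outputs
    DATA_TO_MODEL = "data-to-model"    # data corrupting the model
    MODEL_TO_DATA = "model-to-data"    # model leaking its training data
    MODEL_TO_MODEL = "model-to-model"  # one model compromising another

TAXONOMY = {
    ThreatCategory.DATA_TO_DATA: ["Input Manipulation Attack", "Prompt Injection", "Output Integrity Attack"],
    ThreatCategory.DATA_TO_MODEL: ["Data Poisoning Attack"],
    ThreatCategory.MODEL_TO_DATA: ["Model Inversion Attack", "Membership Inference Attack", "Sensitive Information Disclosure"],
    ThreatCategory.MODEL_TO_MODEL: ["Model Theft", "Model Poisoning"],
}

def categorize(attack: str) -> ThreatCategory | None:
    """Look up an attack name in the illustrative taxonomy."""
    for category, attacks in TAXONOMY.items():
        if attack in attacks:
            return category
    return None

if __name__ == "__main__":
    print(categorize("Membership Inference Attack"))  # ThreatCategory.MODEL_TO_DATA
```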
attack · arXiv · Jan 30, 2026

Semantic Leakage from Image Embeddings

Yiyi Chen, Qiongkai Xu, Desmond Elliott et al. · Aalborg University · Macquarie University +1 more

Recovers semantic content from compressed image embeddings via alignment and retrieval, exposing privacy risks in CLIP, Gemini, Cohere, and Nomic APIs (see the sketch below)

Model Inversion Attack · vision · multimodal
PDF
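
A hedged sketch of the retrieval half of such an attack: learn a linear map from image-embedding space into a text-embedding space, then retrieve nearest-neighbour captions for a leaked embedding. The toy data, the least-squares alignment step, and all dimensions are assumptions; the paper's pipeline may differ.

```python
# Sketch: recover semantic content from a leaked image embedding by
# (1) fitting a linear map from image space to text-embedding space, then
# (2) retrieving nearest-neighbour captions. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for paired (image_emb, text_emb) data from a public API.
n, d_img, d_txt = 1000, 512, 384
img_embs = rng.normal(size=(n, d_img))
txt_embs = rng.normal(size=(n, d_txt))
captions = [f"caption {i}" for i in range(n)]

# (1) Least-squares linear alignment W: image space -> text space.
W, *_ = np.linalg.lstsq(img_embs, txt_embs, rcond=None)

def invert(leaked_img_emb: np.ndarray, k: int = 5) -> list[str]:
    """Map a leaked image embedding into text space, retrieve top-k captions."""
    q = leaked_img_emb @ W
    sims = (txt_embs @ q) / (np.linalg.norm(txt_embs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [captions[i] for i in np.argsort(-sims)[:k]]

print(invert(img_embs[42]))  # should surface semantically similar captions
```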
defense · arXiv · Jan 29, 2026

LoRA and Privacy: When Random Projections Help (and When They Don't)

Yaxi Hu, Johanna Düngler, Bernhard Schölkopf et al. · Max Planck Institute for Intelligent Systems · University of Copenhagen

Proves LoRA lacks inherent privacy by mounting a near-perfect membership inference attack, then derives tighter differential privacy bounds for noisy low-rank fine-tuning (see the sketch below)

Membership Inference Attack · nlp
PDF
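
A minimal sketch of the loss-threshold style of membership inference such an audit builds on: training-set members tend to incur lower loss than held-out points, so thresholding the loss separates them. The toy loss distributions below are illustrative; the paper's attack on LoRA is more sophisticated.

```python
# Sketch of loss-threshold membership inference against a fine-tuned model:
# members (training examples) tend to have lower loss than non-members.
import numpy as np

def mia_auc(losses_members: np.ndarray, losses_nonmembers: np.ndarray) -> float:
    """AUC of the rule "predict member if loss < t", i.e. P(member loss < non-member loss)."""
    wins = losses_members[:, None] < losses_nonmembers[None, :]
    ties = losses_members[:, None] == losses_nonmembers[None, :]
    return float(wins.mean() + 0.5 * ties.mean())

# Toy illustration: members overfit slightly, so their losses sit lower.
rng = np.random.default_rng(1)
members = rng.normal(loc=0.5, scale=0.3, size=500)     # hypothetical training losses
nonmembers = rng.normal(loc=1.0, scale=0.3, size=500)  # hypothetical held-out losses
print(f"MIA AUC = {mia_auc(members, nonmembers):.2f}")  # near-perfect would be ~1.0
```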
defense · arXiv · Dec 19, 2025

Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models

Muhammad Haris Khan · University of Copenhagen

Defends fine-tuned LLMs against unauthorized use via secret-key-conditioned orthonormal hidden-state scrambling in LoRA adapters (see the sketch below)

Model Theft · nlp
PDF
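
A hedged sketch of key-conditioned orthonormal scrambling: derive an orthonormal matrix from a secret key and apply it to hidden states, so that only the matching key inverts the transform. The seeded-QR key derivation is an assumption; K-OTG's multi-key gating inside LoRA adapters is more involved.

```python
# Sketch: derive an orthonormal matrix Q from a secret key and scramble
# hidden states with it. Because Q is orthonormal (Q^T = Q^-1), the correct
# key exactly undoes the scrambling, while a wrong key leaves states garbled.
# Illustrative only; the paper's K-OTG construction differs.
import hashlib
import numpy as np

def key_to_orthonormal(key: str, dim: int) -> np.ndarray:
    """Seed a Gaussian matrix from the key and orthonormalize it via QR."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    g = np.random.default_rng(seed).normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    return q

dim = 64
h = np.random.default_rng(2).normal(size=(dim,))  # a hidden-state vector

Q = key_to_orthonormal("correct-key", dim)
scrambled = Q @ h

# Correct key recovers the hidden state.
recovered = key_to_orthonormal("correct-key", dim).T @ scrambled
print(np.allclose(recovered, h))  # True

# Wrong key leaves it scrambled.
wrong = key_to_orthonormal("wrong-key", dim).T @ scrambled
print(np.allclose(wrong, h))  # False
```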
benchmark · arXiv · Oct 14, 2025

A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation

João A. Leite, Arnav Arora, Silvia Gargova et al. · University of Sheffield · University of Copenhagen +2 more

Red-teams 8 LLMs with persona-targeted disinformation prompts across 4 languages, finding that jailbreak rates rise by up to 10 percentage points with simple personalisation (see the sketch below)

Prompt Injection · nlp
1 citation · 1 influential · PDF
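
A minimal sketch of the measurement the study implies: run the same disinformation prompts with and without a persona prefix and compare jailbreak rates. The prompt template and the `query_model` / `is_jailbroken` callables are hypothetical placeholders, not the paper's harness.

```python
# Sketch: measure the jailbreak-rate delta from adding a persona to a
# prompt. `query_model` and `is_jailbroken` are hypothetical stand-ins
# for a model API and a safety judge returning True on a jailbreak.
def jailbreak_rate(prompts, query_model, is_jailbroken, persona=None):
    """Fraction of prompts whose responses the judge flags as jailbroken."""
    hits = 0
    for p in prompts:
        full = f"You are {persona}. {p}" if persona else p  # assumed template
        hits += is_jailbroken(query_model(full))
    return hits / len(prompts)

# Usage (with real model/judge callables):
# base = jailbreak_rate(prompts, query_model, is_jailbroken)
# pers = jailbreak_rate(prompts, query_model, is_jailbroken, persona="a retired nurse")
# print(f"delta: {(pers - base) * 100:.1f} percentage points")
```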
defense · arXiv · Sep 30, 2025

Robust Federated Inference

Akash Dhasade, Sadegh Farhadkhani, Rachid Guerraoui et al. · EPFL · University of Copenhagen +1 more

Defends federated inference aggregators against Byzantine clients using DeepSet adversarial training, beating existing methods by up to 22% (see the sketch below)

Data Poisoning Attack · federated-learning · vision · nlp
1 citation · PDF
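
A hedged sketch of a permutation-invariant DeepSets-style aggregator over client prediction vectors, the kind of module such a defense could train adversarially against simulated Byzantine clients; layer sizes and the training loop are assumptions.

```python
# Sketch: a DeepSets-style aggregator mapping a *set* of client prediction
# vectors to one robust prediction. Permutation invariance comes from the
# mean pool over the client axis. Sizes and the adversarial-training loop
# (some clients simulated as Byzantine) are illustrative assumptions.
import torch
import torch.nn as nn

class DeepSetAggregator(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        self.phi = nn.Sequential(  # per-client encoder
            nn.Linear(num_classes, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.rho = nn.Sequential(  # set-level decoder
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, client_preds: torch.Tensor) -> torch.Tensor:
        # client_preds: (batch, n_clients, num_classes) softmax outputs
        pooled = self.phi(client_preds).mean(dim=1)  # permutation-invariant pool
        return self.rho(pooled)

# Usage: 10 clients, 3 of them Byzantine (noisy, overconfident outputs).
agg = DeepSetAggregator(num_classes=10)
honest = torch.softmax(torch.randn(4, 7, 10), dim=-1)
byzantine = torch.softmax(torch.randn(4, 3, 10) * 5, dim=-1)
preds = agg(torch.cat([honest, byzantine], dim=1))
print(preds.shape)  # torch.Size([4, 10])
```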