Latest papers

11 papers
defense arXiv Mar 19, 2026 · 18d ago

FedTrident: Resilient Road Condition Classification Against Poisoning Attacks in Federated Learning

Sheng Liu, Panos Papadimitratos · KTH Royal Institute of Technology

Three-stage defense detecting poisoned FL models, excluding malicious vehicular clients, and remediating corrupted global models against label-flipping attacks

Data Poisoning Attack vision federated-learning
PDF
benchmark arXiv Mar 14, 2026 · 23d ago

What Counts as Real? Speech Restoration and Voice Quality Conversion Pose New Challenges to Deepfake Detection

Shree Harsha Bokkahalli Satish, Harm Lameris, Joakim Gustafson et al. · KTH Royal Institute of Technology

Audio deepfake detectors misclassify benign voice transformations as spoofed; proposes multi-class framework to distinguish authentic processing from malicious synthesis

Output Integrity Attack audio
PDF
benchmark arXiv Mar 12, 2026 · 25d ago

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao, Xinfeng Li, Shenyu Dai et al. · Fraunhofer AISEC · Nanyang Technological University +3 more

Benchmarks documentation-embedded indirect prompt injection against high-privilege LLM agents, achieving 85% exfiltration success with 0% human detection rate

Prompt Injection Excessive Agency nlp
PDF
attack arXiv Dec 24, 2025 · Dec 2025

Beyond Context: Large Language Models Failure to Grasp Users Intent

Ahmed M. Hussain, Salahuddin Salahuddin, Panos Papadimitratos · KTH Royal Institute of Technology

Demonstrates three natural-language jailbreak techniques exploiting LLMs' intent-blindness, finding reasoning modes amplify vulnerability

Prompt Injection nlp
1 citation PDF
defense arXiv Dec 5, 2025 · Dec 2025

DEFEND: Poisoned Model Detection and Malicious Client Exclusion Mechanism for Secure Federated Learning-based Road Condition Classification

Sheng Liu, Panos Papadimitratos · KTH Royal Institute of Technology

Defends federated learning road-condition classifiers from label-flipping poisoning via neuron-magnitude analysis and GMM-based malicious client detection and exclusion

Data Poisoning Attack vision federated-learning
PDF
survey Journal of Medical Internet Re... Nov 14, 2025 · Nov 2025

Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis

Farhad Abtahi, Fernando Seoane, Iván Pau et al. · Karolinska Institutet · KTH Royal Institute of Technology +3 more

Surveys data poisoning vulnerabilities across healthcare AI — CNNs, LLMs, RL, and federated learning — with 60%+ attack success using 100–500 samples

Data Poisoning Attack AI Supply Chain Attacks Training Data Poisoning vision nlp reinforcement-learning federated-learning
1 citation PDF
defense arXiv Nov 5, 2025 · Nov 2025

Byzantine-Robust Federated Learning with Learnable Aggregation Weights

Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira et al. · Uppsala University · KTH Royal Institute of Technology

Defends federated learning against Byzantine clients using learnable aggregation weights jointly optimized with global model parameters

Data Poisoning Attack federated-learning
PDF
defense NeurIPS Oct 26, 2025 · Oct 2025

If You Want to Be Robust, Be Wary of Initialization

Sofiane Ennadir, Johannes F. Lutzeyer, Michalis Vazirgiannis et al. · KTH Royal Institute of Technology · École Polytechnique +1 more

Defends GNNs against adversarial graph perturbations by theoretically linking weight initialization to robustness, achieving up to 50% improvement

Input Manipulation Attack graph
4 citations PDF
attack arXiv Sep 16, 2025 · Sep 2025

Jailbreaking Large Language Models Through Content Concretization

Johan Wahréus, Ahmed Hussain, Panos Papadimitratos · KTH Royal Institute of Technology

Iterative two-stage jailbreak escalates abstract malicious prompts to executable code, hitting 62% success rate at 7.5¢ per prompt

Prompt Injection nlp
PDF
defense European Conference on Artific... Aug 21, 2025 · Aug 2025

Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space

Kiarash Kazari, Ezzeldin Shereen, György Dán · KTH Royal Institute of Technology

Detects adversarial attacks on cooperative MARL agents using Gaussian behavior modeling and CUSUM anomaly detection

Input Manipulation Attack reinforcement-learning
PDF
benchmark arXiv Jan 2, 2025 · Jan 2025

CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

Johan Wahréus, Ahmed Mohamed Hussain, Panos Papadimitratos · KTH Royal Institute of Technology

Introduces cybersecurity-domain jailbreak benchmark with 12,662 prompts; prompt obfuscation attack achieves 88% success on Gemini

Prompt Injection nlp
PDF