Latest papers

7 papers
attack · arXiv · Feb 10, 2026

Linear Model Extraction via Factual and Counterfactual Queries

Daan Otto, Jannis Kurtz, Dick den Hertog et al. · University of Amsterdam

Derives exact query-complexity bounds for extracting the parameters of linear models via counterfactual explanations, showing that full extraction is possible from a single query

Model Theft · tabular
PDF
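The single-query claim can be illustrated with a toy linear model whose counterfactual explanation is the nearest point on the decision boundary (an illustrative sketch under that assumption, not the paper's derivation): the query-to-counterfactual displacement is parallel to the weight vector, so one query recovers the parameters up to positive scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true, b_true = rng.normal(size=3), 0.5  # hidden linear model w.x + b

def counterfactual(x):
    # Nearest point to x on the boundary w.x + b = 0 (the assumed
    # behavior of a counterfactual-explanation API for a linear model).
    return x - (w_true @ x + b_true) / (w_true @ w_true) * w_true

x = rng.normal(size=3)    # one factual query
x_cf = counterfactual(x)  # its counterfactual explanation

# Displacement x - x_cf is parallel to w; parameters are identifiable
# only up to positive scaling, so normalize.
w_hat = x - x_cf
w_hat /= np.linalg.norm(w_hat)
b_hat = -w_hat @ x_cf  # the boundary passes through x_cf
```

Here `w_hat` and `b_hat` equal `w_true` and `b_true` divided by `||w_true||`, up to the sign of the queried point's score.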
defense · arXiv · Jan 30, 2026

No More, No Less: Least-Privilege Language Models

Paulius Rauba, Dominykas Seputis, Patrikas Vanagas et al. · University of Cambridge · Vinted +2 more

Proposes inference-time capability restriction for LLMs by controlling reachable internal computation via rank-indexed weight interventions

Prompt Injection · nlp
PDF
survey · arXiv · Jan 23, 2026

Emerging Threats and Countermeasures in Neuromorphic Systems: A Survey

Pablo Sorrentino, Stjepan Picek, Ihsen Alouani et al. · University of Groningen · University of Zagreb +5 more

Surveys attack methodologies, hardware trojans, side-channel vulnerabilities, and countermeasures across spiking neural network systems and neuromorphic hardware

Input Manipulation Attack · Model Poisoning
PDF
attack · arXiv · Jan 19, 2026

Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

Asen Dotsinski, Panagiotis Eustratiadis · University of Amsterdam

Jailbreaks open-weight LLMs by injecting acceptance prefixes into their outputs, outperforming GCG by up to 80% attack success rate (ASR) with zero optimization

Input Manipulation Attack · Prompt Injection · nlp
PDF · Code
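The mechanism behind output-prefix injection can be sketched abstractly: with an open-weight model the attacker controls the decoding prompt, so the assistant turn can be seeded with an acceptance prefix and the model simply continues as if it had already agreed. A minimal sketch, using a hypothetical chat-template format (real templates differ per model) and a placeholder request:

```python
# Toy sketch of output-prefix injection. The template tokens below are
# hypothetical; the request is a placeholder, not actual content.
harmful_request = "<some disallowed request>"
acceptance_prefix = "Sure, here is a step-by-step guide:"

prompt = (
    "<|user|>\n" + harmful_request + "\n"
    "<|assistant|>\n" + acceptance_prefix  # the model continues from here
)
# Passing `prompt` to an open-weight model's generate() would make it
# complete the answer mid-response -- no gradient-based optimization,
# unlike token-level attacks such as GCG.
```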
defense · arXiv · Dec 15, 2025

Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency

Wenhan Chen, Sezer Karaoglu, Theo Gevers · University of Amsterdam

Detects AI-generated videos by exploiting 3D geometric inconsistencies via a vanishing-point-aware transformer with temporal-geometric attention

Output Integrity Attack · vision · generative
PDF
defense · arXiv · Nov 25, 2025

Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs

Ziqi Wang, Chang Che, Qi Wang et al. · Hefei University of Technology · Tsinghua University +1 more

Defends safety alignment of multimodal LLMs against degradation during continual visual fine-tuning via orthogonal parameter adaptation

Transfer Learning Attack · Prompt Injection · vision · nlp · multimodal
1 citation · PDF
attack · arXiv · Jan 8, 2025

Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval

Yongkang Li, Panagiotis Eustratiadis, Evangelos Kanoulas · University of Amsterdam

Accelerates HotFlip corpus poisoning by 16× via query clustering, and evaluates black-box and query-agnostic attack variants against dense retrievers

Data Poisoning Attack · Input Manipulation Attack · nlp
PDF
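The query-clustering speed-up can be sketched under one assumption (mine, not necessarily the paper's exact mechanism): instead of optimizing one adversarial passage per query, group query embeddings into k clusters and optimize one passage per centroid, amortizing the expensive HotFlip loop.

```python
import numpy as np

def cluster_queries(query_embs, k, iters=20, seed=0):
    """Toy k-means over query embeddings. HotFlip would then optimize one
    adversarial passage per centroid instead of per query (assumed
    mechanism for the amortization; the passage optimization itself is
    out of scope here)."""
    rng = np.random.default_rng(seed)
    centroids = query_embs[rng.choice(len(query_embs), k, replace=False)]
    for _ in range(iters):
        # Assign each query to its nearest centroid (squared L2).
        dists = ((query_embs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = query_embs[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assign
```

With k much smaller than the number of queries, the number of HotFlip optimization runs drops proportionally, which is one plausible route to the reported 16× speed-up.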