ML Security Papers

Latest papers

4 papers

benchmark arXiv Jan 12, 2026

Yuxi Xia, Kinga Stańczak, Benjamin Roth · University of Vienna · Saarland University

Benchmark and linguistic analysis explaining why AI-text detectors fail to generalize across prompts, models, and domains

Output Integrity Attack nlp

benchmark arXiv Dec 31, 2025

Muhammad Abdullahi Said, Muhammad Sammani Sani · African Institute for Mathematical Sciences · University of Vienna

Audits LLM safety across Hausa and English and across temporal framings, revealing that past-tense framing bypasses defenses, leaving only 15.6% of responses safe

Prompt Injection nlp

attack arXiv Nov 19, 2025

Zhaoxin Zhang, Borui Chen, Yiming Hu et al. · City University of Macau · University of Vienna +3 more

Novel LLM jailbreak using conceptual morphology triggers to shift ideological orientation in outputs without triggering safety filters

Prompt Injection nlp

benchmark arXiv Aug 29, 2025

Daryna Oliynyk, Rudolf Mayer, Kathrin Grosse et al. · University of Vienna · SBA Research +2 more

Proposes the first comprehensive threat model and evaluation framework for comparing model stealing attacks on image classifiers

Model Theft vision