Latest papers

4 papers
benchmark arXiv Jan 12, 2026

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis

Yuxi Xia, Kinga Stańczak, Benjamin Roth · University of Vienna · Saarland University

Benchmark and linguistic analysis explaining why AI-text detectors fail to generalize across prompts, models, and domains

Output Integrity Attack nlp
benchmark arXiv Dec 31, 2025

Safe in the Future, Dangerous in the Past: Dissecting Temporal and Linguistic Vulnerabilities in LLMs

Muhammad Abdullahi Said, Muhammad Sammani Sani · African Institute for Mathematical Sciences · University of Vienna

Audits LLM safety across Hausa/English and temporal frames, revealing that past-tense framing bypasses defenses, leaving only 15.6% safe responses

Prompt Injection nlp
attack arXiv Nov 19, 2025

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Zhaoxin Zhang, Borui Chen, Yiming Hu et al. · City University of Macau · University of Vienna +3 more

Novel LLM jailbreak using conceptual morphology triggers to shift ideological orientation in outputs without triggering safety filters

Prompt Injection nlp
benchmark arXiv Aug 29, 2025

I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks

Daryna Oliynyk, Rudolf Mayer, Kathrin Grosse et al. · University of Vienna · SBA Research +2 more

Proposes the first comprehensive threat model and evaluation framework for comparing model stealing attacks on image classifiers

Model Theft vision