Adel Bibi

h-index: 5 · 170 citations · 23 papers (total)

Papers in Database (5)

attack · arXiv · Sep 25, 2025

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction

Runqi Lin, Alasdair Paren, Suqin Yuan et al. · The University of Sydney · University of Oxford

Improves transferability of adversarial visual jailbreaks against closed-source MLLMs via loss landscape flattening and feature over-reliance correction

Input Manipulation Attack · Prompt Injection · vision · multimodal · nlp

6 citations · PDF
attack · arXiv · Oct 2, 2025

ToolTweak: An Attack on Tool Selection in LLM-based Agents

Jonathan Sneh, Ruomei Yan, Jialin Yu et al. · University of Oxford · Microsoft

Adversarially crafts tool names and descriptions to bias LLM agents into selecting attacker-controlled tools over fair alternatives

Insecure Plugin Design · Prompt Injection · nlp

6 citations · 1 influential · PDF
attack · arXiv · Jan 30, 2026

The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models

Yupeng Chen, Junchi Yu, Aoxi Liu et al. · University of Oxford · The Chinese University of Hong Kong

Transfers text jailbreaks to audio via modality alignment in omni-models, outperforming native audio jailbreaks as a new red-teaming baseline

Prompt Injection · audio · nlp · multimodal
PDF
attack · arXiv · Feb 13, 2026

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Akshat Naik, Jay J Culligan, Yarin Gal et al. · University of Oxford · Toyota Motor Europe

Indirect prompt injection attack exfiltrates sensitive data across multi-agent LLM orchestrators, bypassing data access controls with a single injected payload

Prompt Injection · Sensitive Information Disclosure · nlp
PDF
attack · arXiv · Jan 30, 2026

A Fragile Guardrail: Diffusion LLM's Safety Blessing and Its Failure Mode

Zeyuan He, Yupeng Chen, Lang Lin et al. · University of Oxford · The Chinese University of Hong Kong +2 more

Discovers diffusion LLMs' intrinsic jailbreak resistance, then breaks it with context-nesting prompts achieving SOTA attack success rates

Prompt Injection · nlp
PDF