
HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models

Sidhant Narula 1, Javad Rafiei Asl 1, Mohammad Ghasemigol 1, Eduardo Blanco 2, Daniel Takabi 1


Published on arXiv: 2510.18728

OWASP LLM Top 10 (LLM01: Prompt Injection)

Key Finding

HarmNet achieves a 99.4% attack success rate (ASR) on Mistral-7B and 94.8% on GPT-4o, outperforming the best baseline by 13.9% and 10.3%, respectively, across the six LLMs evaluated.

HarmNet

Novel technique introduced


Abstract

Large Language Models (LLMs) remain vulnerable to multi-turn jailbreak attacks. We introduce HarmNet, a modular framework comprising ThoughtNet, a hierarchical semantic network; a feedback-driven Simulator for iterative query refinement; and a Network Traverser for real-time adaptive attack execution. HarmNet systematically explores and refines the adversarial space to uncover stealthy, high-success attack paths. Experiments across closed-source and open-source LLMs show that HarmNet outperforms state-of-the-art methods, achieving higher attack success rates. For example, on Mistral-7B, HarmNet achieves a 99.4% attack success rate, 13.9% higher than the best baseline.

Index terms: jailbreak attacks; large language models; adversarial framework; query refinement.


Key Contributions

  • ThoughtNet: a hierarchical semantic network that systematically maps and explores adversarial topic/sentence/entity chains for a given harmful intent
  • Feedback-driven Simulator that iteratively refines candidate attack chains using harmfulness and semantic alignment scoring
  • Network Traverser that selects and executes the optimal multi-turn attack chain in real time, achieving SOTA ASRs across closed- and open-source LLMs

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time, targeted
Datasets
HarmBench
Applications
llm safety alignment, chatbot security