
Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang¹, Hu Lou¹, Jun-Jie Zhang¹, Jun Zhu², Deyu Meng³


Published on arXiv (2603.19562)

Input Manipulation Attack (OWASP ML Top 10 — ML01)

Prompt Injection (OWASP LLM Top 10 — LLM01)

Key Finding

Demonstrates that adversarial fragility (high input-gradient coupling) and hallucination (low coupling) are opposite regimes of the same uncertainty budget, with reliable behavior in an intermediate 'Goldilocks' band

Neural Uncertainty Principle (NUP)

Novel technique introduced


Adversarial vulnerability in vision models and hallucination in large language models are conventionally treated as separate problems, each addressed with modality-specific patches. This study reveals that the two share a common geometric origin: the input and its loss gradient are conjugate observables subject to an irreducible uncertainty bound. Formalizing this as a Neural Uncertainty Principle (NUP) under a loss-induced state, we find that in near-bound regimes further compression must be accompanied by increased sensitivity dispersion (adversarial fragility), while weak prompt-gradient coupling leaves generation under-constrained (hallucination). Crucially, the bound is modulated by an input-gradient correlation channel, captured by a purpose-built single-backward probe. In vision, masking highly coupled components improves robustness without costly adversarial training; in language, the same prefill-stage probe flags hallucination risk before any answer tokens are generated. Guided by this theory, we propose ConjMask (masking high-contribution input components) and LogitReg (logit-side regularization) to improve robustness without adversarial training, and use the probe as a decoding-free risk signal for LLMs, enabling hallucination detection and prompt selection. NUP thus turns two seemingly separate failure taxonomies into a shared uncertainty-budget view, providing a unified, practical framework for diagnosing and mitigating boundary anomalies across perception and generation tasks.


Key Contributions

  • Formalizes Neural Uncertainty Principle (NUP) showing adversarial vulnerability and LLM hallucination share a geometric origin as conjugate observables under an uncertainty bound
  • Introduces Conjugate Correlation Probe (CC-Probe) as a single-backward metric that detects boundary stress in vision (high coupling = adversarial fragility) and LLMs (low coupling = hallucination risk)
  • Proposes ConjMask and LogitReg defenses for adversarial robustness without adversarial training, and uses CC-Probe for hallucination detection before token generation
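The paper does not spell out the CC-Probe implementation here, but its core idea — one backward pass, then a measure of input-gradient coupling — can be sketched with a toy model. The squared-error loss, linear model, and cosine-similarity coupling measure below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def loss_grad_wrt_input(w, x, y):
    """Gradient of the squared-error loss 0.5*(w.x - y)^2 with respect
    to the input x — a toy stand-in for the single backward pass the
    probe performs on a real network."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * wi for wi in w]

def coupling_score(x, grad):
    """Cosine similarity between the input and its loss gradient.
    High |score|: tight input-gradient coupling (adversarial-fragility
    regime); near zero: weak coupling (under-constrained generation,
    i.e. hallucination-risk regime)."""
    dot = sum(a * b for a, b in zip(x, grad))
    nx = math.sqrt(sum(a * a for a in x))
    ng = math.sqrt(sum(g * g for g in grad))
    return 0.0 if nx == 0.0 or ng == 0.0 else dot / (nx * ng)
```

For an input parallel to the weight vector the score magnitude is 1 (maximal coupling); for an orthogonal input the gradient carries no component along the input and the score is 0, mimicking the two failure regimes the paper contrasts.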

🛡️ Threat Analysis

Input Manipulation Attack

The paper addresses adversarial vulnerability in vision models and proposes the ConjMask and LogitReg defenses to improve robustness against adversarial perturbations without adversarial training.
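The masking step of ConjMask can be illustrated with a minimal sketch: zero out the input components whose per-coordinate input-gradient product (one plausible reading of "high-contribution") is largest. The selection criterion and the zero-fill are assumptions for illustration; the paper's exact rule may differ:

```python
def conjmask(x, grad, k):
    """Hypothetical ConjMask sketch: suppress the k input components
    with the largest |x_i * g_i| coupling contribution by zeroing them,
    leaving the rest of the input untouched."""
    contrib = [abs(xi * gi) for xi, gi in zip(x, grad)]
    top = sorted(range(len(x)), key=lambda i: contrib[i], reverse=True)[:k]
    masked = list(x)
    for i in top:
        masked[i] = 0.0
    return masked
```

For example, with input [1, 2, 3] and gradient [3, 2, 1] the middle component has the largest contribution (|2 * 2| = 4) and is the one masked, even though it is neither the largest input nor the largest gradient entry — the defense targets the coupling, not either factor alone.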


Details

Domains: vision, nlp, multimodal
Model Types: cnn, llm, transformer
Threat Tags: inference_time, digital
Applications: image classification, text generation, hallucination detection, adversarial defense