The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks
Zhou Feng 1, Jiahao Chen 1, Chunyi Zhou 1, Yuwen Pu 2, Tianyu Du 1, Jinbao Li 3, Jianhai Chen 1, Shouling Ji 1
Published on arXiv
2512.10402
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
Eminence maintains a >90% attack success rate at poison rates below 0.1%, compared with the >1% typically required by SOTA methods, confirming an exponential relationship between margin poisoning and adversarial boundary manipulation.
Eminence
Novel technique introduced
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks, which typically rely on heuristic brute-force methods. Despite significant empirical advances in backdoor research, the lack of rigorous theoretical analysis limits understanding of the underlying mechanisms, constraining attack predictability and adaptability. We therefore provide a theoretical analysis of backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. Based on this finding, we derive a closed-form ambiguous boundary region in which a negligible number of relabeled samples induces substantial misclassification. Influence function analysis further quantifies the significant parameter shifts caused by these margin samples, with minimal impact on clean accuracy, formally grounding why such low poison rates suffice for efficacious attacks. Leveraging these insights, we propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties. Eminence optimizes a universal, visually subtle trigger that strategically exploits vulnerable decision boundaries and achieves robust misclassification at exceptionally low poison rates (<0.1%, compared to SOTA methods typically requiring >1%). Comprehensive experiments validate our theoretical analysis and demonstrate the effectiveness of Eminence, confirming an exponential relationship between margin poisoning and adversarial boundary manipulation. Eminence maintains a >90% attack success rate, exhibits negligible clean-accuracy loss, and demonstrates high transferability across diverse models, datasets, and scenarios.
Key Contributions
- Theoretical analysis deriving a closed-form 'ambiguous boundary region', showing how sparse decision boundaries allow a negligible number of relabeled samples to induce substantial misclassification
- Influence function analysis formally quantifying why extremely low poison rates (<0.1%) suffice for effective backdoor attacks with minimal clean-accuracy degradation
- Eminence: a black-box backdoor framework with provable guarantees that optimizes universal, visually subtle triggers, achieving a >90% ASR at a poison rate an order of magnitude lower than SOTA methods
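The influence-function argument behind the second contribution can be illustrated on a toy model without any of the paper's machinery. The sketch below (all data, hyperparameters, and function names are illustrative, not from the paper) fits an L2-regularised logistic regression, flips the label of the training sample closest to the decision boundary, and checks that the standard one-Newton-step influence estimate of the resulting parameter shift agrees with actually retraining:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1e-2  # L2 regularisation strength

# Toy data: two overlapping Gaussian classes, labels in {-1, +1}.
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, 2)),
               rng.normal(+1.0, 1.0, (n // 2, 2))])
y = np.r_[-np.ones(n // 2), np.ones(n // 2)]

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit(X, y, lam, iters=30):
    """L2-regularised logistic regression via Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(-y * (X @ theta))              # per-sample sigma(-y z)
        g = -(X.T @ (y * p)) / len(y) + lam * theta
        H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(X.shape[1])
        theta -= np.linalg.solve(H, g)
    return theta

theta = fit(X, y, lam)

# Hessian of the training objective at the optimum.
p = sigmoid(X @ theta)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(2)

# Relabel the training sample closest to the decision boundary.
i = int(np.argmin(np.abs(X @ theta)))

# Influence-function estimate of the parameter shift: flipping y_i
# changes the per-sample logistic-loss gradient by exactly y_i * x_i.
dtheta_est = -np.linalg.solve(H, y[i] * X[i]) / n

# Ground truth: actually retrain with the flipped label.
y_flip = y.copy()
y_flip[i] = -y_flip[i]
dtheta_true = fit(X, y_flip, lam) - theta

cos = float(dtheta_est @ dtheta_true /
            (np.linalg.norm(dtheta_est) * np.linalg.norm(dtheta_true)))
print("estimated shift:", dtheta_est)
print("actual shift:   ", dtheta_true, " cosine:", round(cos, 3))
```

The estimate requires only one Hessian solve instead of a full retrain, which is what makes it practical for quantifying how much each candidate margin sample would move the parameters.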
🛡️ Threat Analysis
Eminence is a backdoor/trojan attack that embeds hidden, targeted misclassification behavior activated by a visually subtle trigger. It uses influence functions and decision-boundary analysis to inject backdoors at exceptionally low poison rates (<0.1%) while preserving clean accuracy.
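To make the threat mechanism concrete, here is a minimal sketch of margin-sample poisoning, not the paper's Eminence implementation: a feature that is always off in clean data stands in for the visually subtle trigger, and a handful of source-class samples nearest the decision boundary are stamped with it and relabeled to the target class. The linear model, the 1% poison budget, and all names are illustrative assumptions (the paper reports far lower rates with an optimized trigger against DNNs):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit(X, y, lam=1e-5, iters=40):
    """L2-regularised logistic regression via Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(-y * (X @ theta))
        g = -(X.T @ (y * p)) / len(y) + lam * theta
        H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(X.shape[1])
        theta -= np.linalg.solve(H, g)
    return theta

# Two overlapping Gaussian classes plus a third "trigger" feature that is
# always 0 on clean data (a stand-in for a subtle, normally absent pattern).
n = 2000
X2 = np.vstack([rng.normal(-1.0, 1.0, (n // 2, 2)),
                rng.normal(+1.0, 1.0, (n // 2, 2))])
y = np.r_[-np.ones(n // 2), np.ones(n // 2)]   # -1 = source, +1 = target
X = np.hstack([X2, np.zeros((n, 1))])

theta_clean = fit(X, y)

# Poison 1% of the data: take the source-class samples closest to the
# decision boundary (the ambiguous margin region), switch the trigger
# feature on, and relabel them to the target class.
src = np.where(y == -1)[0]
poison = src[np.argsort(np.abs(X[src] @ theta_clean))[:20]]  # 20/2000 = 1%
Xp, yp = X.copy(), y.copy()
Xp[poison, 2] = 1.0
yp[poison] = +1.0
theta_bd = fit(Xp, yp)

# Held-out source-class inputs, with and without the trigger applied.
Xt = np.hstack([rng.normal(-1.0, 1.0, (1000, 2)), np.zeros((1000, 1))])
Xt_trig = Xt.copy()
Xt_trig[:, 2] = 1.0

clean_acc = float(np.mean(Xt @ theta_bd < 0))        # still labeled source
asr = float(np.mean(Xt_trig @ theta_bd > 0))         # flipped to target
asr_clean_model = float(np.mean(Xt_trig @ theta_clean > 0))
print(f"clean source accuracy {clean_acc:.2f}, ASR {asr:.2f}, "
      f"trigger effect on the unpoisoned model {asr_clean_model:.2f}")
```

Because the trigger feature carries no signal on clean inputs, the poisoned model keeps its clean accuracy while learning a large weight on the trigger, which is the stealth-versus-control trade-off the boundary analysis formalizes.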