Certified but Fooled! Breaking Certified Defences with Ghost Certificates
Quoc Viet Vo 1, Tashreque M. Haq 1, Paul Montague 2, Tamas Abraham 2, Ehsan Abbasnejad 3, Damith C. Ranasinghe 1
Published on arXiv
2511.14003
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
GhostCert produces imperceptible adversarial examples with significantly lower ℓ₂ norms than the prior Shadow attack, while achieving spoofed certification radii that exceed the source-class certificate radius. It successfully bypasses DensePure, Smoothed Ensemble, and Randomized Smoothing on ImageNet.
GhostCert
Novel technique introduced
Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of the guarantees they provide. Here, the objective is not only to mislead a classifier, but also to manipulate the certification process into generating a robustness guarantee for an adversarial input: certificate spoofing. A recent ICLR study demonstrated that crafting large perturbations can shift inputs far into regions capable of generating a certificate for an incorrect class. Our study investigates whether the perturbations needed to cause a misclassification, and yet coax a certified model into issuing a deceptive, large robustness radius for a target class, can still be made small and imperceptible. We explore the idea of region-focused adversarial examples to craft imperceptible perturbations, spoof certificates, and achieve certification radii larger than those of the source class: ghost certificates. Extensive evaluations on ImageNet demonstrate the ability to effectively bypass state-of-the-art certified defenses such as DensePure. Our work underscores the need to better understand the limits of robustness certification methods.
Key Contributions
- Region-focused adversarial perturbation algorithm (GhostCert) that constrains manipulation to salient/natural image regions, producing imperceptible adversarial examples that spoof robustness certificates with radii exceeding the source class's
- Extension to targeted certificate spoofing attacks, demonstrating that an attacker can coax a certified model into issuing a false certificate for an attacker-chosen target class
- Rigorous evaluation against state-of-the-art certified defenses including DensePure (diffusion-based), Smoothed Ensemble, and vanilla Randomized Smoothing on ImageNet, showing GhostCert succeeds where the prior ICLR Shadow attack fails
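The region-focused perturbation idea above can be sketched as a masked gradient step: updates are confined to a saliency mask and projected onto a small ℓ₂ ball, so the perturbation stays imperceptible. This is a minimal illustration, not the paper's exact algorithm; the mask, step size, and budget `eps` are hypothetical parameters.

```python
import numpy as np

def masked_step(x, grad, mask, step=0.01, eps=1.0):
    """One region-focused gradient step (illustrative sketch).

    x:    input image, values in [0, 1]
    grad: loss gradient w.r.t. x (e.g. toward a target class)
    mask: 1 where perturbation is allowed (salient region), 0 elsewhere
    """
    delta = step * np.sign(grad) * mask   # restrict the update to the masked region
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta *= eps / norm               # project onto the l2 ball of radius eps
    return np.clip(x + delta, 0.0, 1.0)   # keep a valid pixel range
```

Iterating such steps while querying the smoothed classifier is one plausible way to keep the total ℓ₂ norm low while steering the certification outcome.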
🛡️ Threat Analysis
Proposes a new adversarial attack algorithm (GhostCert) that uses region-focused, gradient-based perturbations to cause misclassification at inference time while also manipulating the probabilistic certification process (randomized smoothing) into issuing a false, large-radius certificate. Fundamentally, this is an evasion/input-manipulation attack that also circumvents the certifier as a secondary defense layer.
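To see what is being spoofed, recall the standard randomized-smoothing certificate (Cohen et al., 2019): with Gaussian noise scale σ and a lower bound p_A on the top-class probability, the common one-sided bound gives radius r = σ · Φ⁻¹(p_A). A minimal sketch, assuming this one-sided formulation:

```python
from statistics import NormalDist

def certified_radius(p_a: float, sigma: float) -> float:
    """Randomized-smoothing certified l2 radius, one-sided bound.

    p_a:   lower confidence bound on the smoothed model's top-class
           probability under noise N(0, sigma^2 I)
    sigma: smoothing noise scale
    """
    if p_a <= 0.5:
        return 0.0  # certifier abstains: no guarantee
    # r = sigma * Phi^{-1}(p_a), where Phi is the standard normal CDF
    return sigma * NormalDist().inv_cdf(p_a)
```

A ghost certificate arises when an adversarial input drives the smoothed model's probability for the *wrong* class high enough that this radius exceeds the clean input's source-class radius, so the certifier confidently vouches for a misclassification.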