Verification of Neural Networks (Lecture Notes)
Benedikt Bollig · Université Paris-Saclay · CNRS +1 more
Benedikt Bollig · Université Paris-Saclay · CNRS +1 more
Theoretical introduction to formal verification techniques for neural networks including feed-forward, recurrent, attention, and transformer architectures
These lecture notes provide an introduction to the verification of neural networks from a theoretical perspective. We discuss feed-forward neural networks, recurrent neural networks, attention mechanisms, and transformers, together with specification languages and algorithmic verification techniques.
Francesco Diana, Chuan Xu, André Nusser et al. · Université Côte d’Azur · Inria +2 more
Gradient inversion attack on federated learning that algebraically verifies when exact training records are reconstructed from gradients
Gradient inversion attacks threaten client privacy in federated learning by reconstructing training samples from clients' shared gradients. Gradients aggregate contributions from multiple records and existing attacks may fail to disentangle them, yielding incorrect reconstructions with no intrinsic way to certify success. In vision and language, attackers may fall back on human inspection to judge reconstruction plausibility, but this is far less feasible for numerical tabular records, fueling the impression that tabular data is less vulnerable. We challenge this perception by proposing a verifiable gradient inversion attack (VGIA) that provides an explicit certificate of correctness for reconstructed samples. Our method adopts a geometric view of ReLU leakage: the activation boundary of a fully connected layer defines a hyperplane in input space. VGIA introduces an algebraic, subspace-based verification test that detects when a hyperplane-delimited region contains exactly one record. Once isolation is certified, VGIA recovers the corresponding feature vector analytically and reconstructs the target via a lightweight optimization step. Experiments on tabular benchmarks with large batch sizes demonstrate exact record and target recovery in regimes where existing state-of-the-art attacks either fail or cannot assess reconstruction fidelity. Compared to prior geometric approaches, VGIA allocates hyperplane queries more effectively, yielding faster reconstructions with fewer attack rounds.
Thomas Michel, Debabrota Basu, Emilie Kaufmann · Univ. Lille · Inria +2 more
Derives optimal membership inference attack exploiting model update sequences, achieving tighter DP privacy audits than static-model baselines
Modern AI models are not static. They go through multiple updates in their lifecycles. Thus, exploiting the model dynamics to create stronger Membership Inference (MI) attacks and tighter privacy audits are timely questions. Though the literature empirically shows that using a sequence of model updates can increase the power of MI attacks, rigorous analysis of the `optimal' MI attacks is limited to static models with infinite samples. Hence, we develop an `optimal' MI attack, SeMI*, that uses the sequence of model updates to identify the presence of a target inserted at a certain update step. For the empirical mean computation, we derive the optimal power of SeMI*, while accessing a finite number of samples with or without privacy. Our results retrieve the existing asymptotic analysis. We observe that having access to the model sequence avoids the dilution of MI signals unlike the existing attacks on the final model, where the MI signal vanishes as training data accumulates. Furthermore, an adversary can use SeMI* to tune both the insertion time and the canary to yield tighter privacy audits. Finally, we conduct experiments across data distributions and models trained or fine-tuned with DP-SGD demonstrating that practical variants of SeMI* lead to tighter privacy audits than the baselines.
Valentin Dorseuil, Jamal Atif, Olivier Cappé · École normale supérieure · Université PSL +3 more
Proposes Generalized Leverage Score as a training-free metric for individual membership inference vulnerability in deep learning
Can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks? We answer affirmatively by showing that exposure to membership inference attack (MIA) is fundamentally governed by a data point's influence on the learned model. We formalize this in the linear setting by establishing a theoretical correspondence between individual MIA risk and the leverage score, identifying it as a principled metric for vulnerability. This characterization explains how data-dependent sensitivity translates into exposure, without the computational burden of training shadow models. Building on this, we propose a computationally efficient generalization of the leverage score for deep learning. Empirical evaluations confirm a strong correlation between the proposed score and MIA success, validating this metric as a practical surrogate for individual privacy risk assessment.
Lorenzo Casalino, Maria Méndez Real, Jean-Christophe Prévotet et al. · CentraleSupélec · Inria +7 more
Side-channel attack breaks MACPRUNING defense to recover 96–100% of DNN weights from embedded hardware implementations
Deep neural networks (DNNs), which support services such as driving assistants and medical diagnoses, undergo lengthy and expensive training procedures. Therefore, the training's outcome - the DNN weights - represents a significant intellectual property asset to protect. Side-channel analysis (SCA) has recently appeared as an effective approach to recover this confidential asset from DNN implementations. In response, researchers have proposed to defend DNN implementations through classic side-channel countermeasures, at the cost of higher energy consumption, inference time, and resource utilisation. Following a different approach, Ding et al. (HOST'25) introduced MACPRUNING, a novel SCA countermeasure based on pruning, a performance-oriented Approximate Computing technique: at inference time, the implementation randomly prunes (or skips) non-important weights (i.e., with low contribution to the DNN's accuracy) of the first layer, exponentially increasing the side-channel resilience of the protected DNN implementation. However, the original security analysis of MACPRUNING did not consider a control-flow dependency intrinsic to the countermeasure design. This dependency may allow an attacker to circumvent MACPRUNING and recover the weights important to the DNN's accuracy. This paper describes a preprocessing methodology to exploit the above-mentioned control-flow dependency. Through practical experiments on a Chipwhisperer-Lite running a MACPRUNING-protected Multi-Layer Perceptron, we target the first 8 weights of each neuron and recover 96% of the important weights, demonstrating the drastic reduction in security of the protected implementation. Moreover, we show how microarchitectural leakage improves the effectiveness of our methodology, even allowing for the recovery of up to 100% of the targeted non-important weights. Lastly, by adapting our methodology [continue in pdf].
Mohammad Afzal, S. Akshay, Blaise Genest et al. · Indian Institute of Technology Bombay · TCS Research +1 more
Formal verification framework extending neural network robustness checking to confidence-based specifications via grammar and layer augmentation
In the last decade, a large body of work has emerged on robustness of neural networks, i.e., checking if the decision remains unchanged when the input is slightly perturbed. However, most of these approaches ignore the confidence of a neural network on its output. In this work, we aim to develop a generalized framework for formally reasoning about the confidence along with robustness in neural networks. We propose a simple yet expressive grammar that captures various confidence-based specifications. We develop a novel and unified technique to verify all instances of the grammar in a homogeneous way, viz., by adding a few additional layers to the neural network, which enables the use any state-of-the-art neural network verification tool. We perform an extensive experimental evaluation over a large suite of 8870 benchmarks, where the largest network has 138M parameters, and show that this outperforms ad-hoc encoding approaches by a significant margin.
Mingmeng Geng, Thierry Poibeau · École normale supérieure · Université Paris Sciences et Lettres +1 more
Surveys LLM text detection limitations, arguing reliable detection fails due to definitional gaps and benchmark inadequacies
With the widespread use of large language models (LLMs), many researchers have turned their attention to detecting text generated by them. However, there is no consistent or precise definition of their target, namely "LLM-generated text". Differences in usage scenarios and the diversity of LLMs further increase the difficulty of detection. What is commonly regarded as the detecting target usually represents only a subset of the text that LLMs can potentially produce. Human edits to LLM outputs, together with the subtle influences that LLMs exert on their users, are blurring the line between LLM-generated and human-written text. Existing benchmarks and evaluation approaches do not adequately address the various conditions in real-world detector applications. Hence, the numerical results of detectors are often misunderstood, and their significance is diminishing. Therefore, detectors remain useful under specific conditions, but their results should be interpreted only as references rather than decisive indicators.
Matthieu Dubois, François Yvon, Pablo Piantanida · Sorbonne Université · CNRS +2 more
Benchmarks AI text detectors across 37 decoding configs, showing AUROC collapses from 0.99 to 0.01 with minor sampling changes
As texts generated by Large Language Models (LLMs) are ever more common and often indistinguishable from human-written content, research on automatic text detection has attracted growing attention. Many recent detectors report near-perfect accuracy, often boasting AUROC scores above 99\%. However, these claims typically assume fixed generation settings, leaving open the question of how robust such systems are to changes in decoding strategies. In this work, we systematically examine how sampling-based decoding impacts detectability, with a focus on how subtle variations in a model's (sub)word-level distribution affect detection performance. We find that even minor adjustments to decoding parameters - such as temperature, top-p, or nucleus sampling - can severely impair detector accuracy, with AUROC dropping from near-perfect levels to 1\% in some settings. Our findings expose critical blind spots in current detection methods and emphasize the need for more comprehensive evaluation protocols. To facilitate future research, we release a large-scale dataset encompassing 37 decoding configurations, along with our code and evaluation framework https://github.com/BaggerOfWords/Sampling-and-Detection
Enoal Gesny, Eva Giboulot, Teddy Furon et al. · Univ. Rennes · Inria +3 more
Guides diffusion sampling with watermark-decoder gradients to embed robust provenance signals in generated images without retraining
This paper introduces a novel watermarking method for diffusion models. It is based on guiding the diffusion process using the gradient computed from any off-the-shelf watermark decoder. The gradient computation encompasses different image augmentations, increasing robustness to attacks against which the decoder was not originally robust, without retraining or fine-tuning. Our method effectively convert any \textit{post-hoc} watermarking scheme into an in-generation embedding along the diffusion process. We show that this approach is complementary to watermarking techniques modifying the variational autoencoder at the end of the diffusion process. We validate the methods on different diffusion models and detectors. The watermarking guidance does not significantly alter the generated image for a given seed and prompt, preserving both the diversity and quality of generation.
Sayan Biswas, Philippe Chartier, Akash Dhasade et al. · EPFL · Inria +4 more
Defends model IP in hybrid FHE inference by randomized shuffling of intermediate outputs, preventing clients from reconstructing server-side model weights
In contemporary cloud-based services, protecting users' sensitive data and ensuring the confidentiality of the server's model are critical. Fully homomorphic encryption (FHE) enables inference directly on encrypted inputs, but its practicality is hindered by expensive bootstrapping and inefficient approximations of non-linear activations. We introduce Safhire, a hybrid inference framework that executes linear layers under encryption on the server while offloading non-linearities to the client in plaintext. This design eliminates bootstrapping, supports exact activations, and significantly reduces computation. To safeguard model confidentiality despite client access to intermediate outputs, Safhire applies randomized shuffling, which obfuscates intermediate values and makes it practically impossible to reconstruct the model. To further reduce latency, Safhire incorporates advanced optimizations such as fast ciphertext packing and partial extraction. Evaluations on multiple standard models and datasets show that Safhire achieves 1.5X - 10.5X lower inference latency than Orion, a state-of-the-art baseline, with manageable communication overhead and comparable accuracy, thereby establishing the practicality of hybrid FHE inference.
Katarzyna Filus, Jorge M. Cruz-Duarte · Polish Academy of Sciences · University of Lille +3 more
Semantically guided target label selection using BERT/CLIP/TinyLLAMA improves adversarial benchmarking interpretability and scalability over WordNet
In targeted adversarial attacks on vision models, the selection of the target label is a critical yet often overlooked determinant of attack success. This target label corresponds to the class that the attacker aims to force the model to predict. Now, existing strategies typically rely on randomness, model predictions, or static semantic resources, limiting interpretability, reproducibility, or flexibility. This paper then proposes a semantics-guided framework for adversarial target selection using the cross-modal knowledge transfer from pretrained language and vision-language models. We evaluate several state-of-the-art models (BERT, TinyLLAMA, and CLIP) as similarity sources to select the most and least semantically related labels with respect to the ground truth, forming best- and worst-case adversarial scenarios. Our experiments on three vision models and five attack methods reveal that these models consistently render practical adversarial targets and surpass static lexical databases, such as WordNet, particularly for distant class relationships. We also observe that static testing of target labels offers a preliminary assessment of the effectiveness of similarity sources, \textit{a priori} testing. Our results corroborate the suitability of pretrained models for constructing interpretable, standardized, and scalable adversarial benchmarks across architectures and datasets.
Quentin Le Roux, Yannick Teglia, Teddy Furon et al. · Thales · Inria +3 more
Novel backdoor attacks on face detectors shift landmark coordinates and generate phantom faces via poisoned training data
Face Recognition Systems that operate in unconstrained environments capture images under varying conditions,such as inconsistent lighting, or diverse face poses. These challenges require including a Face Detection module that regresses bounding boxes and landmark coordinates for proper Face Alignment. This paper shows the effectiveness of Object Generation Attacks on Face Detection, dubbed Face Generation Attacks, and demonstrates for the first time a Landmark Shift Attack that backdoors the coordinate regression task performed by face detectors. We then offer mitigations against these vulnerabilities.