Key Principles of Graph Machine Learning: Representation, Robustness, and Generalization
Yassine Abbahaddou, Céline Hudelot, Charlotte Laclau et al. · École Polytechnique · CentraleSupélec +4 more
Yassine Abbahaddou, Céline Hudelot, Charlotte Laclau et al. · École Polytechnique · CentraleSupélec +4 more
Defends GNNs against adversarial graph perturbations via orthonormalization and noise-based techniques, alongside representation and generalization contributions
Graph Neural Networks (GNNs) have emerged as powerful tools for learning representations from structured data. Despite their growing popularity and success across various applications, GNNs encounter several challenges that limit their performance. in their generalization, robustness to adversarial perturbations, and the effectiveness of their representation learning capabilities. In this dissertation, I investigate these core aspects through three main contributions: (1) developing new representation learning techniques based on Graph Shift Operators (GSOs, aiming for enhanced performance across various contexts and applications, (2) introducing generalization-enhancing methods through graph data augmentation, and (3) developing more robust GNNs by leveraging orthonormalization techniques and noise-based defenses against adversarial attacks. By addressing these challenges, my work provides a more principled understanding of the limitations and potential of GNNs.
Lorenzo Casalino, Maria Méndez Real, Jean-Christophe Prévotet et al. · CentraleSupélec · INRIA +7 more
Side-channel attack breaks MACPRUNING defense to recover 96–100% of DNN weights from embedded hardware implementations
Deep neural networks (DNNs), which support services such as driving assistants and medical diagnoses, undergo lengthy and expensive training procedures. Therefore, the training's outcome - the DNN weights - represents a significant intellectual property asset to protect. Side-channel analysis (SCA) has recently appeared as an effective approach to recover this confidential asset from DNN implementations. In response, researchers have proposed to defend DNN implementations through classic side-channel countermeasures, at the cost of higher energy consumption, inference time, and resource utilisation. Following a different approach, Ding et al. (HOST'25) introduced MACPRUNING, a novel SCA countermeasure based on pruning, a performance-oriented Approximate Computing technique: at inference time, the implementation randomly prunes (or skips) non-important weights (i.e., with low contribution to the DNN's accuracy) of the first layer, exponentially increasing the side-channel resilience of the protected DNN implementation. However, the original security analysis of MACPRUNING did not consider a control-flow dependency intrinsic to the countermeasure design. This dependency may allow an attacker to circumvent MACPRUNING and recover the weights important to the DNN's accuracy. This paper describes a preprocessing methodology to exploit the above-mentioned control-flow dependency. Through practical experiments on a Chipwhisperer-Lite running a MACPRUNING-protected Multi-Layer Perceptron, we target the first 8 weights of each neuron and recover 96% of the important weights, demonstrating the drastic reduction in security of the protected implementation. Moreover, we show how microarchitectural leakage improves the effectiveness of our methodology, even allowing for the recovery of up to 100% of the targeted non-important weights. Lastly, by adapting our methodology [continue in pdf].
Matthieu Dubois, François Yvon, Pablo Piantanida · Sorbonne Université · CNRS +2 more
Benchmarks AI text detectors across 37 decoding configs, showing AUROC collapses from 0.99 to 0.01 with minor sampling changes
As texts generated by Large Language Models (LLMs) are ever more common and often indistinguishable from human-written content, research on automatic text detection has attracted growing attention. Many recent detectors report near-perfect accuracy, often boasting AUROC scores above 99\%. However, these claims typically assume fixed generation settings, leaving open the question of how robust such systems are to changes in decoding strategies. In this work, we systematically examine how sampling-based decoding impacts detectability, with a focus on how subtle variations in a model's (sub)word-level distribution affect detection performance. We find that even minor adjustments to decoding parameters - such as temperature, top-p, or nucleus sampling - can severely impair detector accuracy, with AUROC dropping from near-perfect levels to 1\% in some settings. Our findings expose critical blind spots in current detection methods and emphasize the need for more comprehensive evaluation protocols. To facilitate future research, we release a large-scale dataset encompassing 37 decoding configurations, along with our code and evaluation framework https://github.com/BaggerOfWords/Sampling-and-Detection
Darpan Aswal, Céline Hudelot · Université Paris-Saclay · CentraleSupélec
Defends LLMs against jailbreaks by using sparse autoencoders to identify interpretable internal activation concepts linked to attack themes
Large Language Models have found success in a variety of applications. However, their safety remains a concern due to the existence of various jailbreaking methods. Despite significant efforts, alignment and safety fine-tuning only provide a certain degree of robustness against jailbreak attacks that covertly mislead LLMs towards the generation of harmful content. This leaves them prone to a range of vulnerabilities, including targeted misuse and accidental user profiling. This work introduces \textbf{ConceptGuard}, a novel framework that leverages Sparse Autoencoders (SAEs) to identify interpretable concepts within LLM internals associated with different jailbreak themes. By extracting semantically meaningful internal representations, ConceptGuard enables building robust safety guardrails -- offering fully explainable and generalizable defenses without sacrificing model capabilities or requiring further fine-tuning. Leveraging advances in the mechanistic interpretability of LLMs, our approach provides evidence for a shared activation geometry for jailbreak attacks in the representation space, a potential foundation for designing more interpretable and generalizable safeguards against attackers.
Pierre-Francois Gimenez, Sarath Sivaprasad, Mario Fritz · Univ Rennes · INRIA +3 more
Certifiably robust malware detection architecture proving every robust detector decomposes into a specific structure resistant to evasion attacks
Malware analysis involves analyzing suspicious software to detect malicious payloads. Static malware analysis, which does not require software execution, relies increasingly on machine learning techniques to achieve scalability. Although such techniques obtain very high detection accuracy, they can be easily evaded with adversarial examples where a few modifications of the sample can dupe the detector without modifying the behavior of the software. Unlike other domains, such as computer vision, creating an adversarial example of malware without altering its functionality requires specific transformations. We propose a new model architecture for certifiably robust malware detection by design. In addition, we show that every robust detector can be decomposed into a specific structure, which can be applied to learn empirically robust malware detectors, even on fragile features. Our framework ERDALT is based on this structure. We compare and validate these approaches with machine-learning-based malware detection methods, allowing for robust detection with limited reduction of detection performance.