defense arXiv Sep 22, 2025 · Sep 2025
Duoxun Tang, Xinhang Jiang, Jiajun Niu · Tsinghua University · The Chinese University of Hong Kong
Defends text embeddings against inversion attacks using RL-learned, geometry-aware orthogonal noise with PII signal guidance
Model Inversion Attack nlp
Text embedding inversion attacks reconstruct original sentences from latent representations, posing severe privacy threats in collaborative inference and edge computing. We propose TextCrafter, an optimization-based adversarial perturbation mechanism that combines RL learned, geometry aware noise injection orthogonal to user embeddings with cluster priors and PII signal guidance to suppress inversion while preserving task utility. Unlike prior defenses either non learnable or agnostic to perturbation direction, TextCrafter provides a directional protective policy that balances privacy and utility. Under strong privacy setting, TextCrafter maintains 70 percentage classification accuracy on four datasets and consistently outperforms Gaussian/LDP baselines across lower privacy budgets, demonstrating a superior privacy utility trade off.
llm transformer Tsinghua University · The Chinese University of Hong Kong
defense arXiv Jan 5, 2026 · Jan 2026
Duoxun Tang, Xueyi Zhang, Chak Hin Wang et al. · Tsinghua University · The Chinese University of Hong Kong +2 more
Defends video recognition models against PGD and CW attacks via flow-matching purification with masking and frequency-gated loss
Input Manipulation Attack vision
Video recognition models remain vulnerable to adversarial attacks, while existing diffusion-based purification methods suffer from inefficient sampling and curved trajectories. Directly regressing clean videos from adversarial inputs often fails to recover faithful content due to the subtle nature of perturbations; this necessitates physically shattering the adversarial structure. Therefore, we propose Flow Matching for Adversarial Video Purification FMVP. FMVP physically shatters global adversarial structures via a masking strategy and reconstructs clean video dynamics using Conditional Flow Matching (CFM) with an inpainting objective. To further decouple semantic content from adversarial noise, we design a Frequency-Gated Loss (FGL) that explicitly suppresses high-frequency adversarial residuals while preserving low-frequency fidelity. We design Attack-Aware and Generalist training paradigms to handle known and unknown threats, respectively. Extensive experiments on UCF-101 and HMDB-51 demonstrate that FMVP outperforms state-of-the-art methods (DiffPure, Defense Patterns (DP), Temporal Shuffling (TS) and FlowPure), achieving robust accuracy exceeding 87% against PGD and 89% against CW attacks. Furthermore, FMVP demonstrates superior robustness against adaptive attacks (DiffHammer) and functions as a zero-shot adversarial detector, attaining AUC-ROC scores of 0.98 for PGD and 0.79 for highly imperceptible CW attacks.
cnn transformer Tsinghua University · The Chinese University of Hong Kong · University of New South Wales +1 more