Lingxiang Wang

Papers in Database (1)

attack · arXiv · Mar 5, 2026

From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang et al. · Beihang University · Tsinghua University

Detects LLM pre-training data using gradient deviation scores that capture the magnitude, location, and concentration of parameter updates in FFN and attention modules.

Membership Inference Attack · nlp