Latest papers

3 papers
benchmark · arXiv · Jan 10, 2026

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Hongjun An, Yiliang Song, Jiangan Chen et al. · Northwestern Polytechnical University · China Telecom +1 more

Factorial framework diagnoses how manipulative natural-language prompts exploit RLHF alignment to make LLMs prioritize sycophancy over factual accuracy

Prompt Injection · nlp
attack · arXiv · Nov 22, 2025

Privacy Auditing of Multi-domain Graph Pre-trained Model under Membership Inference Attacks

Jiayi Luo, Qingyun Sun, Yuecen Wei et al. · Beihang University · Guangxi Normal University

Proposes MGP-MIA, a membership inference attack on multi-domain graph pre-trained models using unlearning-based signal amplification and shadow model construction

Membership Inference Attack · graph
1 citation
attack · arXiv · Nov 22, 2025

Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models

Jiayi Luo, Qingyun Sun, Lingjuan Lyu et al. · Beihang University · Sony AI +1 more

Backdoor attack on Graph Foundation Models with label-free triggers and fine-tuning-resistant anchoring for persistence

Model Poisoning · Transfer Learning Attack · graph
1 citation