Latest papers

3 papers
benchmark · arXiv · Jan 10, 2026

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Hongjun An, Yiliang Song, Jiangan Chen et al. · Northwestern Polytechnical University · China Telecom +1 more

Factorial framework diagnoses how manipulative natural-language prompts exploit RLHF alignment to make LLMs prioritize sycophancy over factual accuracy

Prompt Injection · nlp
attack · arXiv · Nov 22, 2025

Privacy Auditing of Multi-domain Graph Pre-trained Model under Membership Inference Attacks

Jiayi Luo, Qingyun Sun, Yuecen Wei et al. · Beihang University · Guangxi Normal University

Proposes MGP-MIA, a membership inference attack on multi-domain graph pre-trained models using unlearning-based signal amplification and shadow model construction

Membership Inference Attack · graph
1 citation
attack · arXiv · Nov 22, 2025

Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models

Jiayi Luo, Qingyun Sun, Lingjuan Lyu et al. · Beihang University · Sony AI +1 more

Backdoor attack on Graph Foundation Models with label-free triggers and fine-tuning-resistant anchoring for persistence

Model Poisoning · Transfer Learning Attack · graph
1 citation