ML Security Papers

Latest papers

8 papers

defense arXiv Feb 4, 2026 · 8w ago

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack · vision · generative

PDF

defense arXiv Feb 2, 2026 · 9w ago

MIRROR: Manifold Ideal Reference ReconstructOR for Generalizable AI-Generated Image Detection

Ruiqi Liu, Manni Cui, Ziheng Qin et al. · Institute of Automation · School of Advanced Interdisciplinary Sciences +7 more

Detects AI-generated images by projecting inputs to a real-image manifold and using reconstruction residuals as forgery signals, surpassing human experts

Output Integrity Attack · vision · generative

PDF Code

defense arXiv Jan 5, 2026 · Jan 2026

FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks

Chenyu Hu, Qiming Hu, Sinan Chen et al. · Southwest University · University of Electronic Science and Technology of China +3 more

Defends federated learning against adaptive backdoor attacks using dynamic gradient scaling and robust core-set aggregation

Model Poisoning · federated-learning · vision

PDF

defense arXiv Dec 24, 2025 · Dec 2025

Beyond Artifacts: Real-Centric Envelope Modeling for Reliable AI-Generated Image Detection

Ruiqi Liu, Yi Han, Zhengbo Zhang et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +5 more

Detects AI-generated images by modeling real image manifolds rather than generator artifacts, robust to real-world degradation chains

Output Integrity Attack · vision · generative

1 citation PDF

attack arXiv Nov 13, 2025 · Nov 2025

BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

Shuaitong Liu, Renjue Li, Lijia Yu et al. · Southwest University · Chinese Academy of Sciences +1 more

Backdoor attack poisons LLM fine-tuning to trigger 17x CoT trace inflation for stealthy compute exhaustion

Model Poisoning · Model Denial of Service · nlp

1 citation PDF

defense arXiv Oct 31, 2025 · Oct 2025

Lightweight CNN Model Hashing with Higher-Order Statistics and Chaotic Mapping for Piracy Detection and Tamper Localization

Kunming Yang, Ling Chen · Southwest University · State Key Laboratory of Integrated Chips and Systems

Perceptual hashing scheme for CNN model piracy detection and tamper localization using higher-order statistics and chaotic mapping

Model Theft vision

PDF

defense arXiv Oct 10, 2025 · Oct 2025

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Xiaonan Si, Meilin Zhu, Simeng Qin et al. · Institute of Software · University of Chinese Academy of Sciences +5 more

Defends RAG systems from corpus poisoning via two-stage semantic and conflict-aware filtering before LLM generation

Prompt Injection nlp

2 citations PDF

defense arXiv Sep 29, 2025 · Sep 2025

H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning

Shiyuan Zuo, Rongfei Fan, Cheng Zhan et al. · Beijing Institute of Technology · Sun Yat-Sen University +2 more

Defends federated learning against Byzantine poisoning via efficient random-segment similarity-aware aggregation, with or without clean data

Data Poisoning Attack federated-learning

PDF
