Latest papers

6 papers
defense arXiv Mar 24, 2026 · 13d ago

SafeSeek: Universal Attribution of Safety Circuits in Language Models

Miao Yu, Siyuan Fu, Moayad Aloqaily et al. · University of Science and Technology of China · Squirrel AI Learning +4 more

Mechanistic interpretability framework identifying sparse safety circuits in LLMs for backdoor removal and alignment preservation

Model Poisoning · Input Manipulation Attack · Prompt Injection · nlp
PDF
defense arXiv Mar 18, 2026 · 19d ago

Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs

Yuxin Liu, Fei Wang, Kun Li et al. · Anhui University · Hefei Comprehensive National Science Center +2 more

Training-free deepfake detection using LVLMs that mines suspicious patch tokens via semantic clustering and frequency-noise anomaly scoring

Output Integrity Attack · vision · multimodal
PDF
benchmark arXiv Jan 26, 2026 · 10w ago

$\alpha^3$-SecBench: A Large-Scale Evaluation Suite of Security, Resilience, and Trust for LLM-based UAV Agents over 6G Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah · United Arab Emirates University · Khalifa University of Science and Technology

Benchmarks 23 LLMs as UAV agents across 175 adversarial threat types, exposing gaps in attack mitigation and policy-compliant tool use

Prompt Injection · Excessive Agency · multimodal · reinforcement-learning · nlp
1 citation · PDF · Code
defense arXiv Oct 11, 2025 · Oct 2025

Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models

Liang Lin, Miao Yu, Moayad Aloqaily et al. · Nanyang Technological University · University of Science and Technology of China +4 more

Defends LLMs against unknown backdoors by intentionally injecting known triggers to aggregate and then purge backdoor representations

Model Poisoning · nlp
PDF
defense arXiv Sep 26, 2025 · Sep 2025

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Miao Yu, Zhenhong Zhou, Moayad Aloqaily et al. · University of Science and Technology of China · Nanyang Technological University +5 more

Mechanistic interpretability framework identifies backdoor-responsible attention heads in LLMs, enabling surgical neutralization or amplification of backdoor behavior

Model Poisoning · nlp
1 citation · PDF
survey arXiv Jan 2, 2025 · Jan 2025

State-of-the-art AI-based Learning Approaches for Deepfake Generation and Detection, Analyzing Opportunities, Threading through Pros, Cons, and Future Prospects

Harshika Goyal, Mohammad Saif Wajid, Mohd Anas Wajid et al. · Indian Institute of Technology · Tecnológico de Monterrey +6 more

Surveys ~400 papers on deepfake generation (GANs, VAEs, Transformers) and detection, reviewing benchmark datasets and outlining open challenges

Output Integrity Attack · vision · generative
5 citations · PDF