Latest papers

5 papers
defense arXiv Feb 14, 2026

AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks

Weiming Song, Xuan Xie, Ruiping Yin · Beijing University of Technology · Macau University of Science and Technology

Defends LLMs against jailbreaks by extracting safety signals from attention heads and steering logits without fine-tuning

Prompt Injection · nlp
defense arXiv Jan 1, 2026 · Jan 2026

Making Theft Useless: Adulteration-Based Protection of Proprietary Knowledge Graphs in GraphRAG Systems

Weijie Wang, Peizhuo Lv, Yan Wang et al. · Chinese Academy of Sciences · National University of Singapore +2 more

Injects false 'adulterant' facts into proprietary Knowledge Graphs to render stolen copies unusable in competing GraphRAG deployments

Model Theft · nlp · graph
benchmark arXiv Dec 11, 2025 · Dec 2025

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection

Zhuo Wang, Xiliang Liu, Ligang Sun · Beijing University of Technology

Benchmarks AI-generated video detectors' robustness to watermark removal and spoofing attacks across ten models and 6,500 videos

Output Integrity Attack · vision
1 citation
defense arXiv Sep 19, 2025 · Sep 2025

TrueMoE: Dual-Routing Mixture of Discriminative Experts for Synthetic Image Detection

Laixin Zhang, Shuaibo Li, Wei Ma et al. · Beijing University of Technology · The Hong Kong University of Science and Technology (Guangzhou) +1 more

Detects synthetic images with a Mixture-of-Experts framework that uses dual routing across manifold and granularity expert subspaces

Output Integrity Attack · vision · generative
survey Journal of Network and Compute... Jan 1, 2025 · Jan 2025

A Survey of Secure Semantic Communications

Rui Meng, Song Gao, Dayu Fan et al. · Beijing University of Posts and Telecommunications · Peng Cheng Laboratory +1 more

Surveys ML security threats and defenses across the lifecycle of AI-based semantic communication systems for 6G networks

Input Manipulation Attack · Data Poisoning Attack · Model Poisoning · nlp · vision · multimodal
27 citations