Latest papers

9 papers
defense arXiv Jan 12, 2026

A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model

Qi Zheng, Shuliang Liu, Yu Huang et al. · The Hong Kong University of Science and Technology (Guangzhou) · The Hong Kong University of Science and Technology +1 more

Watermarks VLM-generated text via visual-evidence-guided token partitioning, improving visual fidelity while maintaining 96.88% AUC detection accuracy

Output Integrity Attack nlp multimodal
PDF
defense arXiv Jan 8, 2026

Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Large Reasoning Models

Shuliang Liu, Xingyu Li, Hongyi Liu et al. · The Hong Kong University of Science and Technology (Guangzhou) · The Hong Kong University of Science and Technology +1 more

Watermarks reasoning LLM text outputs by separating thinking from answering and adapting strength via semantic vectors

Output Integrity Attack nlp
1 citation PDF Code
defense arXiv Nov 26, 2025 · Nov 2025

Multimodal Robust Prompt Distillation for 3D Point Cloud Models

Xiang Gu, Liming Lu, Xu Zheng et al. · Nanjing University of Science and Technology · The Hong Kong University of Science and Technology (Guangzhou) +3 more

Defends 3D point cloud models against adversarial attacks via multimodal teacher-student prompt distillation with zero inference overhead

Input Manipulation Attack vision multimodal
PDF Code
benchmark arXiv Oct 26, 2025 · Oct 2025

OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

Hao Zheng, Zirui Pang, Ling Li et al. · Harbin Institute of Technology · University of Illinois Urbana-Champaign +5 more

Benchmarks MLLM unlearning and reveals that all evaluated methods leak supposedly erased misinformation under adversarial recovery and prompt attacks

Sensitive Information Disclosure Prompt Injection multimodal nlp vision
PDF Code
defense arXiv Oct 17, 2025 · Oct 2025

Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks

Yuyuan Feng, Bin Ma, Enyan Dai · Xiamen University · The Hong Kong University of Science and Technology (Guangzhou)

Mixture-of-Experts GNN framework that simultaneously defends against backdoor, edge manipulation, and node injection attacks via diversity loss and robustness-aware routing

Model Poisoning Input Manipulation Attack graph
PDF Code
defense arXiv Sep 27, 2025 · Sep 2025

CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models

Yu Zhang, Shuliang Liu, Xu Yang et al. · The Hong Kong University of Science and Technology (Guangzhou) · South China University of Technology

Proposes dynamic LLM text watermarking using context-aware entropy thresholds to preserve quality across mixed-modality generation tasks

Output Integrity Attack nlp
1 citation PDF
defense arXiv Sep 19, 2025 · Sep 2025

Toward Medical Deepfake Detection: A Comprehensive Dataset and Novel Method

Shuaibo Li, Zhaohu Xing, Hongqiu Wang et al. · The Hong Kong University of Science and Technology (Guangzhou) · The Hong Kong University of Science and Technology

Novel dual-stage vision-language detector and benchmark dataset for identifying AI-generated fake medical images across six modalities

Output Integrity Attack vision
PDF
defense arXiv Sep 19, 2025 · Sep 2025

TrueMoE: Dual-Routing Mixture of Discriminative Experts for Synthetic Image Detection

Laixin Zhang, Shuaibo Li, Wei Ma et al. · Beijing University of Technology · The Hong Kong University of Science and Technology (Guangzhou) +1 more

Novel Mixture-of-Experts framework for synthetic image detection using dual-routing across manifold and granularity expert subspaces

Output Integrity Attack vision generative
PDF
defense arXiv Aug 3, 2025 · Aug 2025

RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging

Xin He, Junxi Shen, Zhenheng Tang et al. · A*STAR · Hong Kong University of Science and Technology +2 more

Fingerprints expert modules in merged MoE models via routing behaviors to detect unauthorized IP reuse under tampering

Model Theft vision multimodal
PDF