Wanli Peng

Papers in Database (4)

defense AAAI Aug 3, 2025

BeDKD: Backdoor Defense Based on Directional Mapping Module and Adversarial Knowledge Distillation

Zhengxian Wu, Juan Wen, Wanli Peng et al. · China Agricultural University

Defends NLP text classifiers against backdoor attacks, using a directional mapping module to identify poisoned data and adversarial knowledge distillation to erase trigger behavior.

Model Poisoning nlp
PDF Code
defense arXiv Aug 8, 2025

SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs

Zhengxian Wu, Juan Wen, Wanli Peng et al. · China Agricultural University

Defends LLM APIs against instruction backdoors by extracting task-relevant key phrases and filtering trigger-induced anomalous semantic scores.

Model Poisoning nlp
PDF Code
defense arXiv Aug 12, 2025

EditMF: Drawing an Invisible Fingerprint for Your Large Language Models

Jiaxuan Wu, Yinghan Zhou, Wanli Peng et al. · China Agricultural University

Embeds ownership fingerprints into LLM weights via causal tracing and zero-space edits, verified with a single black-box query.

Model Theft nlp
PDF
attack arXiv Aug 20, 2025

Self-Disguise Attack: Induce the LLM to disguise itself for AIGT detection evasion

Yinghan Zhou, Juan Wen, Wanli Peng et al. · China Agricultural University

Prompt-guided attack that induces an LLM to disguise its own outputs, evading AI-generated text (AIGT) detectors while preserving text quality.

Output Integrity Attack nlp
PDF Code