Zikai Zhang

Papers in Database (3)

defense arXiv Apr 1, 2026

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska et al. · University of Nevada · Oak Ridge National Laboratory

Detects LLM jailbreak attacks using logit distributions over numerical tokens, achieving a 22.66% reduction in attack success rate (ASR) with minimal overhead

Prompt Injection nlp
PDF
attack arXiv Sep 16, 2025

On the Out-of-Distribution Backdoor Attack for Federated Learning

Jiahao Xu, Zikai Zhang, Rui Hu · University of Nevada

Introduces a federated learning (FL) backdoor attack triggered by out-of-distribution (OOD) data that evades state-of-the-art defenses, and BNGuard, a defense that detects it using batch normalization statistics

Model Poisoning federated-learning vision
PDF Code
defense arXiv Aug 5, 2025

Majority Bit-Aware Watermarking For Large Language Models

Jiahao Xu, Rui Hu, Zikai Zhang · University of Nevada

Embeds multi-bit watermarks in LLM output text via majority-bit-aware encoding, enabling user-level misuse tracing with higher text quality

Output Integrity Attack nlp
PDF