Guangsheng Bao

defense arXiv Jan 8, 2026 · 12w ago

Ke Sun, Guangsheng Bao, Han Cui et al. · Westlake University

Detects AI-generated text via late-stage token probability stabilization, achieving SOTA on EvoBench and MAGE benchmarks

Output Integrity Attack nlp

1 citations PDF

defense arXiv Feb 1, 2026 · 9w ago

Ke Sun, Guangsheng Bao, Han Cui et al. · Westlake University

Prototype-based routing framework dynamically selects the best surrogate model to detect LLM-generated text across unknown black-box sources

Output Integrity Attack nlp

attack arXiv Feb 12, 2026 · 7w ago

Hongbo Zhang, Yang Yue, Jianhao Yan et al. · Zhejiang University · Westlake University +1 more

Black-box membership inference attack on RLVR-trained reasoning models exploiting generation diversity collapse to detect training data

Membership Inference Attack nlpreinforcement-learning

Papers in Database (3)