Defense · 2025

Information-Preserving Reformulation of Reasoning Traces for Antidistillation

Jiayu Ding 1, Lei Cui 2, Li Dong 2, Nanning Zheng 1, Furu Wei 2

1 citation · 26 references · arXiv


Published on arXiv: 2510.11545

Model Theft (OWASP ML Top 10, ML05)

Model Theft (OWASP LLM Top 10, LLM10)

Key Finding

PART reformulation causes 13.5% performance degradation in a 32B student model on AIME 2024 while retaining over 90% semantic similarity to original traces, outperforming summary-based protections on information preservation.

PART (novel technique introduced)


Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model's problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.
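The two-step reformulation described above can be approximated with a rule-based sketch. Note the paper trains a small auxiliary model to perform this step; the self-talk patterns and the sentence-reversal heuristic below are illustrative assumptions, not the authors' implementation:

```python
import re

# Hypothetical "self-talk" markers: hedging and filler phrases that help a
# human reader follow the trace but mainly serve as an imitation signal
# for SFT-based distillation. These patterns are assumptions for illustration.
SELF_TALK_PATTERNS = [
    r"\b[Ww]ait,?\s*",
    r"\b[Hh]mm,?\s*",
    r"\b[Aa]ctually,?\s*",
    r"\b[Ll]et me (?:think|double-check|reconsider)[^.]*\.\s*",
]

def remove_self_talk(trace: str) -> str:
    """Step 1: strip self-talk tokens while keeping the factual content."""
    for pattern in SELF_TALK_PATTERNS:
        trace = re.sub(pattern, "", trace)
    return trace

def reorder_subconclusions(trace: str) -> str:
    """Step 2: permute sub-conclusion sentences so the token order no longer
    matches the teacher's generation order (reversal stands in here for a
    learned reordering)."""
    sentences = [s.strip() for s in trace.split(". ") if s.strip()]
    return ". ".join(reversed(sentences))

def reformulate(trace: str) -> str:
    """Full PART-style pass: remove self-talk, then reorder."""
    return reorder_subconclusions(remove_self_talk(trace))
```

The intuition this sketch captures: a human reader can still recover every sub-conclusion, but a student model fine-tuned to imitate the reordered token stream no longer learns the teacher's step-by-step generation pattern.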


Key Contributions

  • PART: a two-step reasoning trace reformulation (self-talk token removal + sub-conclusion reordering) that disrupts distillation while preserving human-readable information
  • Small auxiliary model trained to perform reformulation with minimal computational overhead, avoiding costly teacher model retraining
  • Empirical demonstration that PART degrades student model performance by 13.5% on AIME 2024 (54.17 → 46.88 for a 32B model) while retaining 90.1% semantic information fidelity vs. 7.3% for summary-based methods
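The headline 13.5% figure follows directly from the two reported scores; a quick check of the arithmetic:

```python
baseline = 54.17   # AIME 2024 score, 32B student trained on original traces
with_part = 46.88  # score when trained on PART-reformulated traces

# Relative degradation caused by PART reformulation.
relative_degradation = (baseline - with_part) / baseline
print(f"{relative_degradation:.1%}")  # prints 13.5%
```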

🛡️ Threat Analysis

Model Theft

The paper directly defends against unauthorized knowledge distillation — adversaries collect reasoning traces and fine-tune student models to clone proprietary capabilities, which is model theft. PART disrupts this by reformulating traces so they remain human-readable but degrade SFT-based distillation.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, inference_time, training_time
Datasets
AIME 2024
Applications
llm reasoning trace protection, anti-distillation, proprietary model ip protection