Benchmark · 2025

Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks

Yubo Li, Ramayya Krishnan, Rema Padman

1 citation · 37 references · arXiv


Published on arXiv: 2510.02712

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

AFT models with model-drift interactions achieve the best discrimination and calibration for predicting LLM consistency failure, and a derived turn-level monitor successfully flags high-risk conversations before the first inconsistency occurs, evaluated across 36,951 turns from 9 LLMs.

Time-To-Inconsistency (TTI) survival analysis

Novel technique introduced


Large Language Models (LLMs) have revolutionized conversational AI, yet their robustness in extended multi-turn dialogues remains poorly understood. Existing evaluation frameworks focus on static benchmarks and single-turn assessments, failing to capture the temporal dynamics of conversational degradation that characterize real-world interactions. In this work, we present a large-scale survival analysis of conversational robustness, modeling failure as a time-to-event process over 36,951 turns from 9 state-of-the-art LLMs on the MT-Consistency benchmark. Our framework combines Cox proportional hazards, Accelerated Failure Time (AFT), and Random Survival Forest models with simple semantic drift features. We find that abrupt prompt-to-prompt semantic drift sharply increases the hazard of inconsistency, whereas cumulative drift is counterintuitively *protective*, suggesting adaptation in conversations that survive multiple shifts. AFT models with model-drift interactions achieve the best combination of discrimination and calibration, and proportional hazards checks reveal systematic violations for key drift covariates, explaining the limitations of Cox-style modeling in this setting. Finally, we show that a lightweight AFT model can be turned into a turn-level risk monitor that flags most failing conversations several turns before the first inconsistent answer while keeping false alerts modest. These results establish survival analysis as a powerful paradigm for evaluating multi-turn robustness and for designing practical safeguards for conversational AI systems.
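To make the drift covariates concrete, here is a minimal sketch of how abrupt and cumulative semantic drift could be computed from a sequence of prompt embeddings. The exact feature definitions used in the paper are not reproduced here; this assumes abrupt drift at turn t is the cosine distance between the embeddings of prompts t-1 and t, and cumulative drift is the running sum of abrupt drifts.

```python
import numpy as np

def drift_features(prompt_embeddings):
    """Turn-level drift covariates from a sequence of prompt embeddings.

    Hypothetical reconstruction: abrupt drift at turn t is the cosine
    distance between the embeddings of prompts t-1 and t; cumulative
    drift is the running sum of abrupt drifts up to turn t.
    """
    E = np.asarray(prompt_embeddings, dtype=float)
    # Normalize rows so cosine distance reduces to 1 - dot product.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    abrupt = np.zeros(len(E))
    abrupt[1:] = 1.0 - np.sum(E[1:] * E[:-1], axis=1)
    cumulative = np.cumsum(abrupt)
    return abrupt, cumulative

# Toy conversation: three semantically similar prompts, then a sharp topic shift.
emb = [[1.0, 0.0], [0.98, 0.2], [0.97, 0.25], [0.0, 1.0]]
abrupt, cumulative = drift_features(emb)
```

The abrupt-drift feature spikes at the topic shift (the last turn) while staying near zero for the similar prompts, which is the signal the survival models associate with elevated inconsistency hazard.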


Key Contributions

  • Formalizes multi-turn LLM robustness as a time-to-inconsistency survival analysis problem using Cox, AFT, and Random Survival Forest models with time-varying semantic drift covariates
  • Discovers that abrupt prompt-to-prompt semantic drift sharply increases failure hazard while cumulative drift is paradoxically protective, suggesting conversational adaptation
  • Demonstrates that a lightweight AFT-based turn-level risk monitor can flag most failing conversations several turns before the first inconsistent answer with modest false alarm rates
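The turn-level risk monitor in the third contribution can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the Weibull AFT coefficients, shape parameter, alert threshold, and horizon below are all placeholder values, and the monitor flags a conversation when the model's predicted probability of inconsistency within the next few turns (given survival so far) exceeds the threshold.

```python
import numpy as np

# Hypothetical Weibull AFT coefficients -- placeholders for illustration;
# the paper's fitted values are not reproduced here.
BETA = {"intercept": 3.0, "abrupt_drift": -2.0, "cumulative_drift": 0.5}
SHAPE_K = 1.5     # assumed Weibull shape parameter
THRESHOLD = 0.2   # flag if P(failure within HORIZON turns) exceeds this
HORIZON = 2

def weibull_aft_survival(t, abrupt, cumulative):
    """AFT form: S(t | x) = exp(-(t / lambda)^k), with log lambda = x . beta."""
    log_lam = (BETA["intercept"]
               + BETA["abrupt_drift"] * abrupt
               + BETA["cumulative_drift"] * cumulative)
    return np.exp(-((t / np.exp(log_lam)) ** SHAPE_K))

def monitor(abrupt_seq, cumulative_seq):
    """Return the first turn whose short-horizon risk trips the alarm,
    or None if the conversation is never flagged."""
    for t, (a, c) in enumerate(zip(abrupt_seq, cumulative_seq), start=1):
        s_now = weibull_aft_survival(t, a, c)
        s_future = weibull_aft_survival(t + HORIZON, a, c)
        risk = 1.0 - s_future / s_now  # conditional failure probability
        if risk > THRESHOLD:
            return t
    return None

# Low-drift conversation vs. one with an abrupt semantic shift at turn 3.
calm_drift = [0.02, 0.03, 0.02, 0.03]
shift_drift = [0.02, 0.03, 0.9, 0.1]
calm = monitor(calm_drift, np.cumsum(calm_drift))
shifted = monitor(shift_drift, np.cumsum(shift_drift))
```

Under these placeholder parameters the low-drift conversation is never flagged, while the abrupt shift trips the alarm at the turn where it occurs, matching the paper's qualitative finding that abrupt drift raises the inconsistency hazard.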

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Datasets
MT-Consistency
Applications
conversational AI, multi-turn dialogue systems