Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
Yubo Li, Ramayya Krishnan, Rema Padman
Published on arXiv
arXiv:2510.02712
Prompt Injection
OWASP LLM Top 10 (LLM01)
Key Finding
AFT models with model-drift interactions achieve the best discrimination and calibration for predicting LLM consistency failure, and a derived turn-level monitor flags high-risk conversations before the first inconsistency occurs, evaluated on 36,951 turns from 9 LLMs
Time-To-Inconsistency (TTI) survival analysis
Novel technique introduced
Large Language Models (LLMs) have revolutionized conversational AI, yet their robustness in extended multi-turn dialogues remains poorly understood. Existing evaluation frameworks focus on static benchmarks and single-turn assessments, failing to capture the temporal dynamics of conversational degradation that characterize real-world interactions. In this work, we present a large-scale survival analysis of conversational robustness, modeling failure as a time-to-event process over 36,951 turns from 9 state-of-the-art LLMs on the MT-Consistency benchmark. Our framework combines Cox proportional hazards, Accelerated Failure Time (AFT), and Random Survival Forest models with simple semantic drift features. We find that abrupt prompt-to-prompt semantic drift sharply increases the hazard of inconsistency, whereas cumulative drift is counterintuitively *protective*, suggesting adaptation in conversations that survive multiple shifts. AFT models with model-drift interactions achieve the best combination of discrimination and calibration, and proportional hazards checks reveal systematic violations for key drift covariates, explaining the limitations of Cox-style modeling in this setting. Finally, we show that a lightweight AFT model can be turned into a turn-level risk monitor that flags most failing conversations several turns before the first inconsistent answer while keeping false alerts modest. These results establish survival analysis as a powerful paradigm for evaluating multi-turn robustness and for designing practical safeguards for conversational AI systems.
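The time-to-event framing in the abstract can be sketched with a toy construction. The helper names, the use of cosine distance between consecutive prompt embeddings as the drift feature, and the 1-based turn indexing below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def prompt_drift(prev_emb, cur_emb):
    """Prompt-to-prompt semantic drift as cosine distance between
    consecutive prompt embeddings (one simple, common choice)."""
    cos = np.dot(prev_emb, cur_emb) / (
        np.linalg.norm(prev_emb) * np.linalg.norm(cur_emb))
    return 1.0 - cos

def to_survival_record(consistent_flags):
    """Convert a per-turn consistency trace into a (duration, event) pair:
    duration is the 1-based index of the first inconsistent answer with
    event = 1; conversations that never fail are right-censored (event = 0)
    at their final turn."""
    for turn, consistent in enumerate(consistent_flags, start=1):
        if not consistent:
            return turn, 1            # observed failure at this turn
    return len(consistent_flags), 0   # censored: survived every turn

# A 5-turn conversation whose 4th answer is the first inconsistency:
print(to_survival_record([True, True, True, False, True]))  # (4, 1)
# A 3-turn conversation that never fails is censored at turn 3:
print(to_survival_record([True, True, True]))               # (3, 0)
```

Records of this form, paired with time-varying drift covariates, are what Cox, AFT, and Random Survival Forest models consume.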
Key Contributions
- Formalizes multi-turn LLM robustness as a time-to-inconsistency survival analysis problem using Cox, AFT, and Random Survival Forest models with time-varying semantic drift covariates
- Discovers that abrupt prompt-to-prompt semantic drift sharply increases failure hazard while cumulative drift is paradoxically protective, suggesting conversational adaptation
- Demonstrates that a lightweight AFT-based turn-level risk monitor can flag most failing conversations several turns before the first inconsistent answer with modest false alarm rates