Benchmark · 2025

Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks

Yubo Li, Ramayya Krishnan, Rema Padman

1 citation · 37 references · arXiv


Published on arXiv: 2510.02712

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

AFT models with model-drift interactions achieve the best discrimination and calibration for predicting LLM consistency failure, and a derived turn-level monitor successfully flags high-risk conversations before the first inconsistency occurs, evaluated across 36,951 turns from 9 LLMs.

Time-To-Inconsistency (TTI) survival analysis

Novel technique introduced


Large Language Models (LLMs) have revolutionized conversational AI, yet their robustness in extended multi-turn dialogues remains poorly understood. Existing evaluation frameworks focus on static benchmarks and single-turn assessments, failing to capture the temporal dynamics of conversational degradation that characterize real-world interactions. In this work, we present a large-scale survival analysis of conversational robustness, modeling failure as a time-to-event process over 36,951 turns from 9 state-of-the-art LLMs on the MT-Consistency benchmark. Our framework combines Cox proportional hazards, Accelerated Failure Time (AFT), and Random Survival Forest models with simple semantic drift features. We find that abrupt prompt-to-prompt semantic drift sharply increases the hazard of inconsistency, whereas cumulative drift is counterintuitively *protective*, suggesting adaptation in conversations that survive multiple shifts. AFT models with model-drift interactions achieve the best combination of discrimination and calibration, and proportional hazards checks reveal systematic violations for key drift covariates, explaining the limitations of Cox-style modeling in this setting. Finally, we show that a lightweight AFT model can be turned into a turn-level risk monitor that flags most failing conversations several turns before the first inconsistent answer while keeping false alerts modest. These results establish survival analysis as a powerful paradigm for evaluating multi-turn robustness and for designing practical safeguards for conversational AI systems.
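To make the drift covariates concrete, here is a minimal sketch of how abrupt and cumulative semantic drift could be computed from a sequence of prompt embeddings. The exact feature definitions used in the paper are not reproduced here; this assumes abrupt drift at turn t is the cosine distance between the embeddings of prompts t-1 and t, and cumulative drift is the running sum of abrupt drifts.

```python
import numpy as np

def drift_features(prompt_embeddings):
    """Turn-level drift covariates from a sequence of prompt embeddings.

    Hypothetical reconstruction: abrupt drift at turn t is the cosine
    distance between the embeddings of prompts t-1 and t; cumulative
    drift is the running sum of abrupt drifts up to turn t.
    """
    E = np.asarray(prompt_embeddings, dtype=float)
    # Normalize rows so cosine distance reduces to 1 - dot product.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    abrupt = np.zeros(len(E))
    abrupt[1:] = 1.0 - np.sum(E[1:] * E[:-1], axis=1)
    cumulative = np.cumsum(abrupt)
    return abrupt, cumulative

# Toy conversation: three semantically similar prompts, then a sharp topic shift.
emb = [[1.0, 0.0], [0.98, 0.2], [0.97, 0.25], [0.0, 1.0]]
abrupt, cumulative = drift_features(emb)
```

The abrupt-drift feature spikes at the topic shift (the last turn) while staying near zero for the similar prompts, which is the signal the survival models associate with elevated inconsistency hazard.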


Key Contributions

  • Formalizes multi-turn LLM robustness as a time-to-inconsistency survival analysis problem using Cox, AFT, and Random Survival Forest models with time-varying semantic drift covariates
  • Discovers that abrupt prompt-to-prompt semantic drift sharply increases failure hazard while cumulative drift is paradoxically protective, suggesting conversational adaptation
  • Demonstrates that a lightweight AFT-based turn-level risk monitor can flag most failing conversations several turns before the first inconsistent answer with modest false alarm rates
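The turn-level risk monitor in the third contribution can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the Weibull AFT coefficients, shape parameter, alert threshold, and horizon below are all placeholder values, and the monitor flags a conversation when the model's predicted probability of inconsistency within the next few turns (given survival so far) exceeds the threshold.

```python
import numpy as np

# Hypothetical Weibull AFT coefficients -- placeholders for illustration;
# the paper's fitted values are not reproduced here.
BETA = {"intercept": 3.0, "abrupt_drift": -2.0, "cumulative_drift": 0.5}
SHAPE_K = 1.5     # assumed Weibull shape parameter
THRESHOLD = 0.2   # flag if P(failure within HORIZON turns) exceeds this
HORIZON = 2

def weibull_aft_survival(t, abrupt, cumulative):
    """AFT form: S(t | x) = exp(-(t / lambda)^k), with log lambda = x . beta."""
    log_lam = (BETA["intercept"]
               + BETA["abrupt_drift"] * abrupt
               + BETA["cumulative_drift"] * cumulative)
    return np.exp(-((t / np.exp(log_lam)) ** SHAPE_K))

def monitor(abrupt_seq, cumulative_seq):
    """Return the first turn whose short-horizon risk trips the alarm,
    or None if the conversation is never flagged."""
    for t, (a, c) in enumerate(zip(abrupt_seq, cumulative_seq), start=1):
        s_now = weibull_aft_survival(t, a, c)
        s_future = weibull_aft_survival(t + HORIZON, a, c)
        risk = 1.0 - s_future / s_now  # conditional failure probability
        if risk > THRESHOLD:
            return t
    return None

# Low-drift conversation vs. one with an abrupt semantic shift at turn 3.
calm_drift = [0.02, 0.03, 0.02, 0.03]
shift_drift = [0.02, 0.03, 0.9, 0.1]
calm = monitor(calm_drift, np.cumsum(calm_drift))
shifted = monitor(shift_drift, np.cumsum(shift_drift))
```

Under these placeholder parameters the low-drift conversation is never flagged, while the abrupt shift trips the alarm at the turn where it occurs, matching the paper's qualitative finding that abrupt drift raises the inconsistency hazard.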

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Datasets
MT-Consistency
Applications
conversational AI, multi-turn dialogue systems