Pramod Viswanath

attack arXiv Sep 30, 2025 · Sep 2025

Are Robust LLM Fingerprints Adversarially Robust?

Anshul Nasery, Edoardo Contente, Alkin Kaz et al. · University of Washington · Sentient +1 more

Adaptive attacks bypass ten LLM fingerprinting schemes with near-perfect success by exploiting four systemic vulnerabilities in ownership verification

Model Theft Model Theft nlp

3 citations

defense arXiv Oct 15, 2025 · Oct 2025

Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks

Jianzhu Yao, Hongxu Su, Taobo Liao et al. · Princeton University · HKUST (GZ) +1 more

Verifiable inference protocol for cloud ML that detects model swaps and computation tampering with 0.3% overhead using IEEE-754 bounds and Merkle-anchored dispute games

Output Integrity Attack vision nlp generative

2 citations

Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090 GPUs, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
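The dispute game described in the abstract can be illustrated with a deliberately simplified sketch. It is not NAO's implementation: it assumes a linear chain of scalar operators instead of a computation graph, a single absolute tolerance per trace position instead of IEEE-754 bounds or empirical percentile profiles, and it omits Merkle commitments and the on-chain contract entirely. All function names (`run_trace`, `agrees`, `find_disputed_op`, `adjudicate`) are hypothetical.

```python
from typing import Callable, List

Op = Callable[[float], float]


def run_trace(ops: List[Op], x: float) -> List[float]:
    """Run a linear chain of operators; trace[0] is the input, trace[i] the output of op i."""
    trace = [x]
    for op in ops:
        trace.append(op(trace[-1]))
    return trace


def agrees(a: float, b: float, tol: float) -> bool:
    """Tolerance-based acceptance: accept values inside the region, not only bitwise-equal ones."""
    return abs(a - b) <= tol


def find_disputed_op(claimed: List[float], local: List[float], tols: List[float]) -> int:
    """Bisect to a single operator.

    Assumes the traces agree at position 0 (the shared input) and disagree at the
    last position. Returns the 1-based index i such that the parties agree on
    position i - 1 but not on position i.
    """
    lo, hi = 0, len(claimed) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if agrees(claimed[mid], local[mid], tols[mid]):
            lo = mid  # agreement up to mid: the fault lies later
        else:
            hi = mid  # disagreement at mid: the fault lies at or before mid
    return hi


def adjudicate(op: Op, claimed_in: float, claimed_out: float, tol: float) -> bool:
    """Re-execute one operator on the agreed input; True means the proposer's claim stands."""
    return agrees(op(claimed_in), claimed_out, tol)


# Hypothetical four-operator "model" and a proposer who tampers with operator 3.
ops: List[Op] = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: v * v, lambda v: v - 3.0]
honest = run_trace(ops, 1.5)
claimed = honest[:3] + [16.5, 13.5]
tols = [1e-6] * len(honest)  # assumed uniform tolerance, for illustration only

disputed = find_disputed_op(claimed, honest, tols)
proposer_wins = adjudicate(ops[disputed - 1], claimed[disputed - 1], claimed[disputed], tols[disputed])
```

In this toy run the bisection narrows the disagreement to the third operator, and re-executing only that operator on the agreed input shows that the claimed output falls outside the tolerance. The number of comparison rounds grows logarithmically with the length of the chain, which is the property that keeps the final adjudication step small.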

cnn transformer llm diffusion Princeton University · HKUST (GZ) · University of Illinois Urbana-Champaign

attack arXiv Nov 21, 2025 · Nov 2025

MURMUR: Using cross-user chatter to break collaborative language agents in groups

Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar et al. · Princeton University · Sentient

Discovers cross-user poisoning: adversarial messages in shared LLM agent history hijack actions of other users at inference time

Prompt Injection Excessive Agency nlp
