Latest papers

3 papers
attack · arXiv · Apr 1, 2026

Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning

Swapnil Parekh · Intuit

Backdoor attack on tokenless reasoning models that hijacks continuous latent trajectories via single embedding perturbations, achieving 99%+ success while evading all token-level defenses

Model Poisoning · Data Poisoning Attack · nlp
attack · arXiv · Feb 28, 2026

CaptionFool: Universal Image Captioning Model Attacks

Swapnil Parekh · Intuit

Universal adversarial patch attack forces VLM image captioners to produce offensive target captions with 94–96% success using only 1.2% of patches

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
benchmark · arXiv · Nov 22, 2025

ASTRA: Agentic Steerability and Risk Assessment Framework

Itay Hazan, Yael Mathov, Guy Shtar et al. · Intuit

Benchmark framework evaluating 13 LLMs on enforcing security guardrails in agentic tool-use settings against novel attacks

Excessive Agency · Prompt Injection · nlp