Rohin Shah

h-index: 8 · 719 citations · 11 papers (total)

Papers in Database (1)

defense · arXiv · Jan 16, 2026 · 11w ago

Building Production-Ready Probes For Gemini

János Kramár, Joshua Engels, Zheng Wang et al. · Google DeepMind

Deploys activation-probe classifiers in Gemini to intercept cyber-offensive misuse, addressing long-context generalization and adaptive adversarial evasion

Prompt Injection nlp
3 citations