Maheep Chaudhary

benchmark · arXiv · Sep 10, 2025

Maheep Chaudhary, Ian Su, Nikhil Hooda et al. · Independent · University of California +6 more

Discovers power-law scaling of LLM evaluation awareness across 15 models, forecasting deceptive capability concealment in larger models

Prompt Injection · nlp

attack · arXiv · Mar 4, 2026

Maheep Chaudhary · Independent

Adversarially optimized prompts induce LLM sandbagging on benchmarks, causing 94-percentage-point accuracy drops that far exceed hand-crafted baselines

Prompt Injection · nlp

defense · arXiv · Feb 16, 2026

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit et al. · Algoverse AI Research · University of Aberdeen +1 more

Detects backdoored LoRA adapters via SVD spectral statistics on weight matrices, achieving 97% accuracy without model execution

Model Poisoning · AI Supply Chain Attacks · nlp

Papers in Database (3)