Maheep Chaudhary

Papers in Database (3)

benchmark arXiv Sep 10, 2025

Evaluation Awareness Scales Predictably in Open-Weights Large Language Models

Maheep Chaudhary, Ian Su, Nikhil Hooda et al. · Independent · University of California +6 more

Discovers power-law scaling of LLM evaluation awareness across 15 models, which is used to forecast deceptive capability concealment in larger models

Prompt Injection · nlp
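The summary above says evaluation awareness follows a power law across 15 models. The sketch below shows one common way to fit such a law, using ordinary least squares on log-transformed data, and then extrapolates the fit to a larger x. It assumes a law of the form y = c · x^k, with x the model size. That variable and functional form are assumptions, not details taken from the paper. The function names are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**k by ordinary least squares on (log x, log y).

    Hypothetical helper: returns (c, k). All xs and ys must be positive,
    otherwise math.log raises ValueError.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                          # slope in log-log space = exponent
    c = math.exp(mean_y - k * mean_x)      # intercept, mapped back from log space
    return c, k


def predict_power_law(c, k, x):
    """Extrapolate the fitted law to a new (e.g. larger) x."""
    return c * x ** k
```

In log space the power law becomes a straight line, log y = log c + k log x, so the fitted slope k is the scaling exponent. Any forecast for a larger model inherits the error in that exponent.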
attack arXiv Mar 4, 2026

In-Context Environments Induce Evaluation-Awareness in Language Models

Maheep Chaudhary · Independent

Adversarially optimized prompts induce LLM sandbagging on benchmarks, causing accuracy drops of 94 percentage points, far larger than those from hand-crafted baseline prompts

Prompt Injection · nlp
defense arXiv Feb 16, 2026

Weight-Space Detection of Backdoors in LoRA Adapters

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit et al. · Algoverse AI Research · University of Aberdeen +1 more

Detects backdoored LoRA adapters from spectral statistics of the singular value decomposition (SVD) of their weight matrices, achieving 97% accuracy without running the model

Model Poisoning · AI Supply Chain Attacks · nlp
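The summary above says only that detection uses SVD spectral statistics of the adapter weight matrices. The sketch below is not the authors' method. It assumes the usual LoRA update ΔW = B·A, with the α/r scaling omitted, and computes a few generic spectral features a downstream classifier could consume:

- the largest singular value, estimated by power iteration;
- the fraction of spectral energy in the top direction (σ1² / ‖ΔW‖F²);
- the stable rank (‖ΔW‖F² / σ1²).

The feature choice and all function names are hypothetical, and only the standard library is used.

```python
import math
import random


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_singular_value(m, iters=200, seed=0):
    """Estimate the largest singular value of m by power iteration on m^T m."""
    rng = random.Random(seed)
    n = len(m[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    for _ in range(iters):
        mv = matvec(m, v)
        # w = m^T (m v): one power-iteration step toward the top right-singular vector
        w = [sum(m[i][j] * mv[i] for i in range(len(m))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = matvec(m, v)
    return math.sqrt(sum(x * x for x in mv))  # sigma_1 = ||m v|| for unit v


def lora_spectral_stats(b, a):
    """Hypothetical feature extractor for a LoRA update delta_W = B @ A."""
    delta = matmul(b, a)  # alpha / r scaling omitted for simplicity
    fro_sq = sum(x * x for row in delta for x in row)  # equals the sum of sigma_i^2
    if fro_sq == 0.0:
        return {"frobenius": 0.0, "top_sv": 0.0, "top_energy": 0.0, "stable_rank": 0.0}
    s1 = top_singular_value(delta)
    return {
        "frobenius": math.sqrt(fro_sq),
        "top_sv": s1,
        "top_energy": s1 * s1 / fro_sq,    # share of energy in the top direction
        "stable_rank": fro_sq / (s1 * s1),  # ~1 when one direction dominates
    }
```

Power iteration on ΔWᵀΔW avoids a full SVD and needs no model forward pass, which fits the "without running the model" setting. A full singular spectrum (for example via a linear-algebra library) would allow richer statistics.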