Punya Syon Pandey

h-index: 2 · 8 citations · 7 papers (total)

Papers in Database (3)

benchmark arXiv Feb 6, 2026 · 8w ago

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

Saad Hossain, Tom Tseng, Punya Syon Pandey et al. · Critical ML Lab · FAR.AI +6 more

Benchmark framework for evaluating LLM tamper resistance across 9 fine-tuning and weight-space attacks on 21 open-weight models

Transfer Learning Attack · Prompt Injection · nlp
1 citation PDF Code
benchmark arXiv Oct 6, 2025 · Oct 2025

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj et al. · University of Toronto · Vector Institute +4 more

Benchmarks LLM vulnerability to sociopolitical harm requests across 585 prompts, 34 countries, revealing 97–98% attack success rates

Prompt Injection · nlp
PDF Code
benchmark arXiv Jan 19, 2026 · 11w ago

Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift

Daniel Vennemeyer, Punya Syon Pandey, Phan Anh Duong et al. · University of Cincinnati · University of Toronto +1 more

Compares six LLM fine-tuning objectives and finds ORPO and KL-regularization best preserve jailbreak resistance and alignment at scale

Transfer Learning Attack · Prompt Injection · nlp
PDF