Latest papers

2 papers
attack · arXiv · Jan 29, 2026

The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation

Devanshu Sahoo, Manish Prasad, Vasudev Majhi et al. · BITS Pilani · Trustwise +1 more

Embeds adversarial directives in AST comment nodes to hijack LLM-based code graders, achieving >95% manipulation success across 9 SOTA models

Prompt Injection nlp
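The summary above says the attack places directives in comment nodes of submitted code. As a rough illustration of that attack surface (not the paper's method), the sketch below lists and strips comments from a Python submission before it would reach an LLM grader. Python's `ast` module discards comments, so this uses the standard-library `tokenize` module instead. The `DIRECTIVE_PATTERN` keyword list and all function names are hypothetical and chosen only for illustration.

```python
import io
import re
import tokenize

# Hypothetical keyword list for directive-like comments; not taken from the paper.
DIRECTIVE_PATTERN = re.compile(
    r"\b(ignore (all|any|previous)|grade|score|award|full marks|system prompt)\b",
    re.IGNORECASE,
)


def comment_tokens(source: str) -> list:
    """Return every comment token in a Python source string.

    Raises tokenize.TokenError or a SyntaxError subclass on source that
    cannot be tokenized.
    """
    reader = io.StringIO(source).readline
    return [t for t in tokenize.generate_tokens(reader) if t.type == tokenize.COMMENT]


def flag_directive_comments(source: str) -> list:
    """Return (line_number, comment_text) for comments matching the keyword list."""
    return [
        (tok.start[0], tok.string)
        for tok in comment_tokens(source)
        if DIRECTIVE_PATTERN.search(tok.string)
    ]


def strip_comments(source: str) -> str:
    """Return the source with every comment removed and line numbering preserved."""
    lines = source.splitlines(keepends=True)
    for tok in comment_tokens(source):
        row, col = tok.start
        line = lines[row - 1]
        newline = "\n" if line.endswith("\n") else ""
        # Cut the line at the comment's start column and drop trailing spaces.
        lines[row - 1] = line[:col].rstrip() + newline
    return "".join(lines)
```

A keyword filter like this is easy to evade, so stripping all comments before grading is the more conservative of the two options, at the cost of hiding legitimate documentation from the grader.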
defense · arXiv · Dec 22, 2025

PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline

Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena et al. · Birla Institute of Technology and Science · Trustwise

Lightweight TF-IDF + Linear SVM multi-stage pipeline defends LLMs against prompt injection and jailbreaks with 10x lower latency than ShieldGemma

Prompt Injection nlp
1 citation
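To make the "TF-IDF + linear classifier" idea in the PromptScreen summary concrete, here is a minimal standard-library sketch of a single screening stage. It is an assumption-laden stand-in, not the paper's pipeline: a perceptron replaces the Linear SVM, only one stage is shown, and the class name `TfidfLinearScreen`, its methods, and the smoothed IDF formula are illustrative choices.

```python
import math
import re
from collections import Counter


def words(text: str) -> list:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TfidfLinearScreen:
    """TF-IDF features scored by a linear model (perceptron stand-in for an SVM)."""

    def fit(self, texts, labels, epochs=50):
        docs = [words(t) for t in texts]
        n = len(docs)
        # Document frequency: number of training texts containing each term.
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency, always positive.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.weights = {}
        self.bias = 0.0
        for _ in range(epochs):
            for doc, label in zip(docs, labels):
                x = self._vector(doc)
                y = 1 if label else -1
                # Perceptron rule: update only on misclassified examples.
                if y * self._score(x) <= 0:
                    for term, value in x.items():
                        self.weights[term] = self.weights.get(term, 0.0) + y * value
                    self.bias += y
        return self

    def _vector(self, tokens):
        """L2-normalized TF-IDF vector over the training vocabulary."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def _score(self, x):
        return self.bias + sum(self.weights.get(t, 0.0) * v for t, v in x.items())

    def is_suspicious(self, text: str) -> bool:
        """True when the linear score falls on the 'malicious' side."""
        return self._score(self._vector(words(text))) > 0
```

Scoring is a sparse dot product over the prompt's own tokens, which is the kind of property that keeps a linear screen cheap compared with running a guard model. A real deployment would need a labeled corpus far larger than any toy example.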