benchmark 2025

AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

Honglin Mu 1,2, Jinghao Liu 3, Kaiyang Wan 2, Rui Xing 2,4, Xiuying Chen 2, Timothy Baldwin 2,4, Wanxiang Che 1

1 citations · arXiv

α

Published on arXiv

2512.20164

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Combined prompt-based and FIDS defense reduces attack success by 26.3%, with training-time LoRA adaptation outperforming inference-time mitigations in both security and utility preservation.

FIDS (Foreign Instruction Detection through Separation)

Novel technique introduced


Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. Notably, while defenses may exist for mature domains such as code review, they are often absent in other common applications such as resume screening and peer review. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types. We evaluate two defense mechanisms: prompt-based defenses achieve 10.1% attack reduction with 12.5% false rejection increase, while our proposed FIDS (Foreign Instruction Detection through Separation) using LoRA adaptation achieves 15.4% attack reduction with 10.4% false rejection increase. The combined approach provides 26.3% attack reduction, demonstrating that training-time defenses outperform inference-time mitigations in both security and utility preservation.


Key Contributions

  • First benchmark for evaluating adversarial prompt injection vulnerabilities in LLM-based resume screening, with attack success rates exceeding 80%
  • FIDS (Foreign Instruction Detection through Separation) defense using LoRA adaptation, achieving 15.4% attack reduction with only 10.4% false rejection increase
  • Empirical demonstration that training-time defenses (FIDS) outperform inference-time prompt-based defenses, with a combined approach yielding 26.3% attack reduction

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llmtransformer
Threat Tags
inference_timeblack_box
Applications
resume screeningautomated hiringllm document processing pipelines