benchmark 2025

AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

1 citations · arXiv

Published on arXiv

2512.20164

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Combined prompt-based and FIDS defense reduces attack success by 26.3%, with training-time LoRA adaptation outperforming inference-time mitigations in both security and utility preservation.

FIDS (Foreign Instruction Detection through Separation)

Novel technique introduced

Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. Notably, while defenses may exist for mature domains such as code review, they are often absent in other common applications such as resume screening and peer review. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types. We evaluate two defense mechanisms: prompt-based defenses achieve 10.1% attack reduction with 12.5% false rejection increase, while our proposed FIDS (Foreign Instruction Detection through Separation) using LoRA adaptation achieves 15.4% attack reduction with 10.4% false rejection increase. The combined approach provides 26.3% attack reduction, demonstrating that training-time defenses outperform inference-time mitigations in both security and utility preservation.

Key Contributions

First benchmark for evaluating adversarial prompt injection vulnerabilities in LLM-based resume screening, with attack success rates exceeding 80%
FIDS (Foreign Instruction Detection through Separation) defense using LoRA adaptation, achieving 15.4% attack reduction with only 10.4% false rejection increase
Empirical demonstration that training-time defenses (FIDS) outperform inference-time prompt-based defenses, with a combined approach yielding 26.3% attack reduction

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

inference_timeblack_box

Applications

resume screeningautomated hiringllm document processing pipelines

Read PDF arXiv DOI Code

AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development

Replicating TEMPEST at Scale: Multi-Turn Adversarial Attacks Against Trillion-Parameter Frontier Models

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models

Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models

A methodological analysis of prompt perturbations and their effect on attack success rates