benchmark 2025

EMNLP: Educator-role Moral and Normative Large Language Models Profiling

0 citations

Published on arXiv

2508.15250

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Models with stronger reasoning capabilities are paradoxically more vulnerable to harmful soft prompt injection, while model temperature and other hyperparameters have limited influence on most risk behaviors.

Novel technique introduced

EMNLP (Educator-role Moral and Normative LLMs Profiling)

Simulating Professions (SP) enables Large Language Models (LLMs) to emulate professional roles. However, comprehensive psychological and ethical evaluation in these contexts remains lacking. This paper introduces EMNLP, an Educator-role Moral and Normative LLMs Profiling framework for personality profiling, moral development stage measurement, and ethical risk assessment under soft prompt injection. EMNLP extends existing scales and constructs 88 teacher-specific moral dilemmas, enabling profession-oriented comparison with human teachers. A targeted soft prompt injection set evaluates compliance and vulnerability in teacher SP. Experiments on 14 LLMs show that teacher-role LLMs exhibit more idealized and polarized personalities than human teachers and excel in abstract moral reasoning, but struggle with emotionally complex situations. Models with stronger reasoning are more vulnerable to harmful prompt injection, revealing a paradox between capability and safety. Model temperature and other hyperparameters have limited influence, except on some risk behaviors. This paper presents the first benchmark to assess the ethical and psychological alignment of teacher-role LLMs for educational AI. Resources are available at https://e-m-n-l-p.github.io/.

Key Contributions

First benchmark framework (EMNLP) for personality, moral reasoning, and ethical risk evaluation of LLMs in teacher role-play contexts
88 teacher-specific moral dilemmas including extreme professional scenarios for profession-oriented comparison with human teachers
Soft prompt injection evaluation across 14 LLMs revealing that stronger reasoning capability correlates with greater vulnerability to harmful prompt injection
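
The soft prompt injection evaluation above compares how often each model complies with harmful injected instructions. A minimal sketch of that kind of per-model tally follows, assuming each response has already been labeled as compliant or not. The function name `compliance_rate`, the record layout, and the model names are hypothetical, and the toy labels are placeholders rather than results from the paper, whose actual scoring procedure is not described here.

```python
from collections import defaultdict


def compliance_rate(records):
    """Return the fraction of injected prompts each model complied with.

    `records` is an iterable of (model_name, complied) pairs, where
    `complied` is True when the model followed the injected instruction.
    This record layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for model, complied in records:
        totals[model] += 1
        hits[model] += int(complied)
    return {model: hits[model] / totals[model] for model in totals}


# Placeholder labels for two hypothetical models; NOT data from the paper.
toy_records = [
    ("model_a", True), ("model_a", False), ("model_a", False), ("model_a", False),
    ("model_b", True), ("model_b", True), ("model_b", False), ("model_b", True),
]

rates = compliance_rate(toy_records)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2f}")
```

A higher rate means the model followed more of the injected instructions; comparing these rates against a separate measure of reasoning ability is one way the capability-versus-safety relationship could be examined.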

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time
black_box

Datasets

EMNLP benchmark (88 teacher-specific moral dilemmas)
extended personality scales

Applications

educational ai
teacher role-playing llms
llm safety evaluation

Similar Papers

LJ-Bench: Ontology-Based Benchmark for U.S. Crime

Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models

Gaming the Answer Matcher: Examining the Impact of Text Manipulation on Automated Judgment

Quantifying CBRN Risk in Frontier Models

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

Vulnerability of LLMs' Belief Systems? LLMs Belief Resistance Check Through Strategic Persuasive Conversation Interventions

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models