Yuan Fang

Papers in Database (2)

attack arXiv Apr 20, 2026

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

Yuan Fang, Yiming Luo, Aimin Zhou et al. · East China Normal University · Shanghai Innovation Institute

Automated red-teaming framework generating diverse toxic datasets via inverted constitutional AI to test LLM safety mechanisms

Prompt Injection Red-Team Agents Benchmarks & Evaluation nlp
defense arXiv Sep 18, 2025

DF-LLaVA: Unlocking MLLMs for Synthetic Image Detection via Knowledge Injection and Conflict-Driven Self-Reflection

Zhuokang Shen, Kaisen Zhang, Bohan Jia et al. · East China Normal University · Sanming University +1 more

Novel VLM-based synthetic image detector that uses knowledge injection and self-reflection to exceed expert-model accuracy while remaining interpretable

Output Integrity Attack vision multimodal