Yuan Fang

attack arXiv Apr 20, 2026

Yuan Fang, Yiming Luo, Aimin Zhou et al. · East China Normal University · Shanghai Innovation Institute

Automated red-teaming framework generating diverse toxic datasets via inverted constitutional AI to test LLM safety mechanisms

Prompt Injection Red-Team Agents Benchmarks & Evaluation nlp

defense arXiv Sep 18, 2025

Zhuokang Shen, Kaisen Zhang, Bohan Jia et al. · East China Normal University · Sanming University +1 more

Novel VLM-based synthetic image detector using knowledge injection and self-reflection to exceed expert model accuracy with interpretability

Output Integrity Attack vision multimodal

Papers in Database (2)