Katia Sycara

Papers in Database (2)

attack · arXiv · Apr 27, 2026

Jailbreaking Frontier Foundation Models Through Intention Deception

Xinhe Wang, Katia Sycara, Yaqi Xie · Carnegie Mellon University

Multi-turn jailbreaking attack that deceives LLM safety mechanisms by simulating benign intent across conversations to elicit harmful outputs

Prompt Injection · nlp · multimodal
defense · arXiv · Mar 16, 2026

Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory

Ce Zhang, Jinxi He, Junyi He et al. · Carnegie Mellon University

Training-free safety framework that uses self-reflective memory to help VLMs distinguish safe from unsafe requests in contextually similar scenarios

Prompt Injection · multimodal · nlp · vision