Olivera Kotevska

Papers in Database (3)

attack arXiv Mar 19, 2026 · 9w ago

Automated Membership Inference Attacks: Discovering MIA Signal Computations using LLM Agents

Toan Tran, Olivera Kotevska, Li Xiong · Emory University · Oak Ridge National Laboratory

LLM-agent framework that automatically discovers novel membership inference attack strategies, achieving 0.18 AUC improvement over existing MIAs

Membership Inference Attack Vulnerability Discovery Red-Team Agents
PDF
defense arXiv Apr 1, 2026 · 7w ago

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska et al. · University of Nevada · Oak Ridge National Laboratory

Detects LLM jailbreak attacks using logit distributions over numerical tokens, achieving 22.66% ASR reduction with minimal overhead

Prompt Injection nlp
PDF
defense arXiv Apr 6, 2026 · 6w ago

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

Jiahao Xu, Rui Hu, Olivera Kotevska et al. · University of Nevada · Oak Ridge National Laboratory

Multi-bit watermarking embedding binary messages in LLM text for attribution using cross-permutation green lists

Output Integrity Attack nlp
PDF Code