Latest papers

2 papers
attack · arXiv · Jan 1, 2026

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents

Zongwei Wang, Bincheng Gu, Hongyu Yu et al. · Chongqing University · The University of Queensland +2 more

Belief Poisoning Attack corrupts LLM agent profiles and memory to make agents treat humans as an outgroup, bypassing human-oriented safety behaviors

Prompt Injection · Excessive Agency · nlp
PDF · Code
benchmark · COLING · Jan 5, 2025

Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks

Yang Wang, Chenghua Lin · The University of Sheffield · Automated Analytics +1 more

Benchmarks textual adversarial defences across NLP tasks and proposes TTSO++, a calibration-based defence variant

Input Manipulation Attack · nlp
4 citations · PDF · Code