attack · arXiv · Aug 8, 2025
Haorui He, Yupeng Li, Bin Benjamin Zhu et al. · Hong Kong Baptist University · The University of Hong Kong +1 more
Poisons RAG knowledge bases of LLM fact-checkers by mimicking claim decomposition and exploiting justifications to craft targeted malicious evidence
Data Poisoning Attack · Prompt Injection · nlp
State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, but it remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the system's decomposition strategy and exploit its generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%–21.2% higher attack success rates than SOTA attacks across various poisoning budgets, exposing security weaknesses in existing fact-checking systems and highlighting the need for defensive countermeasures.
llm · Hong Kong Baptist University · The University of Hong Kong · Microsoft Corporation
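The abstract's attack flow can be made concrete with a short sketch. Below is a minimal, hypothetical Python rendering of a Fact2Fiction-style pipeline: mimic the fact-checker's claim decomposition, craft supporting evidence per sub-claim under a poisoning budget, and inject it into the RAG corpus. The callables (llm, inject), the prompts, and the budget-splitting heuristic are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the Fact2Fiction-style poisoning loop described above.
# `llm` (prompt -> text) and `inject` (add one document to the RAG corpus)
# are hypothetical callables; nothing here is the paper's actual code.

from typing import Callable

def poison_fact_checker(
    target_claim: str,
    target_verdict: str,            # e.g. "true" to flip a false claim
    llm: Callable[[str], str],
    inject: Callable[[str], None],
    budget: int,                    # total number of poisoned documents
) -> list[str]:
    # 1. Mimic the fact-checker's decomposition strategy so the crafted
    #    evidence aligns with the sub-claims it will actually verify.
    sub_claims = [
        s.strip()
        for s in llm(
            "Decompose this claim into independent sub-claims, "
            f"one per line:\n{target_claim}"
        ).splitlines()
        if s.strip()
    ]

    # 2. Spend the poisoning budget across sub-claims, exploiting the
    #    system's justification style to make each passage persuasive.
    docs: list[str] = []
    per_sub = max(1, budget // max(1, len(sub_claims)))
    for sub in sub_claims:
        for _ in range(per_sub):
            docs.append(llm(
                "Write a plausible, citation-styled evidence passage "
                f"implying that '{sub}' is {target_verdict}."
            ))

    # 3. Inject the passages so retrieval surfaces them during
    #    sub-claim verification, biasing the aggregated verdict.
    for doc in docs[:budget]:
        inject(doc)
    return docs[:budget]
```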
attack · arXiv · Mar 14, 2026
Zijian Ling, Pingyi Hu, Xiuyong Gao et al. · Huazhong University of Science and Technology · Tsinghua University +1 more
Inaudible near-ultrasonic acoustic channel attack that delivers jailbreak prompts to speech-driven LLMs through commodity hardware
Input Manipulation Attack · Prompt Injection · nlp · audio · multimodal
Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic channels. We present Sirens' Whisper (SWhisper), the first practical framework for covert prompt-based attacks against speech-driven LLMs under realistic black-box conditions using commodity hardware. SWhisper enables robust, inaudible delivery of arbitrary target baseband audio, including long and structured prompts, on commodity devices by encoding it into near-ultrasound waveforms that demodulate faithfully after acoustic transmission and microphone nonlinearity. This is achieved through a simple yet effective approach to modeling nonlinear channel characteristics across devices and environments, combined with lightweight channel-inversion pre-compensation. Building on this high-fidelity covert channel, we design a voice-aware jailbreak generation method that ensures intelligibility, brevity, and transferability under speech-driven interfaces. Experiments across both commercial and open-source speech-driven LLMs demonstrate strong black-box effectiveness. On commercial models, SWhisper achieves up to 0.94 non-refusal (NR) and 0.925 specific-convincing (SC). A controlled user study further shows that the injected jailbreak audio is perceptually indistinguishable from background-only playback for human listeners. Although jailbreaks serve as a case study, the underlying covert acoustic channel enables a broader class of high-fidelity prompt-injection and command-execution attacks.
llm · multimodal · Huazhong University of Science and Technology · Tsinghua University · Microsoft Corporation
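The covert channel rests on a well-known acoustics effect: microphone front ends behave nonlinearly, so a near-ultrasonic amplitude-modulated carrier self-demodulates to baseband inside the device. The sketch below illustrates only that principle; the carrier frequency, sample rate, and second-order mic model are assumptions, and SWhisper's actual contribution (cross-device channel modeling plus channel-inversion pre-compensation) is not reproduced here.

```python
# Toy illustration of inaudible delivery: amplitude-modulate baseband
# speech onto a near-ultrasonic carrier so that microphone nonlinearity
# (roughly y = a*x + b*x^2) demodulates it back to baseband.
# All parameters are illustrative assumptions, not the paper's settings.

import numpy as np

FS = 192_000          # playback sample rate (must exceed 2 * carrier freq)
CARRIER_HZ = 40_000   # near-ultrasonic carrier, above human hearing

def modulate(baseband: np.ndarray, depth: float = 0.8) -> np.ndarray:
    """AM-modulate normalized baseband audio onto the ultrasonic carrier."""
    t = np.arange(len(baseband)) / FS
    carrier = np.cos(2 * np.pi * CARRIER_HZ * t)
    # (1 + depth*m(t)) * carrier: the quadratic term of the mic
    # nonlinearity produces a baseband component proportional to m(t).
    return (1.0 + depth * baseband) * carrier

def mic_nonlinearity(x: np.ndarray, a: float = 1.0, b: float = 0.2) -> np.ndarray:
    """Toy second-order model of a MEMS microphone front end."""
    return a * x + b * x * x

def lowpass(x: np.ndarray, cutoff_hz: float = 8_000) -> np.ndarray:
    """Crude FFT brick-wall low-pass standing in for the mic's filtering."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / FS)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(x))

# Demo: a 440 Hz tone survives the inaudible channel after
# self-demodulation at the (simulated) microphone.
t = np.arange(FS) / FS
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
recovered = lowpass(mic_nonlinearity(modulate(speech)))
```

The recovered signal carries a DC offset and a small squared-term distortion alongside the original tone, which is why a real attack needs the kind of channel modeling and pre-compensation the paper describes.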