StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching
Published on arXiv: 2601.09056
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Embedding zero-width Unicode characters in 33% or more of a text's words appears to reliably defeat stylometric authorship attribution and verification systems.
TraceTarnish
Novel technique introduced
Stylometry, the identification of an author through analysis of a text's style (i.e., authorship attribution), serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. However, when stylometry is employed for authorship verification (confirming whether a text truly originates from a claimed author), it can also be weaponized for malicious purposes. De-anonymization, re-identification, tracking, and profiling, along with downstream effects like censorship, illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, $\textit{TraceTarnish}$, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher appears sufficient to ensure authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like $\textit{TraceTarnish}$.
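The abstract's notion of "coverage", the proportion of words altered with zero-width Unicode characters, can be illustrated with a minimal Python sketch. This is not the TraceTarnish implementation. The function names `embed_zero_width` and `measured_coverage`, the choice of three zero-width code points, the one-character-per-word rule, and the random placement are all assumptions made for illustration.

```python
import random
import re

# Assumed character set: the paper's exact zero-width code points are not
# given in this summary. These three are common choices.
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d"]  # ZWSP, ZWNJ, ZWJ


def embed_zero_width(text: str, coverage: float, seed: int = 0) -> str:
    """Insert one zero-width character into about `coverage` of the words."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    rng = random.Random(seed)
    # The capturing group keeps whitespace runs, so words sit at even indices.
    tokens = re.split(r"(\s+)", text)
    candidates = [i for i in range(0, len(tokens), 2) if tokens[i]]
    k = round(coverage * len(candidates))
    for i in rng.sample(candidates, k):
        word = tokens[i]
        # Position 1..len(word): after the first character, up to the end.
        pos = rng.randint(1, len(word))
        tokens[i] = word[:pos] + rng.choice(ZERO_WIDTH) + word[pos:]
    return "".join(tokens)


def measured_coverage(text: str) -> float:
    """Fraction of whitespace-separated words containing a zero-width character."""
    words = text.split()
    if not words:
        return 0.0
    altered = sum(any(z in w for z in ZERO_WIDTH) for w in words)
    return altered / len(words)
```

For example, `embed_zero_width("the quick brown fox jumps over the lazy dog", 1/3)` alters three of the nine words. Removing the inserted characters restores the original string exactly, which is why the change is invisible in most renderers.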
Key Contributions
- Enhancements to the TraceTarnish adversarial stylometry attack, with stronger evidence of its capacity to confound authorship attribution and verification systems
- Quantification of authorship obfuscation as a function of steganographic coverage using zero-width Unicode character injection
- Finding that steganographic coverage of 33% or more of words appears to reliably obfuscate authorship while preserving readability
🛡️ Threat Analysis
TraceTarnish crafts adversarial text inputs (via steganographic zero-width Unicode character injection) that cause stylometric authorship attribution and verification classifiers to misattribute or fail to verify authorship at inference time. This makes it a text-level evasion attack against ML-based text classifiers.
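To see why invisible characters can shift a classifier's input, consider a toy feature extractor that computes relative word frequencies, a common ingredient of stylometric features. The sketch below is an illustration under stated assumptions, not the paper's evaluation setup: the helper `relative_frequencies`, the sample sentence, and the three-word vocabulary are all hypothetical, and real stylometric systems use far richer features.

```python
from collections import Counter

ZWSP = "\u200b"  # zero-width space


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word among whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: counts[w] / total for w in vocabulary}


original = "the cat sat on the mat and the dog sat on the rug"
# Alter the first two occurrences of "the"; the rendered text looks the same.
perturbed = original.replace("the", "t" + ZWSP + "he", 2)

vocabulary = ["the", "on", "and"]
before = relative_frequencies(original, vocabulary)
after = relative_frequencies(perturbed, vocabulary)
```

Here `before["the"]` is 4/13 while `after["the"]` drops to 2/13, because the two altered tokens no longer match the string `"the"`. The frequencies of `"on"` and `"and"` are unchanged, so the feature vector moves even though a human reader sees identical text.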