StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching

Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. In contrast, when stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can also be weaponized for malicious purposes. Techniques such as de-anonymization, re-identification, tracking, profiling, and downstream effects like censorship illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper further explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, TraceTarnish, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher seemingly ensures authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like TraceTarnish.

Key Contributions

Enhancements to the TraceTarnish adversarial stylometry attack, demonstrating stronger capacity to confound authorship attribution and verification systems
Quantification of authorship obfuscation as a function of steganographic coverage using zero-width Unicode character injection
Finding that steganographic coverage of 33% or more of the words seemingly ensures authorship obfuscation while preserving readability
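
The notion of steganographic coverage used above can be illustrated with a minimal sketch. It is not the TraceTarnish implementation: the function names (`inject_zero_width`, `strip_zero_width`, `measured_coverage`), the choice of three zero-width code points, the one-character-per-word rule, and the random word selection are all assumptions made for illustration. Coverage is taken here to mean the fraction of space-separated words that contain at least one zero-width character.

```python
import random

# Assumed character set: ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER.
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d"]


def inject_zero_width(text, coverage, seed=0):
    """Insert one zero-width character into roughly `coverage` of the words.

    Hypothetical helper: words are split on single spaces, and only words
    with at least two characters are eligible, so the inserted character
    always lands strictly inside a word.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    rng = random.Random(seed)
    words = text.split(" ")
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    k = min(len(candidates), round(coverage * len(words)))
    for i in rng.sample(candidates, k):
        w = words[i]
        pos = rng.randint(1, len(w) - 1)  # never at the start or end of the word
        words[i] = w[:pos] + rng.choice(ZERO_WIDTH) + w[pos:]
    return " ".join(words)


def strip_zero_width(text):
    """Remove the zero-width characters inserted above."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def measured_coverage(text):
    """Fraction of space-separated words containing a zero-width character."""
    words = text.split(" ")
    altered = sum(1 for w in words if any(ch in ZERO_WIDTH for ch in w))
    return altered / len(words)


sample = "the quick brown fox jumps over the lazy dog near that river"
stego = inject_zero_width(sample, coverage=0.5)
```

In this sketch the altered text renders the same as the original in most viewers, and stripping the inserted characters recovers the original string exactly. Sweeping `coverage` from 0 to 1 is one way to reproduce the kind of coverage-versus-obfuscation curve the contributions describe, given a stylometric classifier to evaluate against.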

🛡️ Threat Analysis

Input Manipulation Attack

TraceTarnish crafts adversarial text inputs (via steganographic zero-width Unicode character injection) that cause stylometric/authorship attribution classifiers to misattribute or fail to verify authorship at inference time — a text-level evasion attack against ML-based text classifiers.
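
Why invisible characters can disturb a text classifier may be easier to see on a toy feature extractor. The sketch below is an assumption-laden illustration, not one of the classifiers attacked in the paper: `function_word_profile` computes relative frequencies of a small, hypothetical list of function words (a classic stylometric feature), and `perturb` inserts a zero-width space into each of those words. Real systems may normalize Unicode or use subword tokenizers, in which case the effect could differ.

```python
from collections import Counter

ZWSP = "\u200b"  # ZERO WIDTH SPACE; not treated as whitespace by str.split()
FUNCTION_WORDS = ("the", "of", "and", "to", "in")  # hypothetical feature set


def function_word_profile(text):
    """Relative frequency of each function word among whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: counts[w] / total for w in FUNCTION_WORDS}


def perturb(text, targets=FUNCTION_WORDS):
    """Insert a ZWSP after the first letter of every targeted word."""
    out = []
    for tok in text.split(" "):
        if tok.lower() in targets:
            tok = tok[0] + ZWSP + tok[1:]
        out.append(tok)
    return " ".join(out)


original = "the cat sat in the shade of the tree"
hidden = perturb(original)

before = function_word_profile(original)
after = function_word_profile(hidden)
```

Here `before["the"]` is 3/9, while every entry of `after` is zero: the tokens still look like "the", "in", and "of" to a reader, but they no longer match the strings the extractor counts. A black-box attacker needs no access to the model for this, only the ability to edit the input text.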

Details

Domains

nlp

Model Types

traditional_ml, transformer

Threat Tags

black_box, inference_time, targeted, digital

Applications

2025 1 cit.
