ML Security Papers

Latest papers

3 papers

defense · arXiv · Mar 24, 2026

SAiW: Source-Attributable Invisible Watermarking for Proactive Deepfake Defense

Bibek Das, Chandranath Adak, Soumi Chattopadhyay et al. · Indian Institute of Technology Patna · Indian Institute of Technology Indore +2 more

Embeds source-attributable invisible watermarks in generated images to trace deepfake origins and verify media authenticity

Output Integrity Attack · vision · generative

Deepfakes generated by modern generative models pose a serious threat to information integrity, digital identity, and public trust. Existing detection methods are largely reactive, attempting to identify manipulations after they occur and often failing to generalize across evolving generation techniques. This motivates the need for proactive mechanisms that secure media authenticity at the time of creation. In this work, we introduce SAiW, a Source-Attributable Invisible Watermarking framework for proactive deepfake defense and media provenance verification. Unlike conventional watermarking methods that treat watermark payloads as generic signals, SAiW formulates watermark embedding as a source-conditioned representation learning problem, where watermark identity encodes the originating source and modulates the embedding process to produce discriminative and traceable signatures. The framework integrates feature-wise linear modulation to inject source identity into the embedding network, enabling scalable multi-source watermark generation. A perceptual guidance module derived from human visual system priors ensures that watermark perturbations remain visually imperceptible while maintaining robustness. In addition, a dual-purpose forensic decoder simultaneously reconstructs the embedded watermark and performs source attribution, providing both automated verification and interpretable forensic evidence. Extensive experiments across multiple deepfake datasets demonstrate that SAiW achieves high perceptual quality while maintaining strong robustness against compression, filtering, noise, geometric transformations, and adversarial perturbations. By binding digital media to its origin through invisible yet verifiable markers, SAiW enables reliable authentication and source attribution, providing a scalable foundation for proactive deepfake defense and trustworthy media provenance.
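
The abstract mentions feature-wise linear modulation (FiLM) as the mechanism that injects source identity into the embedding network. As a rough illustration only, and not the authors' implementation, the sketch below shows the generic FiLM operation in plain Python: every feature channel is scaled and shifted by parameters chosen per source. The helper `make_film_params`, the list-of-channels layout, and the small random parameter ranges are assumptions made for this example; in a trained model such parameters would normally be learned rather than sampled.

```python
import random


def make_film_params(num_sources, num_channels, seed=0):
    # Hypothetical stand-in for a learned conditioning network: one
    # (gamma, beta) pair of per-channel vectors for each source ID.
    rng = random.Random(seed)
    params = {}
    for source_id in range(num_sources):
        gamma = [1.0 + rng.uniform(-0.1, 0.1) for _ in range(num_channels)]
        beta = [rng.uniform(-0.1, 0.1) for _ in range(num_channels)]
        params[source_id] = (gamma, beta)
    return params


def film(features, gamma, beta):
    # features: list of channels, each a list of float activations.
    # FiLM scales and shifts every value in channel c by gamma[c], beta[c].
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [
        [g * value + b for value in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Toy usage: the same features modulated for two different sources.
params = make_film_params(num_sources=2, num_channels=2)
features = [[1.0, 2.0], [3.0, 4.0]]
out_a = film(features, *params[0])
out_b = film(features, *params[1])
```

Because the modulation depends on the source ID, `out_a` and `out_b` differ even though the input features are identical, which is the general idea behind conditioning an embedding on its source.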

gan · diffusion

Indian Institute of Technology Patna · Indian Institute of Technology Indore · State University of New York Polytechnic Institute +1 more

PDF arXiv Code

attack · arXiv · Jan 23, 2026

Persona Jailbreaking in Large Language Models

Jivnesh Sandhan, Fei Cheng, Tushar Sandhan et al. · Kyoto University · Indian Institute of Technology Kanpur

Black-box attack gradually hijacks LLM personas via adversarial conversational history, bypassing guardrails across 8 LLMs

Prompt Injection · nlp

PDF Code

attack · arXiv · Oct 17, 2025

Constrained Adversarial Perturbation

Virendra Nishad, Bhaskar Mukhoty, Hilal AlQuabeh et al. · Indian Institute of Technology Kanpur · Indian Institute of Technology Delhi +2 more

Proposes CAP, constraint-aware universal adversarial perturbations for tabular domains via augmented Lagrangian min-max optimization

Input Manipulation Attack · tabular

PDF
