
Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs

Daniel Agyei Asante, Md Mokarram Chowdhury, Yang Li

0 citations · 64 references · arXiv

Published on arXiv: 2511.22099

Input Manipulation Attack — OWASP ML Top 10 (ML01)

Sensitive Information Disclosure — OWASP LLM Top 10 (LLM06)

Key Finding

Across multiple models and compression algorithms, low-rank LLM compression preserves or improves training data privacy and adversarial robustness, but degrades in-conversation PII protection and fairness.


Large language models (LLMs) have driven major advances across domains, yet their massive size hinders deployment in resource-constrained settings. Model compression addresses this challenge, with low-rank factorization emerging as a particularly effective method for reducing size, memory, and computation while maintaining accuracy. However, while these compressed models offer strong benign performance and system-level advantages, their trustworthiness implications remain poorly understood. In this paper, we present the first comprehensive study of how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethical alignment. We evaluate multiple LLMs of different sizes and variants compressed with diverse low-rank algorithms, revealing key insights: (1) low-rank compression preserves or improves training data privacy but weakens PII protection during conversation; (2) adversarial robustness is generally preserved and often enhanced, even under deep compression; (3) ethical reasoning degrades in zero-shot settings but partially recovers with few-shot prompting; (4) fairness declines under compression. Beyond compression, we investigate how model scale and fine-tuning affect trustworthiness, as both are important in low-rank methods. To guide trustworthy compression strategies, we conclude with a gradient-based attribution analysis that identifies which layers in LLMs contribute most to adversarial robustness.
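The low-rank factorization the abstract refers to can be illustrated with a minimal sketch (not the paper's code, and the matrix, rank, and shapes below are hypothetical): a weight matrix `W` is approximated by the product of two thin factors obtained from a truncated SVD, reducing parameter count at the cost of some reconstruction error.

```python
import numpy as np

# Stand-in for an LLM weight matrix (shapes are illustrative only).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))

rank = 32  # compression rank (hypothetical choice)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]  # (256, rank) factor, singular values folded in
B = Vt[:rank, :]            # (rank, 256) factor

params_full = W.size
params_lowrank = A.size + B.size
error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"parameters: {params_full} -> {params_lowrank}")
print(f"relative reconstruction error: {error:.3f}")
```

Replacing `W` with `A @ B` in a linear layer cuts its parameters from `d_out * d_in` to `rank * (d_out + d_in)`, which is the size/memory saving the paper's compressed models exploit.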


Key Contributions

  • First comprehensive empirical study of how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethical alignment
  • Key finding that compression preserves/improves training data privacy but weakens PII protection during conversation, while adversarial robustness is generally preserved or enhanced even under deep compression
  • Gradient-based attribution analysis identifying which transformer layers contribute most to adversarial robustness in compressed LLMs

🛡️ Threat Analysis

Input Manipulation Attack

A primary axis of evaluation is adversarial robustness — the paper studies how low-rank compression preserves or degrades LLM resistance to adversarial input manipulation, and performs gradient-based attribution to identify which compressed layers most contribute to robustness.
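The gradient-based attribution idea can be sketched conceptually (this is not the paper's implementation; the toy two-layer network and squared loss are assumptions for illustration): backpropagate the loss and rank layers by the norm of their weight gradients, treating larger norms as stronger influence on the output under input manipulation.

```python
import numpy as np

# Toy two-layer network for a conceptual gradient-attribution sketch.
rng = np.random.default_rng(1)
x = rng.standard_normal(8)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((1, 16))

# Forward pass.
h = np.tanh(W1 @ x)
y = W2 @ h
loss = 0.5 * y[0] ** 2  # toy squared loss

# Manual backward pass.
dy = y                               # dloss/dy
dW2 = np.outer(dy, h)                # gradient w.r.t. layer-2 weights
dh = W2.T @ dy                       # gradient flowing back into h
dW1 = np.outer(dh * (1 - h ** 2), x) # gradient w.r.t. layer-1 weights

# Attribution score per layer: Frobenius norm of the weight gradient.
scores = {"layer1": np.linalg.norm(dW1), "layer2": np.linalg.norm(dW2)}
most_influential = max(scores, key=scores.get)
print(scores, most_influential)
```

In the paper's setting the same per-layer ranking is used to identify which transformer layers matter most for adversarial robustness, and hence where compression should be applied most cautiously.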


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time
Applications
large language models, model compression, resource-constrained deployment