CensorshipDetector
Overview
CensorshipDetector is a Chinese-language text-classification model fine-tuned to classify a given piece of text as more or less similar to known sanitized content, i.e., content that remains after being subjected to state censorship, including alterations, deletions, and self-censorship. To fine-tune CensorshipDetector, we used two corpora of Simplified Chinese text: one that has been subjected to the CCP's online information controls and one that has not. For the non-censored dataset, we used the November 2023 Wikipedia dump. For the censored dataset, we scraped 587,819 articles from Baidu Baike, an online encyclopedia that is the largest mainland Chinese alternative to Wikipedia, using the Internet Archive's snapshots of the site.

Once we had trained CensorshipDetector, we validated it on 5,039 Chinese-language news articles: 3,007 from Chinese state media and 2,032 from the Chinese-language edition of the New York Times. We sourced the state media articles from the news2016zh corpus and scraped the New York Times articles automatically.
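A minimal inference sketch is shown below. It assumes the Hub repository name mohamedah/censorship_detector from the model tree at the end of this card and a standard sequence-classification head; the human-readable label names are not documented here, so they are read from the model config rather than hard-coded.

```python
# Minimal inference sketch. Assumes the Hub repository name from the
# model tree below and a standard XLM-RoBERTa classification head; check
# model.config.id2label for the actual label names.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "mohamedah/censorship_detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "这是一段待分类的简体中文文本。"  # a Simplified Chinese passage to score
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)[0]
pred = int(probs.argmax())
print(model.config.id2label[pred], f"{probs[pred].item():.4f}")
```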
Evaluation and Validation
While fine-tuning, we held out 20% of the training data as an evaluation set, on which CensorshipDetector achieved an accuracy of 0.9998.
We then used our curated dataset of 5,039 Chinese-language news articles as a validation set. CensorshipDetector achieved an overall accuracy of 91% on this set, classifying 93% of the Chinese state media articles as censored and 1,769 of the 2,032 (87%) New York Times articles as uncensored; the model therefore skews slightly toward false positives (uncensored text labeled censored) rather than false negatives.
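As a quick sanity check, the per-class figures are consistent with the reported overall accuracy. In the sketch below, the state-media count of roughly 2,797 is back-derived from the reported 93% and is an assumption, not a figure taken from the paper.

```python
# Sanity check of the reported validation breakdown.
state_media_total = 3007
state_media_as_censored = round(0.93 * state_media_total)  # ~2,797 (derived from the reported 93%)
nyt_total = 2032
nyt_as_uncensored = 1769                                    # 87% of 2,032, as reported
overall = (state_media_as_censored + nyt_as_uncensored) / (state_media_total + nyt_total)
print(f"overall accuracy ~ {overall:.1%}")  # ~90.6%, reported as 91%
```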
Citation
If you publish work using our datasets or CensorshipDetector, please cite it as follows:
```bibtex
@inproceedings{ahmed2025censorshipbias,
  title     = {An Analysis of Chinese Censorship Bias in LLMs},
  author    = {Ahmed, Mohamed and Knockel, Jeffrey and Greenstadt, Rachel},
  booktitle = {Proceedings on Privacy Enhancing Technologies (PoPETs)},
  volume    = {2025},
  number    = {4},
  year      = {2025}
}
```
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
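These values look like Hugging Face Trainer settings; as an illustration only, the sketch below maps them onto transformers.TrainingArguments. The output_dir is a placeholder, and any argument not listed above is an assumption.

```python
# Sketch mapping the listed hyperparameters onto transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="censorship_detector",   # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,      # 16 x 4 = 64 total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
)
```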
Base model
FacebookAI/xlm-roberta-base