CensorshipDetector
Overview
CensorshipDetector is a Chinese-language text-classification model fine-tuned to classify a given piece of text as more or less similar to known sanitized content, i.e., content that remains after being subjected to state censorship, including alterations, deletions, and self-censorship. To fine-tune CensorshipDetector, we used two corpora of Simplified Chinese text: one that has been subjected to the CCP's online information controls and one that has not. For the non-censored dataset, we used the November 2023 Wikipedia dump. For the censored dataset, we scraped 587,819 articles from Baidu Baike, an online encyclopedia that is the largest mainland Chinese alternative to Wikipedia, using the Internet Archive's snapshots of the site.

Once we had trained CensorshipDetector, we validated it on 5,039 Chinese-language news articles: 3,007 from Chinese state media and 2,032 from the Chinese-language edition of the New York Times. We sourced the state media articles from the news2016zh corpus and scraped the New York Times articles automatically.
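A minimal inference sketch is shown below. It assumes the Hub repository name mohamedah/censorship_detector from the model tree at the end of this card and a standard sequence-classification head; the human-readable label names are not documented here, so they are read from the model config rather than hard-coded.

```python
# Minimal inference sketch. Assumes the Hub repository name from the
# model tree below and a standard XLM-RoBERTa classification head; check
# model.config.id2label for the actual label names.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "mohamedah/censorship_detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "这是一段待分类的简体中文文本。"  # a Simplified Chinese passage to score
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)[0]
pred = int(probs.argmax())
print(model.config.id2label[pred], f"{probs[pred].item():.4f}")
```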
Evaluation and Validation
While fine-tuning, we held out 20% of the training data as an evaluation set, on which CensorshipDetector achieved an accuracy of 0.9998.
We then used our curated dataset of 5,039 Chinese-language news articles as a validation set. CensorshipDetector achieved an overall accuracy of 91% on this set, classifying 93% of the Chinese state media articles as censored and 1,769 of the 2,032 (87%) New York Times articles as uncensored; the model therefore skews slightly toward false positives (uncensored text labeled censored) rather than false negatives.
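As a quick sanity check, the per-class figures are consistent with the reported overall accuracy. In the sketch below, the state-media count of roughly 2,797 is back-derived from the reported 93% and is an assumption, not a figure taken from the paper.

```python
# Sanity check of the reported validation breakdown.
state_media_total = 3007
state_media_as_censored = round(0.93 * state_media_total)  # ~2,797 (derived from the reported 93%)
nyt_total = 2032
nyt_as_uncensored = 1769                                    # 87% of 2,032, as reported
overall = (state_media_as_censored + nyt_as_uncensored) / (state_media_total + nyt_total)
print(f"overall accuracy ~ {overall:.1%}")  # ~90.6%, reported as 91%
```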
Citation
If you publish work using our datasets or CensorshipDetector, please cite it as follows:
```bibtex
@inproceedings{ahmed2025censorshipbias,
  title     = {An Analysis of Chinese Censorship Bias in LLMs},
  author    = {Ahmed, Mohamed and Knockel, Jeffrey and Greenstadt, Rachel},
  booktitle = {Proceedings on Privacy Enhancing Technologies (PoPETs)},
  volume    = {2025},
  number    = {4},
  year      = {2025}
}
```
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
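These values look like Hugging Face Trainer settings; as an illustration only, the sketch below maps them onto transformers.TrainingArguments. The output_dir is a placeholder, and any argument not listed above is an assumption.

```python
# Sketch mapping the listed hyperparameters onto transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="censorship_detector",   # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,      # 16 x 4 = 64 total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
)
```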
Base model
FacebookAI/xlm-roberta-base