rain2sun's Collections
High-Quality-Datasets
Updated Dec 2, 2024
High-quality datasets with a high density of knowledge.
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M • 72k • 1.12k
OpenCoder-LLM/opc-annealing-corpus • Updated May 29, 2025 • 15.6M • 3.16k • 41
hltcoe/megawika • Updated Jan 31, 2025 • 44.3k • 41
allenai/dolmino-mix-1124 • Updated Oct 29, 2025 • 170M • 53.6k • 89