- unsloth/Nemotron-3-Nano-30B-A3B-GGUF
  Text Generation • 32B • Updated • 92.7k • 200
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 78
- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 5.71k • 1.24k
- nvidia/Alpamayo-R1-10B
  11B • Updated • 23.2k • 106

Collections including paper arxiv:2504.12285

- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 78
- DataDecide: How to Predict Best Pretraining Data with Small Experiments
  Paper • 2504.11393 • Published • 18
- Efficient Process Reward Model Training via Active Learning
  Paper • 2504.10559 • Published • 13
- CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
  Paper • 2504.13161 • Published • 93

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88

- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  Paper • 2208.07339 • Published • 5
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 10
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
  Paper • 2211.10438 • Published • 6
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 58

- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 87
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 78
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 27

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 5.71k • 1.24k
- microsoft/bitnet-b1.58-2B-4T-bf16
  Text Generation • 2B • Updated • 2.55k • 34
- microsoft/bitnet-b1.58-2B-4T-gguf
  Text Generation • 2B • Updated • 3.26k • 219
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 78

- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  Paper • 2501.11873 • Published • 66
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 166
- MoBA: Mixture of Block Attention for Long-Context LLMs
  Paper • 2502.13189 • Published • 17