1. VAETKI-7B-A1B Highlights
VAETKI-7B-A1B is a small language model developed by NC-AI, designed especially for inference efficiency. The VAETKI series adopts a Mixture-of-Experts (MoE) architecture to effectively balance performance and computational cost.
2. Model Overview
VAETKI-7B-A1B has the following features:
- Type: Causal (Auto-regressive) Language Model
- Architecture: Transformer with MoE (Mixture of Experts)
- Developed by: NC-AI
- Training Stage: Pretraining & Post-training
- Number of Parameters: 7.25B in total and 1.2B activated
- Number of Parameters (Non-Embedding): 6.8B
- Number of Layers: 24
- Number of Attention Heads: 12
- Number of Experts: 64
- Number of Activated Experts: 5
- Context Length: 16k tokens
- Vocabulary Size: 126k
- Languages: Korean, English, Chinese, and Japanese
- License: MIT
- Related URLs: https://github.com/wbl-ncai/VAETKI/
For more details, please refer to our Technical Report (to be updated).
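For quick reference, the figures above can be collected into a plain Python dictionary, as in the sketch below. This is an illustrative summary only; the field names are hypothetical and do not necessarily match the keys of the released configuration file.

```python
# Illustrative summary of the architecture figures listed above.
# Field names are hypothetical and may not match the released config.json.
vaetki_7b_a1b_overview = {
    "architecture": "MoE Transformer (causal, auto-regressive)",
    "total_params": "7.25B",
    "activated_params": "1.2B",
    "non_embedding_params": "6.8B",
    "num_layers": 24,
    "num_attention_heads": 12,
    "num_experts": 64,
    "num_activated_experts": 5,
    "context_length": 16_384,
    "vocab_size": "126k",
    "languages": ["Korean", "English", "Chinese", "Japanese"],
    "license": "MIT",
}
```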
3. How to Use
See the Quickstart for more details.
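As a complement to the Quickstart, the following is a minimal generation sketch with Hugging Face Transformers. It assumes the model is released under a Hub identifier such as `NC-AI/VAETKI-7B-A1B` (an assumption; check the repository linked above for the exact name) and that it exposes the standard causal-LM interface.

```python
# Minimal generation sketch with Hugging Face Transformers.
# The Hub identifier below is an assumption; replace it with the published one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NC-AI/VAETKI-7B-A1B"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory relative to fp32
    device_map="auto",
)

prompt = "Briefly explain what a Mixture-of-Experts language model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```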
4. Training Details
Training Data
Due to training time and resource constraints, only 1.86 trillion tokens from the available data sources were used for pre-training.
Training Procedure
- Hardware
- Platform: Naver Cloud MLX Platform
- GPUs: NVIDIA H100 80GB HBM3 × 256
- Software: The model architecture configuration, training loop, checkpointing, and distributed optimization logic are built on Megatron-Core v0.14, with selective modifications to accommodate experimental requirements.
- Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-4 → 1e-5 |
| Batch size | 8.1M → 32.4M tokens |
| Context length | 4,096 → 16,384 |
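The table above only gives the start and end points of each schedule; the decay shape and warmup length are not specified. Purely as an illustration, the sketch below assumes a linear warmup followed by a cosine decay from the peak learning rate to the final one.

```python
import math

def lr_at(step: int, total_steps: int,
          peak_lr: float = 2e-4, final_lr: float = 1e-5,
          warmup_steps: int = 2000) -> float:
    """Illustrative schedule: linear warmup to peak_lr, then cosine decay to final_lr.

    Only the 2e-4 -> 1e-5 range comes from the table above; the warmup length
    and the cosine shape are assumptions made for this sketch.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```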
5. Evaluation Results
We evaluate VAETKI-7B-A1B on various benchmarks and compare it with comparable open MoE models, as shown in the table below. All three models were evaluated under the same experimental setup to ensure a fair and consistent comparison.
| Language | Tasks | Benchmark (Metric) | # Shot | OLMoE-1B-7B-0125-Instruct | LLaDA-MoE-7B-A1B-Instruct | VAETKI-7B-A1B |
|---|---|---|---|---|---|---|
| Architecture | | | - | MoE (AR) | MoE (Diffusion) | MoE (AR) |
| # Total Params | | | - | 7B | 7B | 7B |
| # Activated Params | | | - | 1.3B | 1.4B | 1.2B |
| # Pre-trained Tokens | | | - | 4.07T | 20T | 1.86T |
| Korean | General | KMMLU-Redux | 5-shot | - | - | - |
| | General | KoBEST | 10-shot | - | - | - |
| English | General | MMLU-Pro | 5-shot | - | - | - |
| | General | BBH | 3-shot | - | - | - |
| | Reasoning | GPQA | 0-shot | - | - | - |
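As an illustration of the few-shot settings above, the sketch below uses EleutherAI's open-source lm-evaluation-harness. This is not necessarily the harness or configuration behind the reported setup; the model identifier and task names are assumptions and may differ across harness versions.

```python
# Hedged sketch: a few-shot run with lm-evaluation-harness (pip install lm-eval).
# Not necessarily the official evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NC-AI/VAETKI-7B-A1B,dtype=bfloat16",  # assumed Hub ID
    tasks=["mmlu_pro"],  # task names vary by harness version
    num_fewshot=5,       # matches the 5-shot setting listed for MMLU-Pro above
    batch_size=8,
)
print(results["results"])
```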
6. Limitations
- Limitations: This model may produce inaccurate or incomplete outputs, including hallucinated content, particularly for ambiguous prompts or tasks requiring high factual accuracy. It may have limitations in complex multi-step reasoning, precise mathematical computation, and strict correctness in code generation. The model does not have the ability to independently verify information.
- (Potential) Biases: The training data may contain social or cultural biases, which can be reflected in the model’s outputs. Despite mitigation efforts, biases related to gender, ethnicity, nationality, or religion may still occur.
- Out-of-Scope Use: This model is not designed for use in safety-critical or regulated domains, such as medical, legal, financial, or military applications. It should not be relied upon for decisions where errors could lead to harm.
7. License
This model repository is licensed under the MIT License. The use of VAETKI models is subject to the Model License.
8. Citation
@misc{ncai2025vaetkitechnicalreport,
title={VAETKI Technical Report},
author={NC-AI Consortium},
year={2025},
eprint={xxxx.xxxxx},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/xxxx.xxxxx},
}
9. Contact
If you would like to leave a message or have any questions, please contact us at wbl.ncai.hf@gmail.com.