VAETKI Model Introduction
VAETKI is a large language model jointly developed by the NC-AI consortium, a collaboration of 13 organizations led by NC-AI. Built on this large-scale collaboration, VAETKI was designed with efficiency and scalability as core goals and adopts a Mixture-of-Experts (MoE) architecture to achieve them.
VAETKI is designed with both research and production-service environments in mind. It is being developed so that its applicability can expand across a wide range of areas, including advanced reasoning-centric tasks, expert-knowledge-based applications, and agent-style usage scenarios, and it has the following key features:
- Tool Agent tasks run in non-thinking mode; all other tasks run in thinking mode.
- Human preference alignment designed to follow instructions precisely and to deliver more natural conversation.
- Supports instruction following and translation in English, Korean, Chinese, and Japanese.
1. VAETKI Highlights
VAETKI is a large language model developed by the NC-AI consortium, a collaborative initiative led by NC-AI with participation from a total of 13 organizations. Designed with scalability and efficiency as primary goals, VAETKI adopts a Mixture-of-Experts (MoE) architecture to effectively balance performance and computational cost.
VAETKI is developed with both research and real-world applications in mind. It is intended to serve as a flexible foundation for a wide range of use cases, including advanced reasoning tasks, domain-specific knowledge applications, and agent-oriented systems, with the following key features:
- Non-thinking mode is applied for Tool Agent tasks; all other tasks run in thinking mode.
- Strong human preference alignment for precise instruction following and more natural conversation.
- Support for English, Korean, Chinese, and Japanese in instruction following and translation.
2. Model Overview
VAETKI-112B-A10B has the following features:
- Type: Causal (Auto-regressive) Language Models
- Architecture: Transformers, MoE (Mixture of Experts)
- Developed by: NC-AI consortium (with ETRI, Korea University)
- Training Stage: Pretraining & Post-training
- Number of Parameters: 112.2B in total and 10.1B activated
- Number of Parameters (Non-Embedding): 111.3B
- Number of Layers: 48
- Number of Attention Heads: 24
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 128k tokens
- Vocabulary Size: 126k
- Languages: Korean, English, Chinese, and Japanese
- License: MIT
- Related URLs: https://github.com/wbl-ncai/VAETKI/
For more details, please refer to our Technical Report (to be updated).
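To make the expert counts listed above concrete, here is a minimal sketch of generic top-k expert routing (8 of 128 experts active per token). This illustrates the general technique only, not VAETKI's actual implementation; the hidden and expert widths are small placeholder values.

```python
# Illustrative top-k MoE routing: 128 experts, 8 activated per token, as in
# the configuration above. Generic MoE routing, NOT VAETKI's code;
# hidden_size and expert_hidden are tiny placeholders for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, hidden_size=256, expert_hidden=512,
                 num_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, expert_hidden),
                nn.SiLU(),
                nn.Linear(expert_hidden, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: [tokens, hidden]
        logits = self.router(x)                  # [tokens, num_experts]
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # plain loops for clarity
            for expert_id in indices[:, slot].unique().tolist():
                mask = indices[:, slot] == expert_id
                out[mask] += weights[mask, slot:slot + 1] * \
                             self.experts[expert_id](x[mask])
        return out

# Only 8 of the 128 expert FFNs run for each token, which is why the
# activated parameter count (10.1B) is far smaller than the total (112.2B).
layer = TopKMoELayer()
print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```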
3. How to Use
See the Quickstart for more details.
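The Quickstart is the source of truth; as a rough orientation, a minimal loading sketch with the Hugging Face `transformers` library is shown below. The repository id, precision, and generation settings are assumptions and may differ from the released checkpoint and the official instructions.

```python
# Minimal usage sketch (assumption: the checkpoint is published on the
# Hugging Face Hub under a repo id like "wbl-ncai/VAETKI-112B-A10B";
# check the official Quickstart for the exact id and recommended settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wbl-ncai/VAETKI-112B-A10B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # shard the 112B MoE across available GPUs
)

messages = [
    {"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```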
4. Training Details
Training Data
| Dataset | # Tokens |
|---|---|
| FineWeb-2(kor_Hang) | 54.5B |
| FineWeb2-HQ | 338.9B |
| The Stack v2 | 1.571T |
| StackExchange_Mar2023 | 2.6B |
| finemath(finemath-3plus) | 37.4B |
| finemath(infiwebmath-3plus) | 23.7B |
| proof-pile-2 | 28.2B |
| Nemotron-CC-v2 | 3.360T |
| Nemotron-CC-Math-v1 | 214.3B |
| Nemotron-Pretraining-Code-v1 | 191.4B |
| Nemotron-Pretraining-SFT-v1 | 367.2B |
| DCLM-baseline-1.0 | 3.190T |
| WanJuan-Korean | 68.9B |
| finemath(finemath-4plus) | 10.4B |
| MegaMath | 208.0B |
| Stack-Edu | 86.7B |
| AceReason-1.1-SFT | 31.4B |
| OpenScience-OS-Q2 | 18.1B |
| OpenScience-OS-Q3 | 0.7B |
| Nemotron-PrisMath | 6.2B |
| OpenCodeGeneticInstruct-Qwen2.5-32b-instruct | 6.8B |
| OpenCodeGeneticInstruct-mixtral-8x22b-instruct | 9.0B |
| Total | 9.8T |
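As a quick consistency check on the table above, the per-dataset counts (in billions of tokens) sum to roughly the stated 9.8T total; the small discrepancy is rounding.

```python
# Per-dataset token counts from the table above, in billions of tokens.
# Their sum matches the stated ~9.8T total up to rounding.
tokens_b = [54.5, 338.9, 1571.0, 2.6, 37.4, 23.7, 28.2, 3360.0, 214.3,
            191.4, 367.2, 3190.0, 68.9, 10.4, 208.0, 86.7, 31.4, 18.1,
            0.7, 6.2, 6.8, 9.0]
print(f"{sum(tokens_b) / 1000:.2f}T")  # ~9.83T
```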
Training Procedure
- Hardware
- Platform: Naver Cloud MLX Platform
- GPUs: NVIDIA H100 80GB HBM3 × 1,016
- Interconnect: InfiniBand 400 Gb/s, 6 lanes (4 lanes were used for RDMA-based inter-node communication)
- Software: The model architecture configuration, training loop, checkpointing, and distributed optimization logic were implemented on top of Megatron-Core v0.14, with internal modifications made for research, optimization, and experimental requirements. Full compatibility with the original upstream implementation is therefore not claimed.
- Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-4 → 1e-4 → 8e-5 |
| Batch size | 8.1M tokens → 33M tokens → 46M tokens |
| Context length | 4096 → 4096 → 32768 |
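As a back-of-the-envelope reading of the schedule above (assuming unpacked, full-length sequences, which may not match the actual packing strategy), the token batch sizes correspond roughly to the following sequence counts per step:

```python
# Rough sequences-per-step implied by the token batch sizes and context
# lengths above (assumes unpacked, full-length sequences; actual sequence
# packing during training may differ).
for batch_tokens, context in [(8.1e6, 4096), (33e6, 4096), (46e6, 32768)]:
    print(f"{batch_tokens / 1e6:>5.1f}M tokens / {context:>5} ctx "
          f"= ~{batch_tokens / context:,.0f} sequences per step")
```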
5. Evaluation Results
We evaluate VAETKI-112B-A10B on various benchmarks and compare it with a selection of comparable models, as shown in the tables below.
Global Common Benchmarks
| Language | Tasks | Benchmark (Metric) | # Shot | gpt-oss-120b | Hunyuan-A13B-Instruct | VAETKI-112B-A10B |
|---|---|---|---|---|---|---|
| | | Architecture | - | MoE | MoE | MoE |
| | | # Total Params | - | 117B | 80B | 112B |
| | | # Activated Params | - | 5.1B | 13B | 10B |
| Korean | General | NIA benchmark | x-shot | - | - | - |
| | General | KMMLU-Pro | x-shot | - | - | - |
| | General | CLIcK | x-shot | - | - | - |
| | General | KoBALT | x-shot | - | - | - |
| | Reasoning | HRM8K | x-shot | - | - | - |
| English | General | MMLU-Pro | x-shot | - | - | - |
| | Reasoning | GPQA-Diamond | x-shot | - | - | - |
| | Reasoning | AIME '25 | x-shot | - | - | - |
| | Reasoning | HLE (text only) | x-shot | - | - | - |
| | Reasoning | IFBench | x-shot | - | - | - |
| | Reasoning | IFEval | x-shot | - | - | - |
| | Code | LiveCodeBench v6 | - | - | - | - |
| | Agentic | Tau-Bench Telecom | x-shot | - | - | - |
| | Long Context | AA-LCR | x-shot | - | - | - |

Global Additional Benchmarks
| Language | Tasks | Benchmark (Metric) | # Shots | Hunyuan-A13B-Instruct | VAETKI-112B-A10B |
|---|---|---|---|---|---|
| | | Architecture | - | MoE | MoE |
| | | # Total Params | - | 80B | 112B |
| | | # Activated Params | - | 13B | 10B |
| English | General | MMLU-Pro | x-shot | - | - |
| English | General | BBH | x-shot | - | - |
| English | Reasoning | MATH-500 | x-shot | - | - |
| English | Reasoning | IFEval | x-shot | - | - |
| English | Agentic | BFCL v3 | x-shot | - | - |
6. Limitations
- Limitations: This model may produce inaccurate or incomplete outputs, including hallucinated content, particularly for ambiguous prompts or tasks requiring high factual accuracy. It may have limitations in complex multi-step reasoning, precise mathematical computation, and strict correctness in code generation. The model does not have the ability to independently verify information.
- (Potential) Biases: The training data may contain social or cultural biases, which can be reflected in the model's outputs. Despite mitigation efforts, biases related to gender, ethnicity, nationality, or religion may still occur.
- Out-of-Scope Use: This model is not designed for use in safety-critical or regulated domains, such as medical, legal, financial, or military applications. It should not be relied upon for decisions where errors could lead to harm.
7. License
This model repository is licensed under the MIT License. The use of VAETKI models is subject to the Model License.
8. Citation
@misc{ncai2025vaetkitechnicalreport,
title={VAETKI Technical Report},
author={NC-AI Consortium},
year={2025},
eprint={xxxx.xxxxx},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/xxxx.xxxxx},
}
9. Contact
If you would like to leave a message or have any questions, please contact us at wbl.ncai.hf@gmail.com.