Paper: Gemma 3 Technical Report (arXiv:2503.19786)
Trained a 0.96-million-parameter Urdu Gemma model.
A version of Google's Gemma architecture with the following components, as defined in GemmaConfig: rotary positional embeddings applied via apply_rotary_emb(), and causal masking using a pre-computed triangular mask (a minimal sketch of both follows).
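The card names apply_rotary_emb() and the pre-computed triangular mask but does not show their implementations, so the following is a minimal PyTorch sketch of how these two components are commonly written. The helper precompute_freqs_cis, the tensor shapes, and max_seq_len are illustrative assumptions, not the model's actual code.

```python
import torch

def precompute_freqs_cis(head_dim: int, max_seq_len: int, theta: float = 10000.0) -> torch.Tensor:
    # Assumed helper: per-pair rotation frequencies, then the complex
    # rotations e^{i * pos * freq} for every position.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(max_seq_len).float(), freqs)  # (seq_len, head_dim // 2)
    return torch.polar(torch.ones_like(angles), angles)

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim); freqs_cis: (seq_len, head_dim // 2).
    # View consecutive feature pairs as complex numbers and rotate them by
    # position-dependent angles (RoPE).
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rotated = x_complex * freqs_cis[:, None, :]  # broadcast over the heads dim
    return torch.view_as_real(x_rotated).flatten(3).type_as(x)

# Pre-computed triangular causal mask: -inf above the diagonal, so when added
# to attention scores each position attends only to itself and earlier positions.
max_seq_len = 512  # assumed context length for illustration
causal_mask = torch.triu(
    torch.full((max_seq_len, max_seq_len), float("-inf")), diagonal=1
)
```

Building the mask once at initialization and slicing it to the current sequence length each forward pass avoids re-allocating it per step.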
Achieved convergence on the Urdu corpus with the following final training metrics (5000 iterations):
- Training Loss: 2.7668
- Validation Loss: 2.9250
- Validation Perplexity: 18.6348 (the exponential of the validation loss: exp(2.9250) ≈ 18.63)
- Learning Rate: 3e-4 with the AdamW optimizer
- Batch Size: 16 with 2 gradient accumulation steps (see the sketch after this list)
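Only the hyperparameters are reported, not the training loop, so this is a minimal sketch of how AdamW at 3e-4 with batch size 16 and 2 gradient accumulation steps typically fit together. The model, loss, and data below are placeholders, not the actual Urdu Gemma training code.

```python
import torch

model = torch.nn.Linear(128, 128)  # placeholder for the Urdu Gemma model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

batch_size = 16
grad_accum_steps = 2  # effective batch size: 16 * 2 = 32
num_iterations = 5000  # assuming "iterations" counts optimizer steps

for step in range(num_iterations * grad_accum_steps):
    batch = torch.randn(batch_size, 128)  # stands in for tokenized Urdu text
    loss = model(batch).pow(2).mean()     # placeholder loss
    # Scale the loss so accumulated gradients average over the micro-batches.
    (loss / grad_accum_steps).backward()
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```

With this setup each optimizer step sees gradients averaged over 32 examples, while peak memory stays at the 16-example micro-batch.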
MIT License