GLiNER Wolof NER
A Named Entity Recognition (NER) model for the Wolof language, fine-tuned from urchade/gliner_multi_pii-v1 on the MasakhaNER dataset.
Model Description
This model can identify the following entity types in Wolof text:
- PER - Person names
- ORG - Organizations
- LOC - Locations
- DATE - Dates
Usage
from gliner import GLiNER
# Load the model
model = GLiNER.from_pretrained("Lahad/gliner_wolof_NER")
# Define entity types
labels = ["PER", "ORG", "LOC", "DATE"]
# Predict entities
text = "Ousmane Sonko jàngae na ci Daaray Cheikh Anta Diop ci Dakar."
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(f"{entity['text']} => {entity['label']} (score: {entity['score']:.2f})")
# Example output:
# Ousmane Sonko => PER (score: 0.95)
# Daaray Cheikh Anta Diop => ORG (score: 0.89)
# Dakar => LOC (score: 0.97)
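Downstream code often needs the predictions grouped by entity type rather than as a flat list. The following is a minimal sketch, assuming each prediction is a dict with the `text`, `label`, and `score` keys used in the example above; `group_by_label` and its `min_score` parameter are hypothetical helpers for illustration, not part of the GLiNER library.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, keeping entities with score >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Predictions shaped like the example output above (illustrative values).
sample = [
    {"text": "Ousmane Sonko", "label": "PER", "score": 0.95},
    {"text": "Daaray Cheikh Anta Diop", "label": "ORG", "score": 0.89},
    {"text": "Dakar", "label": "LOC", "score": 0.97},
]

print(group_by_label(sample, min_score=0.9))
# {'PER': ['Ousmane Sonko'], 'LOC': ['Dakar']}
```

Raising `min_score` trades recall for precision, which may help for the entity types where precision is lower than recall.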
Training Details
- Base Model: urchade/gliner_multi_pii-v1
- Dataset: MasakhaNER (Wolof subset)
- Training samples: 5,143
- Validation samples: 643
- Epochs: 10
- Learning rate: 5e-6
- Batch size: 16
Dataset
This project uses the MasakhaNER dataset, which provides high-quality NER annotations for 10 African languages including Wolof (wol).
Dataset Split:
- Train: 1,871 samples
- Validation: 267 samples
- Test: 539 samples
Entity Types:
- PER - Person names
- ORG - Organizations
- LOC - Locations
- DATE - Dates
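MasakhaNER annotates each token with a BIO tag (for example `B-PER`, `I-PER`, `O`), while span-based models such as GLiNER are typically trained on token-index spans. The sketch below shows one way to convert between the two. It assumes a target record with `tokenized_text` and `ner` keys, where each span is `[start, end, label]` with inclusive token indices; that layout is an assumption about the training format and is not confirmed by this model card. The function name `bio_to_spans` is hypothetical.

```python
def bio_to_spans(tags):
    """Convert BIO tags to [start, end, label] spans with inclusive token indices."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append([start, i - 1, label])
            start, label = None, None
        # Open a new span on B-, or on a stray I- when no span is open.
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    if start is not None:
        spans.append([start, len(tags) - 1, label])
    return spans


tokens = ["Ousmane", "Sonko", "jàngae", "na", "ci", "Daaray",
          "Cheikh", "Anta", "Diop", "ci", "Dakar", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG",
        "I-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "O"]

# Assumed record layout for span-based training data.
record = {"tokenized_text": tokens, "ner": bio_to_spans(tags)}
print(record["ner"])
# [[0, 1, 'PER'], [5, 8, 'ORG'], [10, 10, 'LOC']]
```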
Evaluation Results
Evaluation on the test set:
- 539 sentences/examples
- 505 total annotated entities across these sentences
| Entity Type | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| DATE | 30.77% | 22.86% | 26.23% | 70 |
| LOC | 76.75% | 84.95% | 80.65% | 206 |
| ORG | 41.89% | 56.36% | 48.06% | 55 |
| PER | 53.02% | 70.69% | 60.59% | 174 |
| GLOBAL | 58.87% | 68.32% | 63.24% | 505 |
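As a consistency check, the per-type rows can be turned back into raw counts and pooled. The sketch below assumes the percentages in the table are rounded to two decimals and that the GLOBAL row is a micro-average (counts pooled across entity types); under those assumptions the pooled figures reproduce the GLOBAL row. The variable and function names are illustrative only.

```python
# (precision %, recall %, support) per entity type, copied from the table above.
table = {
    "DATE": (30.77, 22.86, 70),
    "LOC": (76.75, 84.95, 206),
    "ORG": (41.89, 56.36, 55),
    "PER": (53.02, 70.69, 174),
}

counts = {}
for label, (precision, recall, support) in table.items():
    tp = round(recall / 100 * support)          # correctly predicted entities
    predicted = round(tp / (precision / 100))   # all entities the model predicted
    counts[label] = (tp, predicted, support)


def prf(tp, predicted, support):
    """Precision, recall, and F1 (as percentages) from raw counts."""
    precision = tp / predicted
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f1))


total_tp = sum(c[0] for c in counts.values())
total_pred = sum(c[1] for c in counts.values())
total_support = sum(c[2] for c in counts.values())

micro = prf(total_tp, total_pred, total_support)
print(total_tp, total_pred, total_support)  # 345 586 505
print(micro)  # (58.87, 68.32, 63.24)
```

The model predicts 586 entities against 505 annotated ones, which is why overall recall is higher than precision.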
Performance Note
The model was fine-tuned on a relatively limited dataset (MasakhaNER Wolof). Current performance reflects this constraint, particularly for the DATE and ORG entity types (F1 of 26.23% and 48.06%), which have fewer training examples.
Future Improvements:
- Collect and annotate more data in Wolof
- Increase source diversity (newspapers, social media, literature)
- Experiment with data augmentation techniques
With more annotated data, we expect to significantly improve the model's performance.
License
MIT