GLiNER Wolof NER

A Named Entity Recognition (NER) model for the Wolof language, fine-tuned from urchade/gliner_multi_pii-v1 on the Wolof subset of the MasakhaNER dataset.

Model Description

This model can identify the following entity types in Wolof text:

  • PER - Person names
  • ORG - Organizations
  • LOC - Locations
  • DATE - Dates

Usage

from gliner import GLiNER

# Load the model
model = GLiNER.from_pretrained("Lahad/gliner_wolof_NER")

# Define entity types
labels = ["PER", "ORG", "LOC", "DATE"]

# Predict entities
text = "Ousmane Sonko jàngae na ci Daaray Cheikh Anta Diop ci Dakar."
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} (score: {entity['score']:.2f})")

  → Ousmane Sonko => PER (score: 0.95)
  → Daaray Cheikh Anta Diop => ORG (score: 0.89)
  → Dakar => LOC (score: 0.97)
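
The predictions are returned as a list of dictionaries with text, label and score keys, so they can be post-processed with plain Python. The sketch below groups entities by label and drops low-confidence ones; the helper name group_by_label and the min_score parameter are illustrative and not part of the gliner API, and the sample list simply copies the output shown above rather than calling the model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Sample predictions copied from the example output (not a fresh model run).
sample_entities = [
    {"text": "Ousmane Sonko", "label": "PER", "score": 0.95},
    {"text": "Daaray Cheikh Anta Diop", "label": "ORG", "score": 0.89},
    {"text": "Dakar", "label": "LOC", "score": 0.97},
]

print(group_by_label(sample_entities, min_score=0.9))
# {'PER': ['Ousmane Sonko'], 'LOC': ['Dakar']}
```

Raising min_score after prediction trades recall for precision without re-running the model, as long as the threshold passed to predict_entities was at least as low.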

Training Details

  • Base Model: urchade/gliner_multi_pii-v1
  • Dataset: MasakhaNER (Wolof subset)
  • Training samples: 5,143
  • Validation samples: 643
  • Epochs: 10
  • Learning rate: 5e-6
  • Batch size: 16
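
As a rough sanity check on the size of this run, the figures above imply the following number of optimizer steps. This is only a sketch: it assumes one optimizer step per batch, no gradient accumulation, and that the last partial batch of each epoch is kept, none of which is stated in the training details.

```python
import math

# Values taken from the training details listed above.
train_samples = 5143
batch_size = 16
epochs = 10

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 322 3220
```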

📊 Dataset

This project uses the MasakhaNER dataset, which provides high-quality NER annotations for 10 African languages, including Wolof (wol).

Dataset Split:

  • Train: 1,871 samples
  • Validation: 267 samples
  • Test: 539 samples

Entity Types:

  • PER - Person names
  • ORG - Organizations
  • LOC - Locations
  • DATE - Dates
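
Token-level NER datasets such as MasakhaNER are typically distributed with BIO tags (B-PER, I-PER, O, and so on), whereas GLiNER-style training data is usually expressed as token spans. The sketch below shows one way to convert between the two. It assumes the tags follow the B-/I- naming for the four entity types listed above and that the target format is a dictionary with tokenized_text and ner keys, where each span is [start, end, label] with inclusive token indices; the function name bio_to_spans and the toy sentence are illustrative and not taken from the dataset or from this model's training code.

```python
def bio_to_spans(tokens, tags):
    """Convert BIO tags to [start, end, label] spans (inclusive token indices)."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span when the current tag does not continue it.
        if label is not None and tag != f"I-{label}":
            spans.append([start, i - 1, label])
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Stray I- tag without a preceding B-: treat it as a span start.
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sentence.
    if label is not None:
        spans.append([start, len(tags) - 1, label])
    return {"tokenized_text": list(tokens), "ner": spans}


# Toy example with hand-written tags (not an actual dataset record).
tokens = ["Ousmane", "Sonko", "ci", "Dakar", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]

print(bio_to_spans(tokens, tags)["ner"])
# [[0, 1, 'PER'], [3, 3, 'LOC']]
```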

📈 Evaluation Results

Evaluation on the test set:

  • 539 sentences/examples
  • 505 total annotated entities across these sentences

Entity Type   Precision   Recall    F1-Score   Support
DATE          30.77%      22.86%    26.23%     70
LOC           76.75%      84.95%    80.65%     206
ORG           41.89%      56.36%    48.06%     55
PER           53.02%      70.69%    60.59%     174
GLOBAL        58.87%      68.32%    63.24%     505
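
The F1-Score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported precision and recall; because those inputs are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit (LOC, for example). The function name f1_score is illustrative, and the numbers are copied from the table above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %, support) copied from the results table.
reported = {
    "DATE": (30.77, 22.86, 26.23, 70),
    "LOC": (76.75, 84.95, 80.65, 206),
    "ORG": (41.89, 56.36, 48.06, 55),
    "PER": (53.02, 70.69, 60.59, 174),
    "GLOBAL": (58.87, 68.32, 63.24, 505),
}

for label, (p, r, f1, support) in reported.items():
    print(f"{label}: reported {f1:.2f}, recomputed {f1_score(p, r):.2f}")

# The per-type supports add up to the global support.
per_type_support = sum(v[3] for k, v in reported.items() if k != "GLOBAL")
print(per_type_support == reported["GLOBAL"][3])
```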

⚠️ Performance Note

The model was fine-tuned on a relatively small dataset (the Wolof subset of MasakhaNER), and current performance reflects this constraint. The effect is most visible for the DATE and ORG entity types (F1 of 26.23% and 48.06%), which have the fewest annotated examples.

Future Improvements:

  • Collect and annotate more data in Wolof
  • Increase source diversity (newspapers, social media, literature)
  • Experiment with data augmentation techniques

With more annotated data, we expect to significantly improve the model's performance.

License

MIT
