Paper: [Nomic Embed: Training a Reproducible Long Context Text Embedder](https://arxiv.org/abs/2402.01613)
modernbert-embed-unsupervised is the unsupervised checkpoint trained with the contrastors library
for 1 epoch over the 235M weakly-supervised contrastive pairs curated in Nomic Embed.
We suggest using modernbert-embed for embedding tasks.
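For example, the unsupervised checkpoint can be used through sentence-transformers. The following is a minimal sketch, assuming the checkpoint is published under the repository id `nomic-ai/modernbert-embed-unsupervised` and follows the same `search_query:` / `search_document:` task-prefix convention as nomic-embed-text-v1:

```python
# Minimal embedding sketch. The repository id and the task prefixes below are
# assumptions carried over from the nomic-embed-text-v1 convention; adjust
# them if the actual model card differs.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/modernbert-embed-unsupervised")  # assumed repo id

queries = ["search_query: What is contrastive pretraining?"]
documents = [
    "search_document: Contrastive pretraining trains an encoder to pull "
    "paired texts together and push unrelated texts apart."
]

# Encode both sides and score them with cosine similarity.
query_emb = model.encode(queries, convert_to_tensor=True)
doc_emb = model.encode(documents, convert_to_tensor=True)
print(util.cos_sim(query_emb, doc_emb))
```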
The modernbert-embed-unsupervised model performs similarly to the nomic-embed-text-v1_unsup model:
| Model | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
|---|---|---|---|---|---|---|---|---|
| nomic-embed-text-v1_unsup | 59.9 | 71.2 | 42.5 | 83.7 | 55.0 | 48.0 | 80.8 | 30.7 |
| modernbert-embed-unsupervised | 60.03 | 72.11 | 44.34 | 82.78 | 55.0 | 47.05 | 80.33 | 31.2 |
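The numbers above are MTEB task-category averages. A rough sketch of re-running a single task with the `mteb` package is shown below, again assuming the hypothetical repository id; retrieval tasks would additionally need the `search_query:` / `search_document:` prefixes to match the training setup:

```python
# Sketch of scoring the checkpoint on one MTEB task as a sanity check.
# The repository id is an assumption; full comparability with the table
# also requires the task prefixes used during training.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/modernbert-embed-unsupervised")  # assumed repo id

evaluation = MTEB(tasks=["STSBenchmark"])
evaluation.run(model, output_folder="results/modernbert-embed-unsupervised")
```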
Base model: answerdotai/ModernBERT-base