SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
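The Pooling and Normalize stages listed above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the library's own code: `mean_pool_and_normalize` is a hypothetical helper, and the toy arrays use 4 dimensions in place of the model's 768.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mask-aware mean pooling followed by L2 normalization.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 0/1, where 1 marks a real token
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    pooled = summed / counts                            # mean token embedding
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length sentence vectors

# Toy input: 2 sequences, 3 token positions, 4 dimensions (hypothetical values)
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
sentence_embeddings = mean_pool_and_normalize(tokens, mask)
```

Because every output row has unit length, a plain dot product between two rows equals their cosine similarity.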
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Branch: Disneyland_HongKong. Review: It is my second time coming to HK Disneyland. Unlucky us, it rained. Haha! Although rain didnt stop us from enjoying. Its just sad that the afternoon parade was cancelled. But during the night when the weather cleared, HK Disneyland pushed through the Paint the Night Parade. It was my first time seeing it. 6 years ago there was only fireworks behind the castle (which is now still under renovation).My family and I will definitely go back. Love seeing my little one enjoying Disneyland. ',
"Branch: Disneyland_HongKong. Review: Must see for kids, judging by their reaction to everything what is happening.I will now talk about obvious things, but those were disappointing.Half of performances are in Chinese, which Is logical, but does not allow one english speaking to fully integrateVery limited choice of food, mostly fast food, which is always true for open parks as well. Not too much interactive attractions, rather very conservativeStill, for kids this is lots of fun, so I suppose adult's opinion is a bit biased ",
"Branch: Disneyland_California. Review: first tip:pre purchase your tickets, do not arrive there without them, second tip: load your 3 fastpasses 30 days prior to your arrival on the Disney website, create an account and link all of your tickets together. You will then be able to quickly use the Fastpass kiosks once in the park, where you will only be able to get one Fastpass at a time, but you'll be able to do this for all tickets at the same time if you wish a great time saver.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 1.0000],
# [1.0000, 1.0000, 1.0000],
# [1.0000, 1.0000, 1.0000]])
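For semantic search, the same similarity scores can be used to rank a corpus against a query. The sketch below assumes unit-length embeddings, as the Normalize() module in the architecture implies; `top_k` is a hypothetical helper and the 2-dimensional vectors are toy values, not model outputs.

```python
import numpy as np

def top_k(query_emb, corpus_embs, k=2):
    """Return (index, score) pairs for the k corpus rows most similar to the query."""
    scores = corpus_embs @ query_emb        # dot product equals cosine similarity for unit vectors
    order = np.argsort(-scores)[:k]         # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy unit-length vectors standing in for real embeddings
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.8, 0.6])
hits = top_k(query, corpus, k=2)
```

With real data, `corpus` and `query` would come from `model.encode(...)`.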
Training Details
Training Dataset
Unnamed Dataset
- Size: 34,124 training samples
- Columns: sentence_0 and label
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | label |
|---|---|---|
| type | string | int |
| details | min: 18 tokens; mean: 142.21 tokens; max: 384 tokens | 0: ~7.00%; 1: ~12.00%; 2: ~81.00% |
- Samples:

| sentence_0 | label |
|---|---|
| Branch: Disneyland_California. Review: This is the most magical place on earth. We loved it so much. The children want to visit Disneyland again on the way to Hawaii this year. Every ride or attraction you go on is fantastic, the fireworks are spectacular, the street parade and music just makes you want to dance. The outside world doesn't exist at Disneyland everyone is smiling and having a great time. We have a 15 year old and 5 year old and they were equally pleased with everything and there is something for all ages. My particular favourite was the Haunted Mansion. We had a two day pass and I feel it was just enough for Disneyland and we didn't even go to Disney Adventure Park next door, so I would suggest a four day pass if you want to go there too and get the best of everything. Probably the only negative thing was trying to find something for dinner in the evening as we were near the street parade and there was only a limited number of foods available there, there seems to be d... | 2 |
| Branch: Disneyland_California. Review: try to go to park Tuesday Thursday. Not as crowded. We didn't wait more than 5 mins for any of the rides. | 2 |
| Branch: Disneyland_California. Review: This was my third trip to Disneyland, but I hadn't been in over 10 years and everything was better than I imagined. There is no place better to shed off the responsibility of adulthood and just believe in magic and let the world be amazing. Perfect vacation! | 2 |

- Loss: BatchAllTripletLoss
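The batch-all triplet idea from Hermans et al. (cited below) can be sketched in NumPy. This is an illustration under assumptions, not the library's implementation: the function name, the Euclidean distance, the `margin` value, and the choice to average over triplets with non-zero loss are all assumptions made here. The integer labels play the role of the `label` column.

```python
import numpy as np

def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Average hinge loss over every valid (anchor, positive, negative) triplet in a batch."""
    # Pairwise Euclidean distances, shape (n, n)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # loss[a, p, n] = d(a, p) - d(a, n) + margin
    loss = dist[:, :, None] - dist[:, None, :] + margin

    # A triplet is valid when a and p share a label (a != p) and n has a different label
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    valid = (same & not_self)[:, :, None] & (~same)[:, None, :]

    loss = np.maximum(loss, 0.0) * valid
    num_active = (loss > 1e-16).sum()       # triplets that still violate the margin
    return loss.sum() / max(num_active, 1)

# Toy batch: two points of class 0 and one of class 1 (hypothetical values)
emb = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
lab = np.array([0, 0, 1])
loss_close = batch_all_triplet_loss(emb, lab, margin=2.5)

# Moving the class-1 point far away satisfies the margin for every triplet
emb_far = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
loss_far = batch_all_triplet_loss(emb_far, lab, margin=2.5)
```

The loss pulls same-label reviews together and pushes different-label reviews at least `margin` further apart, which is why it needs only a text column and a class label rather than explicit pairs.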
Training Hyperparameters
Non-Default Hyperparameters
multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
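To make the schedule settings concrete, the sketch below evaluates a linear decay with `learning_rate: 5e-05` and `warmup_steps: 0`. It is a rough illustration, not the trainer's code: `linear_lr` is a hypothetical helper, and the total step count assumes a single device, which the card does not state.

```python
import math

def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: one device, so each optimizer step consumes one batch of 8 samples
steps_per_epoch = math.ceil(34_124 / 8)   # dataloader_drop_last is False, so round up
total_steps = 3 * steps_per_epoch         # num_train_epochs: 3

lr_start = linear_lr(0, total_steps)
lr_mid = linear_lr(total_steps // 2, total_steps)
lr_end = linear_lr(total_steps, total_steps)
```

Under these assumptions the learning rate starts at 5e-05, is halved midway through training, and reaches zero at the final step.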
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.1172 | 500 | 3.5188 |
| 0.2344 | 1000 | 3.4596 |
| 0.3516 | 1500 | 3.378 |
| 0.4688 | 2000 | 3.2981 |
| 0.5860 | 2500 | 3.3428 |
| 0.7032 | 3000 | 3.4489 |
| 0.8204 | 3500 | 3.4179 |
| 0.9376 | 4000 | 3.443 |
| 1.0549 | 4500 | 3.2959 |
| 1.1721 | 5000 | 3.3311 |
| 1.2893 | 5500 | 3.3234 |
| 1.4065 | 6000 | 3.1911 |
| 1.5237 | 6500 | 3.3471 |
| 1.6409 | 7000 | 3.2689 |
| 1.7581 | 7500 | 3.1679 |
| 1.8753 | 8000 | 3.228 |
| 1.9925 | 8500 | 3.0726 |
| 2.1097 | 9000 | 3.1386 |
| 2.2269 | 9500 | 3.0727 |
| 2.3441 | 10000 | 3.1506 |
| 2.4613 | 10500 | 3.0997 |
| 2.5785 | 11000 | 3.1086 |
| 2.6957 | 11500 | 2.98 |
| 2.8129 | 12000 | 3.0279 |
| 2.9301 | 12500 | 3.1965 |
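The Epoch column can be reproduced from the Step column, the training set size (34,124 samples), and the per-device batch size (8). The sketch below assumes a single training device, which the card does not state explicitly but which is consistent with the logged values; `epoch_at` is a hypothetical helper.

```python
import math

train_samples = 34_124
batch_size = 8
steps_per_epoch = math.ceil(train_samples / batch_size)  # the last partial batch still counts as a step

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

# A few (step, epoch) pairs copied from the table above
logged = {500: 0.1172, 4500: 1.0549, 12500: 2.9301}
matches = all(abs(epoch_at(step) - epoch) < 1e-9 for step, epoch in logged.items())
```

Three full epochs therefore correspond to 3 × 4,266 = 12,798 steps, so the last logged row at step 12,500 falls shortly before the end of training.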
Framework Versions
- Python: 3.12.3
- Sentence Transformers: 5.1.2
- Transformers: 4.53.1
- PyTorch: 2.7.1+cu126
- Accelerate: 1.12.0
- Datasets: 3.6.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
BatchAllTripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Model tree for AC7989/all-mpnet-sbert-lightgbm-disneyland
Base model
sentence-transformers/all-mpnet-base-v2