Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
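The Pooling and Normalize stages listed above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling averages only the non-padding token vectors and that Normalize rescales the result to unit L2 length. The helper names `mean_pool` and `l2_normalize` and the 3-dimensional toy vectors are hypothetical and are not part of the library.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


# Hypothetical 3-dimensional token vectors (the real model uses 384 dimensions).
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the last position is padding and is ignored

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.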
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("erichuber/finetuned-medical-model")
# Run inference
sentences = [
    'Does the early adopter of drugs exist?',
    "Within drug groups, indicators of drug adoption, except for adoption time, correlate reasonably well. However, the theory that physicians' early adoption of new drugs is a personal trait independent of the type of drug could not be confirmed. The notion of the early-drug-adopting general practitioner may be mistaken.",
    'How is pars planitis diagnosed? Pars planitis is typically diagnosed based on a specialized eye examination. During the exam, the ophthalmologist will typically see clusters of white blood cells trapped within the eyeball that are called snowballs (or "inflammatory exudate"). If these clusters are located on the pars plana, they are known as snowbanks. Snowbanks are considered a "hallmark" sign of pars planitis. It is often recommended that people over age 25 with pars planitis have an MRI of their brain and spine to rule out multiple sclerosis.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
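For semantic search, the similarity scores can be used to rank candidate passages against a query. The following is a minimal pure-Python sketch, assuming cosine similarity is the scoring function (the library default, as far as the model configuration above suggests). The function names and the 2-dimensional toy vectors are hypothetical stand-ins for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional stand-ins for the model's embeddings.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_by_similarity(query, candidates)
print(ranking)
```

With real embeddings, `query` would be one row of the `model.encode(...)` output and `candidates` the rows for the passages being searched.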
Columns: sentence_0, sentence_1, and label

|  | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details |  |  |  |
| sentence_0 | sentence_1 | label |
|---|---|---|
| What is (are) Psoriasis ? | How might Fanconi Bickel syndrome be treated? Management of Fanconi Bickel syndrome (FBS) generally focuses on the signs and symptoms of the condition. Treatment includes replacement of water and electrolytes, and vitamin D and phosphate supplements for prevention of hypophosphatemic rickets. Although there is limited data on the effectiveness of dietary treatment for this condition, it is recommended that affected individuals follow a galactose-restricted diabetic diet, with fructose as the main source of carbohydrate. Diet and supplements may alleviate some of the signs and symptoms of the condition but generally do not improve growth, resulting in short stature in adulthood. | 0.0 |
| Is Adenoid cystic carcinoma inherited ? | What are the signs and symptoms of Kaufman oculocerebrofacial syndrome? The Human Phenotype Ontology provides the following list of signs and symptoms for Kaufman oculocerebrofacial syndrome. If the information is available, the table below includes how often the symptom is seen in people with this condition. You can use the MedlinePlus Medical Dictionary to look up the definitions for these medical terms. Signs and Symptoms Approximate number of patients (when available) Abnormality of calvarial morphology 90% Arachnodactyly 90% Cognitive impairment 90% Long toe 90% Microcephaly 90% Optic atrophy 90% Respiratory insufficiency 90% Upslanted palpebral fissure 90% Abnormality of the palate 50% Aplasia/Hypoplasia of the eyebrow 50% Blepharophimosis 50% Epicanthus 50% Long face 50% Microcornea 50% Microdontia 50% Muscle weakness 50% Myopia 50% Narrow face 50% Nystagmus 50% Preauricular skin tag 50% Short philtrum 50% Strabismus 50% Telecanthus 50% Thin vermilion border 50% Wide mouth 50% C... | 0.0 |
| Is protein C deficiency inherited ? | The heart has an internal electrical system that controls the rhythm of the heartbeat. Problems can cause abnormal heart rhythms, called arrhythmias. There are many types of arrhythmia. During an arrhythmia, the heart can beat too fast, too slow, or it can stop beating. Sudden cardiac arrest (SCA) occurs when the heart develops an arrhythmia that causes it to stop beating. This is different than a heart attack, where the heart usually continues to beat but blood flow to the heart is blocked. There are many possible causes of SCA. They include coronary heart disease, physical stress, and some inherited disorders. Sometimes there is no known cause for the SCA. Without medical attention, the person will die within a few minutes. People are less likely to die if they have early defibrillation. Defibrillation sends an electric shock to restore the heart rhythm to normal. You should give cardiopulmonary resuscitation (CPR) to a person having SCA until defibrillation can be done. If... | 0.0 |
CosineSimilarityLoss with these parameters:
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
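As a rough illustration of what this loss configuration computes, the sketch below takes the cosine similarity of each sentence pair's embeddings and applies mean squared error against the float label. It is a simplified pure-Python stand-in, not the library's implementation; the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_mse_loss(pairs, labels):
    """Mean squared error between predicted cosine similarities and labels."""
    predictions = [cosine_similarity(u, v) for u, v in pairs]
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical embedding pairs: the first pair is identical, the second orthogonal.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]

loss_matching = cosine_similarity_mse_loss(pairs, [1.0, 0.0])    # labels agree with the geometry
loss_mismatched = cosine_similarity_mse_loss(pairs, [0.0, 0.0])  # first label disagrees
print(loss_matching, loss_mismatched)
```

A label of 0.0, as in the samples shown above, pushes the cosine similarity of that pair toward zero during training.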
Non-default hyperparameters:
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Base model: sentence-transformers/all-MiniLM-L6-v2