SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the parquet dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-mpnet-base-v2
Maximum Sequence Length: 384 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- parquet

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
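
The Pooling and Normalize modules above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration indicates. The helper names `mean_pool` and `l2_normalize` and the toy 4-dimensional vectors are hypothetical; the actual model works on 768-dimensional tensors inside the library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1), ignoring padding."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[d] for vec in kept) / count for d in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a Normalize step does."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]

# Toy input: three token vectors of dimension 4; the last token is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)          # padding row is excluded from the mean
sentence_vector = l2_normalize(pooled)    # unit-length sentence embedding
print(sentence_vector)
```

Because the output has unit length, the cosine similarity of two such embeddings reduces to their dot product.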

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vaios-stergio/all-mpnet-base-v2-dblp-aminer-10k-pair-label-cosentloss")
# Run inference
sentences = [
    'A Multiscale RayShooting Model for Termination Detection of TreeLike Structures in Biomedical Images',
    'Digital reconstruction tracing of treelike structures such as neurons retinal blood vessels and bronchi from volumetric images and 2D images is very important to biomedical research Many existing reconstruction algorithms rely on a set of good seed points The 2D or 3D terminations are good candidates for such seed points In this paper we propose an automatic method to detect terminations for treelike structures based on a multiscale rayshooting model and a termination visual prior The multiscale rayshooting model detects 2D terminations by extracting and analyzing the multiscale intensity distribution features around a termination candidate The range of scale is adaptively determined according to the local neurite diameter estimated by the Rayburst sampling algorithm in combination with the grayweighted distance transform The termination visual prior is based on a key observation–when observing a 3D termination from three orthogonal directions without occlusion we can recognize it in at least two views Using this prior with the multiscale rayshooting model we can detect 3D terminations with high accuracies Experiments on 3D neuron image stacks 2D neuron images 3D bronchus image stacks and 2D retinal blood vessel images exhibit average precision and recall rates of 8750 and 9054 The experimental results confirm that the proposed method outperforms other the stateoftheart termination detection methods',
    'Flow Prediction in SpatioTemporal Networks Based on Multitask Deep Learning.Predicting flows e.g. the traffic of vehicles crowds and bikes consisting of the inout traffic at a node and transitions between different nodes in a spatiotemporal network plays an important role in transportation systems. However this is a very challenging problem affected by multiple complex factors such as spatial correlations between different locations temporal correlations among different time intervals and external factors like events and weather. In addition the flow at a node called node flow and transitions between nodes edge flow mutually influence each other. To address these issues we propose a multitask deeplearning framework that simultaneously predicts the node flow and edge flow throughout a spatiotemporal network. Using fully convolutional networks our approach designs two sophisticated models for predicting node flow and edge flow respectively. Two models are connected by coupling their latent representations of middle layers and trained together. The external factors are also integrated into the framework through a gating fusion mechanism. In the edge flow prediction model we employ an embedding component to deal with the sparse transitions between nodes. We evaluate our method based on the taxicab data in Beijing and New York City.Experimental results show advantages of our method beyond 11 baselines such as ConvLSTM CNN and Markov Random Field.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9280, 0.6185],
#         [0.9280, 1.0000, 0.6247],
#         [0.6185, 0.6247, 1.0000]])
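
A similarity matrix like the one above is typically used to rank candidates against a query. The following is a minimal sketch of that ranking step in plain Python. The toy 3-dimensional vectors stand in for real `model.encode` outputs, and `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not library functions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for embeddings (assumption: real ones have 768 dimensions).
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.8, 0.6, 0.0], [1.0, 0.0, 0.0]]

ranking = rank_by_similarity(query, candidates)
for index, score in ranking:
    print(index, round(score, 4))
```

In practice `model.similarity` performs the scoring in one call; the sketch only shows what the resulting scores mean and how they order the candidates.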

Training Details

Training Dataset

parquet

Dataset: parquet
Size: 9,600 training samples
Columns: sentence1, sentence2, and label

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	float
details	min: 6 tokens mean: 157.68 tokens max: 384 tokens	min: 105 tokens mean: 213.12 tokens max: 384 tokens	min: 0.03 mean: 0.44 max: 0.95

Samples:

sentence1	sentence2	label
`A ManyObjective Particle Swarm Optimization Based On Virtual Pareto Front`	A manyobjective problems MaOP refer to the optimization problem involving more than three objectives Particle swarm optimization PSO is one of the potential heuristic methods suited for solving MaOPs The personal best selection strategy the global best selection strategy and the archive maintenance strategy are the three key components in the design of a ManyObjective Particle Swarm Optimization MaOPSO The personal best and global best selection strategies determine the direction where particles will fly The archive maintenance strategy has an important impact on convergence and diversity of its algorithm In MaOPs the high dimensionality in the objective space decreases the probability of a solution to be dominated by the other solutions in the population Thus it becomes more difficult for PSO to select the good leaders from so many nondominated solutions In this paper a virtual Inverted Generational Distance indicator is proposed to evaluate the comprehensive quality of a solution in ...	`0.787`
`Multivision tracking and collaboration based on spatial particle filter`	Abstract In existing multivision tracking methods a distributed collaborative tracking mode based on homography constraints is often adopted yet there are significant shortcomings to this approach For example visual information complementation is not used to improve the robustness of tracking and collaborative tracking is limited by homography constraints In this study a threedimensional spatial particle filter tracking method was proposed and multivision joint tracking and collaboration were effectively achieved This method was based on the existing particle filter framework A twodimensional plane particle was taken as the projection of a threedimensional spatial particle on the imaging plane and the formula for calculating a spatial particle’s weight was derived based on Bayesian posterior probability recursion In addition an approximation method to determine spatial particle weight was given The resampling of spatial particles was performed by using an epipolar line resampling metho...	`0.874`
Incorporating Social Factors in Accessible Design.Personal technologies are rarely designed to be accessible to disabled people partly due to the perceived challenge of including disability in design. Through design workshops we addressed this challenge by infusing usercentered design activities with Design for Social Accessibilitya perspective emphasizing social aspects of accessibilityto investigate how professional designers can leverage social factors to include accessibility in design. We focused on how professional designers incorporated Design for Social Accessibilityu0027s three tenets 1 to work with users with and without visual impairments 2 to consider social and functional factors 3 to employ toolsa framework and method cardsto raise awareness and prompt reflection on social aspects toward accessible design. We then interviewed designers about their workshop experiences. We found DSA to be an effective set of tools and strategies incorporating socialfunctional and nondisabl...	Research on LC Filter Cascaded with Buck Converter Supplying Constant Power Load Based on IDAPassivityBased Control.Nowadays distribution power systems are used in different applications such as aircraft ships submarines and hybrid electric vehicles. However the interaction between individually designed power subsystems may cause instability. Moreover in these applications the constant power load CPL also poses challenges for system dynamic response and stability. Thus the main objective is to stabilize the cascaded system supplying the CPL. An interconnection and damping assignment IDA passivitybased control PBC scheme for LC filter cascaded with buck converter supplying CPL is proposed. The plant is described by portcontrolled Hamiltonian PCH form. Particularly an adaptive interconnection matrix is developed to achieve internal links in PCH system. A modified IDAPBC and its proof are presented to perfect the implementation for the CPL application. Simulation results are given to illu...	`0.095`

Loss: CoSENTLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
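
To make the role of `scale` concrete, here is a minimal pure-Python sketch of the CoSENT objective as it is commonly written: log(1 + Σ exp(scale · (s_low − s_high))), summed over every ordered pair in the batch whose gold label says the first item should score lower than the second. The function name `cosent_loss` and the cosine scores are illustrative assumptions; the labels are taken from the sample rows above. The library's implementation operates on batched tensors and may differ in detail.

```python
import math

def cosent_loss(cos_scores, labels, scale=20.0):
    """Ranking loss: penalize pairs whose predicted order contradicts the labels."""
    # The leading 0.0 contributes exp(0) = 1, i.e. the "1 +" inside the log.
    terms = [0.0]
    for s_low, y_low in zip(cos_scores, labels):
        for s_high, y_high in zip(cos_scores, labels):
            if y_low < y_high:
                # Positive when the lower-labeled pair got the higher cosine score.
                terms.append(scale * (s_low - s_high))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

labels = [0.787, 0.874, 0.095]                          # gold labels from the samples
well_ranked = cosent_loss([0.70, 0.90, 0.10], labels)   # hypothetical scores, same order as labels
badly_ranked = cosent_loss([0.10, 0.20, 0.90], labels)  # hypothetical scores, order violated
print(well_ranked, badly_ranked)
```

Only the relative order of the cosine scores matters, not their absolute values, which is why the loss suits graded similarity labels like the ones in this dataset.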

Evaluation Dataset

parquet

Dataset: parquet
Size: 1,200 evaluation samples
Columns: sentence1, sentence2, and label

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	float
details	min: 6 tokens mean: 155.19 tokens max: 384 tokens	min: 109 tokens mean: 219.74 tokens max: 384 tokens	min: 0.03 mean: 0.44 max: 0.96

Samples:

sentence1	sentence2	label
`Collapsing Towers of Interpreters`	Given a tower of interpreters ie a sequence of multiple interpreters interpreting one another as input programs we aim to collapse this tower into a compiler that removes all interpretive overhead and runs in a single pass In the real world a use case might be Python code executed by an x86 runtime on a CPU emulated in a JavaScript VM running on an ARM CPU Collapsing such a tower can not only exponentially improve runtime performance but also enable the use of baselanguage tools for interpreted programs eg for analysis and verification In this paper we lay the foundations in an idealized but realistic setting	`0.814`
`Defeating Strong PUF Modeling Attack via Adverse Selection of ChallengeResponse Pairs`	Most advances in increasing the security of PUFs come from alterations to the topology of the mechanism or the measurement used to generate their output This paper focuses on a different form of improving security Selecting a subset of challenges and responses to conceal the patterns inherent in a PUF well enough to prevent an attacker from successfully replicating the PUF’s responses Our results show that it is possible to select a large set of CRPs that can be exposed to an attacker resulting in a modeling accuracy as low as 74 while without our selection process the accuracy increases to 93	`0.868`
An aggregated coupling measure for the analysis of objectoriented software systems.Abstract Coupling is a fundamental property of software systems which is strongly connected with the quality of software design and has high impact on program understanding. The coupling between software components influences software maintenance and evolution as well. In order to ease the maintenance and evolution processes it is essential to estimate the impact of changes made in the software system coupling indicating such a possible impact. This paper introduces a new aggregated coupling measurement which captures both the structural and the conceptual characteristics of coupling between the software components. The proposed measure combines the textual information contained in the source code with the structural relationships between software components. We conduct several experiments which underline that the proposed aggregated coupling measure reveals new characteristics of coupling and is also ef...	A Novel Methodology for Magnetic Hand Motion Tracking in HumanMachine Interfaces.Hand motion tracking represents one of the most widely used humancomputer interfaces. It plays a decisive role in many application areas such as virtual reality systems diagnostic and treatment of a range of diseases as well as robotic hand training with human hand skills. Oftentimes magnetic field sensors combined with permanent or electric magnets are used for hand motion tracking. Typically simple magnet models are used that require additional devices such as acceleration sensors as well as a mathematical model of the anatomic functions of a human hand. In contrast a sensing methodology is presented in the following which is based only on magnetic field sensing. Thus our methodology allows the use of magnetosensitive eskins for hand motion tracking whereby all of their advantages are preserved such as compact dimensions or the robustness against harsh environmental conditions. Furthermore calculations s...	`0.208`

Loss: CoSENTLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}

Training Hyperparameters

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3.0
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss
0.4167	500	2.7854
0.8333	1000	2.7558
1.25	1500	2.6434
1.6667	2000	2.5375
2.0833	2500	2.4504
2.5	3000	2.3097
2.9167	3500	2.2395
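
The Epoch column follows from the dataset size and batch size. The short check below reproduces it, assuming a single training device (so 9,600 samples at batch size 8 give 1,200 steps per epoch) and a logging interval of 500 steps, which is read off the table rather than stated in the hyperparameters.

```python
import math

train_samples = 9600      # from the training dataset section
batch_size = 8            # per_device_train_batch_size
num_epochs = 3            # num_train_epochs
logging_interval = 500    # assumption: inferred from the Step column

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

# Fractional epoch reached at each logged step.
logged_epochs = [
    round(step / steps_per_epoch, 4)
    for step in range(logging_interval, total_steps + 1, logging_interval)
]
print(steps_per_epoch, total_steps)
print(logged_epochs)
```

Under these assumptions the run ends at step 3,600, so the last logged row (step 3,500) falls just short of the end of the third epoch.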

Framework Versions

Python: 3.11.4
Sentence Transformers: 5.1.1
Transformers: 4.56.2
PyTorch: 2.8.0+cu128
Accelerate: 1.10.1
Datasets: 4.1.1
Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@article{10531646,
    author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
    year={2024},
    doi={10.1109/TASLP.2024.3402087}
}

Safetensors

Model size: 0.1B params
Tensor type: F32

Model tree for vaios-stergio/all-mpnet-base-v2-dblp-aminer-10k-pair-label-cosentloss

Base model: sentence-transformers/all-mpnet-base-v2
Finetuned (336): this model