SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the parquet dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: parquet
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
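The Pooling and Normalize modules above amount to a masked mean over token embeddings followed by L2 normalization. The following is a minimal NumPy sketch of that idea, not the library's implementation; the function name `mean_pool_and_normalize` and the random toy inputs are assumptions made for illustration.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Masked mean over tokens, then L2-normalize each sentence vector."""
    # Expand the mask to (batch, seq, 1) so it broadcasts over the embedding dim.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input (hypothetical): 2 sentences, 4 token slots, 768 dimensions.
# The second sentence has two padding positions that must be ignored.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)  # (2, 768)
```

Because every output vector has unit length, the cosine similarity between two sentences reduces to a plain dot product.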
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'A Multiscale RayShooting Model for Termination Detection of TreeLike Structures in Biomedical Images',
'Digital reconstruction tracing of treelike structures such as neurons retinal blood vessels and bronchi from volumetric images and 2D images is very important to biomedical research Many existing reconstruction algorithms rely on a set of good seed points The 2D or 3D terminations are good candidates for such seed points In this paper we propose an automatic method to detect terminations for treelike structures based on a multiscale rayshooting model and a termination visual prior The multiscale rayshooting model detects 2D terminations by extracting and analyzing the multiscale intensity distribution features around a termination candidate The range of scale is adaptively determined according to the local neurite diameter estimated by the Rayburst sampling algorithm in combination with the grayweighted distance transform The termination visual prior is based on a key observation–when observing a 3D termination from three orthogonal directions without occlusion we can recognize it in at least two views Using this prior with the multiscale rayshooting model we can detect 3D terminations with high accuracies Experiments on 3D neuron image stacks 2D neuron images 3D bronchus image stacks and 2D retinal blood vessel images exhibit average precision and recall rates of 8750 and 9054 The experimental results confirm that the proposed method outperforms other the stateoftheart termination detection methods',
'Flow Prediction in SpatioTemporal Networks Based on Multitask Deep Learning.Predicting flows e.g. the traffic of vehicles crowds and bikes consisting of the inout traffic at a node and transitions between different nodes in a spatiotemporal network plays an important role in transportation systems. However this is a very challenging problem affected by multiple complex factors such as spatial correlations between different locations temporal correlations among different time intervals and external factors like events and weather. In addition the flow at a node called node flow and transitions between nodes edge flow mutually influence each other. To address these issues we propose a multitask deeplearning framework that simultaneously predicts the node flow and edge flow throughout a spatiotemporal network. Using fully convolutional networks our approach designs two sophisticated models for predicting node flow and edge flow respectively. Two models are connected by coupling their latent representations of middle layers and trained together. The external factors are also integrated into the framework through a gating fusion mechanism. In the edge flow prediction model we employ an embedding component to deal with the sparse transitions between nodes. We evaluate our method based on the taxicab data in Beijing and New York City.Experimental results show advantages of our method beyond 11 baselines such as ConvLSTM CNN and Markov Random Field.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9280, 0.6185],
# [0.9280, 1.0000, 0.6247],
# [0.6185, 0.6247, 1.0000]])
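For semantic search, the same embeddings can be ranked against a query. Since the model ends with a Normalize module, its vectors are unit-length and a dot product equals cosine similarity. The sketch below uses small hand-written 2-dimensional vectors in place of real model output, and `top_k` is a hypothetical helper, not part of Sentence Transformers.

```python
import numpy as np

def top_k(query_vec, corpus_vecs, k=2):
    """Return (index, score) pairs for the k most similar corpus vectors.

    Assumes all vectors are already L2-normalized, so the dot product
    is the cosine similarity.
    """
    scores = corpus_vecs @ query_vec
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Stand-in unit vectors (hypothetical); real embeddings would have 768 dims.
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.8, 0.6])
hits = top_k(query, corpus, k=2)
print([i for i, _ in hits])  # [1, 0]
```

With real data, `corpus` would be the output of `model.encode(...)` on the documents and `query` the encoding of a single search string.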
Training Details
Training Dataset
parquet
- Dataset: parquet
- Size: 9,600 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | float |
| details | min: 6 tokens, mean: 157.68 tokens, max: 384 tokens | min: 105 tokens, mean: 213.12 tokens, max: 384 tokens | min: 0.03, mean: 0.44, max: 0.95 |
- Samples:
| sentence1 | sentence2 | label |
|---|---|---|
| A ManyObjective Particle Swarm Optimization Based On Virtual Pareto Front | A manyobjective problems MaOP refer to the optimization problem involving more than three objectives Particle swarm optimization PSO is one of the potential heuristic methods suited for solving MaOPs The personal best selection strategy the global best selection strategy and the archive maintenance strategy are the three key components in the design of a ManyObjective Particle Swarm Optimization MaOPSO The personal best and global best selection strategies determine the direction where particles will fly The archive maintenance strategy has an important impact on convergence and diversity of its algorithm In MaOPs the high dimensionality in the objective space decreases the probability of a solution to be dominated by the other solutions in the population Thus it becomes more difficult for PSO to select the good leaders from so many nondominated solutions In this paper a virtual Inverted Generational Distance indicator is proposed to evaluate the comprehensive quality of a solution in ... | 0.787 |
| Multivision tracking and collaboration based on spatial particle filter | Abstract In existing multivision tracking methods a distributed collaborative tracking mode based on homography constraints is often adopted yet there are significant shortcomings to this approach For example visual information complementation is not used to improve the robustness of tracking and collaborative tracking is limited by homography constraints In this study a threedimensional spatial particle filter tracking method was proposed and multivision joint tracking and collaboration were effectively achieved This method was based on the existing particle filter framework A twodimensional plane particle was taken as the projection of a threedimensional spatial particle on the imaging plane and the formula for calculating a spatial particle’s weight was derived based on Bayesian posterior probability recursion In addition an approximation method to determine spatial particle weight was given The resampling of spatial particles was performed by using an epipolar line resampling metho... | 0.874 |
| Incorporating Social Factors in Accessible Design.Personal technologies are rarely designed to be accessible to disabled people partly due to the perceived challenge of including disability in design. Through design workshops we addressed this challenge by infusing usercentered design activities with Design for Social Accessibilitya perspective emphasizing social aspects of accessibilityto investigate how professional designers can leverage social factors to include accessibility in design. We focused on how professional designers incorporated Design for Social Accessibilityu0027s three tenets 1 to work with users with and without visual impairments 2 to consider social and functional factors 3 to employ toolsa framework and method cardsto raise awareness and prompt reflection on social aspects toward accessible design. We then interviewed designers about their workshop experiences. We found DSA to be an effective set of tools and strategies incorporating socialfunctional and nondisabl... | Research on LC Filter Cascaded with Buck Converter Supplying Constant Power Load Based on IDAPassivityBased Control.Nowadays distribution power systems are used in different applications such as aircraft ships submarines and hybrid electric vehicles. However the interaction between individually designed power subsystems may cause instability. Moreover in these applications the constant power load CPL also poses challenges for system dynamic response and stability. Thus the main objective is to stabilize the cascaded system supplying the CPL. An interconnection and damping assignment IDA passivitybased control PBC scheme for LC filter cascaded with buck converter supplying CPL is proposed. The plant is described by portcontrolled Hamiltonian PCH form. Particularly an adaptive interconnection matrix is developed to achieve internal links in PCH system. A modified IDAPBC and its proof are presented to perfect the implementation for the CPL application. Simulation results are given to illu... | 0.095 |
- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
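To make the loss parameters concrete, here is a minimal NumPy sketch of a CoSENT-style ranking objective: for every two sentence pairs where the first has the lower label, it penalizes the case where the first nevertheless gets the higher cosine similarity, with `scale` multiplying the similarities. This is an illustrative reimplementation under that assumption, not the library's code, and the function name `cosent_loss` is hypothetical.

```python
import numpy as np

def cosent_loss(cos_sims, labels, scale=20.0):
    """log(1 + sum(exp(scale * (s_i - s_j)))) over all (i, j) with label_i < label_j."""
    s = scale * np.asarray(cos_sims, dtype=float)
    y = np.asarray(labels, dtype=float)
    diff = s[:, None] - s[None, :]               # diff[i, j] = s_i - s_j
    should_rank_lower = y[:, None] < y[None, :]  # i is labeled less similar than j
    terms = np.concatenate(([0.0], diff[should_rank_lower]))  # the 0 gives the "1 +"
    m = terms.max()                              # stable log-sum-exp
    return float(m + np.log(np.exp(terms - m).sum()))

# Two sentence pairs with gold labels 0.9 and 0.1 (made-up values).
good = cosent_loss([0.9, 0.1], [0.9, 0.1])  # predicted order matches the labels
bad = cosent_loss([0.1, 0.9], [0.9, 0.1])   # predicted order is reversed
print(good < bad)  # True
```

Only the relative order of the predicted similarities matters here, which is why the labels can be arbitrary floats rather than calibrated cosine targets.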
Evaluation Dataset
parquet
- Dataset: parquet
- Size: 1,200 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | float |
| details | min: 6 tokens, mean: 155.19 tokens, max: 384 tokens | min: 109 tokens, mean: 219.74 tokens, max: 384 tokens | min: 0.03, mean: 0.44, max: 0.96 |
- Samples:
| sentence1 | sentence2 | label |
|---|---|---|
| Collapsing Towers of Interpreters | Given a tower of interpreters ie a sequence of multiple interpreters interpreting one another as input programs we aim to collapse this tower into a compiler that removes all interpretive overhead and runs in a single pass In the real world a use case might be Python code executed by an x86 runtime on a CPU emulated in a JavaScript VM running on an ARM CPU Collapsing such a tower can not only exponentially improve runtime performance but also enable the use of baselanguage tools for interpreted programs eg for analysis and verification In this paper we lay the foundations in an idealized but realistic setting | 0.814 |
| Defeating Strong PUF Modeling Attack via Adverse Selection of ChallengeResponse Pairs | Most advances in increasing the security of PUFs come from alterations to the topology of the mechanism or the measurement used to generate their output This paper focuses on a different form of improving security Selecting a subset of challenges and responses to conceal the patterns inherent in a PUF well enough to prevent an attacker from successfully replicating the PUF’s responses Our results show that it is possible to select a large set of CRPs that can be exposed to an attacker resulting in a modeling accuracy as low as 74 while without our selection process the accuracy increases to 93 | 0.868 |
| An aggregated coupling measure for the analysis of objectoriented software systems.Abstract Coupling is a fundamental property of software systems which is strongly connected with the quality of software design and has high impact on program understanding. The coupling between software components influences software maintenance and evolution as well. In order to ease the maintenance and evolution processes it is essential to estimate the impact of changes made in the software system coupling indicating such a possible impact. This paper introduces a new aggregated coupling measurement which captures both the structural and the conceptual characteristics of coupling between the software components. The proposed measure combines the textual information contained in the source code with the structural relationships between software components. We conduct several experiments which underline that the proposed aggregated coupling measure reveals new characteristics of coupling and is also ef... | A Novel Methodology for Magnetic Hand Motion Tracking in HumanMachine Interfaces.Hand motion tracking represents one of the most widely used humancomputer interfaces. It plays a decisive role in many application areas such as virtual reality systems diagnostic and treatment of a range of diseases as well as robotic hand training with human hand skills. Oftentimes magnetic field sensors combined with permanent or electric magnets are used for hand motion tracking. Typically simple magnet models are used that require additional devices such as acceleration sensors as well as a mathematical model of the anatomic functions of a human hand. In contrast a sensing methodology is presented in the following which is based only on magnetic field sensing. Thus our methodology allows the use of magnetosensitive eskins for hand motion tracking whereby all of their advantages are preserved such as compact dimensions or the robustness against harsh environmental conditions. Furthermore calculations s... | 0.208 |
- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
Training Hyperparameters
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3.0
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.4167 | 500 | 2.7854 |
| 0.8333 | 1000 | 2.7558 |
| 1.25 | 1500 | 2.6434 |
| 1.6667 | 2000 | 2.5375 |
| 2.0833 | 2500 | 2.4504 |
| 2.5 | 3000 | 2.3097 |
| 2.9167 | 3500 | 2.2395 |
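The Epoch column can be cross-checked against the dataset size and batch size reported above. This short sketch assumes training on a single device with no gradient accumulation, so one step consumes exactly 8 samples; the variable names are illustrative.

```python
import math

train_samples = 9600  # "Size: 9,600 training samples"
batch_size = 8        # per_device_train_batch_size
steps_per_epoch = math.ceil(train_samples / batch_size)

logged_steps = [500, 1000, 1500, 2000, 2500, 3000, 3500]
epochs = [round(step / steps_per_epoch, 4) for step in logged_steps]

print(steps_per_epoch)  # 1200
for step, epoch in zip(logged_steps, epochs):
    print(step, epoch)
```

Under these assumptions the computed values agree with the table, for example step 500 falls at epoch 0.4167 and step 3000 at epoch 2.5.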
Framework Versions
- Python: 3.11.4
- Sentence Transformers: 5.1.1
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@article{10531646,
author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
year={2024},
doi={10.1109/TASLP.2024.3402087}
}
Model tree for vaios-stergio/all-mpnet-base-v2-dblp-aminer-10k-pair-label-cosentloss
Base model
sentence-transformers/all-mpnet-base-v2