SentenceTransformer based on intfloat/e5-small-v2

This is a sentence-transformers model finetuned from intfloat/e5-small-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-small-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
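The Pooling module above uses mean pooling over token embeddings (`pooling_mode_mean_tokens: True`). A minimal standalone sketch of that step, assuming NumPy and using a toy batch in place of real BertModel outputs (`mean_pool` is a hypothetical helper, not part of the library):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # Average token embeddings, counting only non-padding tokens,
    # which is what pooling_mode_mean_tokens does.
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]
    summed = (np.asarray(token_embeddings, dtype=np.float64) * mask).sum(axis=1)
    counts = mask.sum(axis=1)
    return summed / counts

# Batch of 1 sequence, 3 tokens (the last is padding), hidden size 2.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))
# [[2. 3.]]
```

The padding token is masked out, so only the two real tokens contribute to the average.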

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dgwon/e5-small-v2.nfcorpus.meta-llama-Llama-3.2-3B-Instruct.top10pct")
model = SentenceTransformer("dgwon/e5-small-v2.nfcorpus.meta-llama-Llama-3.2-3B-Instruct.top10pct")
# Run inference
sentences = [
    'query: Do Peppermint Teas Affect Male Fertility?',
    'passage: OBJECTIVES: To justify the effects of Mentha piperita labiatae and Mentha spicata labiatae herbal teas on plasma total testosterone, luteinizing hormone, and follicle-stimulating hormone levels and testicular histologic features. We performed this study because of major complaints in our area from men about the adverse effects of these herbs on male reproductive function. METHODS: The experimental study included 48 male Wistar albino rats (body weight 200 to 250 g). The rats were randomized into four groups of 12 rats each. The control group was given commercial drinking water, and the experimental groups were given 20 g/L M. piperita tea, 20 g/L M. spicata tea, or 40 g/L M. spicata tea. RESULTS: The follicle-stimulating hormone and luteinizing hormone levels had increased and total testosterone levels had decreased in the experimental groups compared with the control group; the differences were statistically significant. Also, the Johnsen testicular biopsy scores were significantly different statistically between the experimental groups and the control group. Although the mean seminiferous tubular diameter of the experimental groups was relatively greater than in the control group, the difference was not statistically significant. The only effects of M. piperita on testicular tissue was segmental maturation arrest in the seminiferous tubules; however, the effects of M. spicata extended from maturation arrest to diffuse germ cell aplasia in relation to the dose. CONCLUSIONS: Despite the beneficial effects of M. piperita and M. spicata in digestion, we should also be aware of the toxic effects when the herbs are not used in the recommended fashion or at the recommended dose.',
    'passage: We report a series of cases of thyroid dysfunction in adults associated with ingestion of a brand of soy milk manufactured with kombu (seaweed), and a case of hypothyroidism in a neonate whose mother had been drinking this milk. We also report two cases of neonatal hypothyroidism linked to maternal ingestion of seaweed made into soup. These products were found to contain high levels of iodine. Despite increasing awareness of iodine deficiency, the potential for iodine toxicity, particularly from sources such as seaweed, is less well recognised.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5889, 0.0846],
#         [0.5889, 1.0000, 0.1250],
#         [0.0846, 0.1250, 1.0000]])
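`model.similarity` defaults to cosine similarity (the Similarity Function listed above). A self-contained sketch of that computation, assuming NumPy and a toy batch standing in for the real 3 × 384 embeddings (`cosine_similarity_matrix` is a hypothetical helper):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length; the pairwise dot products of
    # unit vectors are exactly the cosine similarities.
    emb = np.asarray(embeddings, dtype=np.float64)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

# Toy 3 x 4 "embeddings" in place of the model's 3 x 384 output.
toy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 4))
# The diagonal is 1.0: every vector is maximally similar to itself,
# matching the 1.0000 diagonal in the tensor above.
```

Note that, as an E5-family model, inputs should carry the `query: ` / `passage: ` prefixes shown in the sentences above.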

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,267 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 8 tokens, mean: 13.5 tokens, max: 27 tokens
    • positive: string; min: 21 tokens, mean: 333.29 tokens, max: 512 tokens
  • Samples:
    • anchor: query: Is a Chest CT Scan Worth the Radiation Risk?
      positive: passage: In the past 3 decades, the total number of CT scans performed has grown exponentially. In 2007, > 70 million CT scans were performed in the United States. CT scan studies of the chest comprise a large portion of the CT scans performed today because the technology has transformed the management of common chest diseases, including pulmonary embolism and coronary artery disease. As the number of studies performed yearly increases, a growing fraction of the population is exposed to low-dose ionizing radiation from CT scan. Data extrapolated from atomic bomb survivors and other populations exposed to low-dose ionizing radiation suggest that CT scan-associated radiation may increase an individual's lifetime risk of developing cancer. This finding, however, is not incontrovertible. Because this topic has recently attracted the attention of both the scientific community and the general public, it has become increasingly important for physicians to understand the cancer risk associated...
    • anchor: query: Does Lowering Cholesterol Improve Blood Flow?
      positive: passage: Current National Cholesterol Education Program guidelines consider desirable total and low-density lipoprotein cholesterol levels to be < 200 and < 160 mg/dl, respectively, for healthy individuals without multiple coronary risk factors. To determine the extent to which these levels affect vascular function, we assessed flow-mediated (endothelium-dependent) brachial artery vasoactivity noninvasively before, during, and after cholesterol lowering (simvastatin 10 mg/day) in 7 healthy middle-aged men with cholesterol levels meeting current recommendations. Flow-mediated brachial artery vasoactivity was measured using 7.5 MHz ultrasound and expressed as percent diameter change from baseline to hyperemic conditions (1 minute following 5 minutes of blood pressure cuff arterial occlusion). Flow-mediated vasoactivity rose from 5.0 +/- 3.6% at baseline to 10.5 +/- 5.6%, 13.3 +/- 4.3%, and 15.7 +/- 4.9% (all p < 0.05) as cholesterol fell from 200 +/- 12 to 161 +/- 18, 169 +/- 16, and 153...
    • anchor: query: Is Gemcitabine Plus Erlotinib a Good Treatment for Pancreatic Cancer?
      positive: passage: Background This study aims to comprehensively summarize the currently available evidences on the efficacy and safety of gemcitabine plus erlotinib for treating advanced pancreatic cancer. Methodology/Principal Findings PubMed, EMBASE, The Cochrane Library and abstracts of recent major conferences were systematically searched to identify relevant publications. Studies that were conducted in advanced pancreatic cancer patients treated with gemcitabine plus erlotinib (with or without comparison with gemcitabine alone) and reporting objective response rate, disease control rate, progression-free survival, time-to-progression, overall survival, 1-year survival rate and/or adverse events were included. Data on objective response rate, disease control rate, 1-year survival rate and adverse events rate, respectively, were combined mainly by using Meta-Analyst software with a random-effects model. Data on progression-free survival, time-to-progression and overall survival were summariz...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
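CachedMultipleNegativesRankingLoss is a memory-efficient (gradient-cached) variant of the in-batch-negatives InfoNCE objective: for each query, its paired passage is the positive and every other passage in the batch is a negative. A rough NumPy sketch of the underlying loss, ignoring the gradient caching and using the `scale` of 20.0 from the parameters above (`mnr_loss` is a hypothetical helper, not the library implementation):

```python
import numpy as np

def mnr_loss(query_emb, passage_emb, scale=20.0):
    # Row i of the logit matrix holds query i's scaled cosine similarity
    # to every passage; the target class for row i is column i.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = scale * (q @ p.T)
    # Row-wise log-softmax, shifted by the max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (each query's true passage).
    return -np.mean(np.diag(log_probs))

# Perfectly matched, mutually orthogonal pairs give a near-zero loss.
ident = np.eye(4)
print(mnr_loss(ident, ident))
```

The cached variant computes the same quantity but processes the batch in `mini_batch_size` chunks (16 here), so the effective batch size of 128 does not have to fit in memory at once.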
    

Evaluation Dataset

Unnamed Dataset

  • Size: 363 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 363 samples:
    • anchor: string; min: 7 tokens, mean: 13.75 tokens, max: 34 tokens
    • positive: string; min: 63 tokens, mean: 329.09 tokens, max: 512 tokens
  • Samples:
    • anchor: query: How Do Intervertebral Discs Age?
      positive: passage: STUDY DESIGN: A histologic study on age-related changes of the human lumbar intervertebral disc was conducted. OBJECTIVES: To investigate comprehensively age-related temporospatial histologic changes in human lumbar intervertebral disc, and to develop a practicable and reliable classification system for age-related histologic disc alteration. SUMMARY OF THE BACKGROUND DATA: No comprehensive microscopic analysis of age-related disc changes is available. There is no conceptual morphologic framework for classifying age-related disc changes as a reference basis for more sophisticated molecular biologic analyses of the causative factors of disc aging or premature aging (degeneration). METHODS: A total of 180 complete sagittal lumbar motion segment slices obtained from 44 deceased individuals (fetal to 88 years of age) were analyzed with regard to 11 histologic variables for the intervertebral disc and endplate, respectively. In addition, 30 surgical specimens (3 regions each) were ...
    • anchor: query: How to Get Kids to Eat Veggies
      positive: passage: Using a repeated measures design, in a nursery setting, a modelling and rewards intervention targeted preschool children's consumption of 8 fruit and 8 vegetables (presented as 4 different food sets, each comprising 2 fruit and 2 vegetables). During the 16-day Baseline 1, and subsequent baselines, the children received a different food set daily, first at snacktime and again at lunchtime; consumption of these foods was not rewarded. In the 32-day fruit intervention phase, Food Set 2 and Food Set 3 were presented on alternate days; rewards were presented only at snacktime, and only for consumption of the fruit components. Following Baseline 2 and Baseline 3, the intervention targeted snack consumption of the vegetable components of Food Sets 1 and 4. Finally, Baseline 4, and 6-month Follow up were conducted. The interventions produced large and significant increases in target fruit and vegetable consumption with smaller, but significant, increases for the paired, opposite categ...
    • anchor: query: Is Moderate Drinking Good For Breast Cancer Survivors?
      positive: passage: Background: Alcohol intake has consistently been associated with increased breast cancer incidence in epidemiological studies. However, the relation between alcohol and survival after breast cancer diagnosis is less clear. Methods: We investigated whether alcohol intake was associated with survival among 3146 women diagnosed with invasive breast cancer in the Swedish Mammography Cohort. Alcohol consumption was estimated using a food frequency questionnaire. Cox proportional hazard models were used to calculate hazard ratios (HRs) and 95% confidence intervals (95% CIs). Results: From 1987 to 2008 there were 385 breast cancer-specific deaths and 860 total deaths. No significant association was observed between alcohol intake and breast cancer-specific survival. Women who consumed 10 g per day (corresponding to approximately 0.75 to 1 drinks) or more of alcohol had an adjusted HR (95% CI) of breast cancer-specific death of 1.36 (0.82–2.26;ptrend=0.47) compared with non-drinkers. ...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 30
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
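With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps up linearly over the first 10% of optimizer steps, then decays linearly to zero. An illustrative sketch of that schedule (`linear_schedule_lr` is a hypothetical helper; the 26 steps per epoch are taken from the training logs, giving 30 × 26 = 780 total steps):

```python
def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    # Linear warmup for the first warmup_ratio of training, then
    # linear decay to zero for the remaining steps.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 780  # 30 epochs x 26 optimizer steps per epoch in this run
print(linear_schedule_lr(0, total))    # 0.0 (start of warmup)
print(linear_schedule_lr(78, total))   # 2e-05 (end of warmup, peak LR)
print(linear_schedule_lr(780, total))  # 0.0 (end of training)
```

This mirrors the behavior of the transformers linear scheduler in spirit; the library computes warmup steps from `warmup_ratio` when `warmup_steps` is 0, as configured here.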

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 30
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch | Step | Training Loss | Validation Loss |
|:-----:|:----:|:-------------:|:---------------:|
| 1.0 | 26 | 2.9167 | 0.2371 |
| 2.0 | 52 | 0.9199 | 0.0179 |
| 3.0 | 78 | 0.1965 | 0.0069 |
| 4.0 | 104 | 0.1304 | 0.0051 |
| 5.0 | 130 | 0.1139 | 0.0049 |
| 6.0 | 156 | 0.089 | 0.0045 |
| 7.0 | 182 | 0.0756 | 0.0046 |
| **8.0** | **208** | **0.0698** | **0.0042** |
| 9.0 | 234 | 0.07 | 0.0047 |
| 10.0 | 260 | 0.0618 | 0.0045 |
| 11.0 | 286 | 0.0606 | 0.0048 |

  • The bold row denotes the saved checkpoint: the epoch with the lowest validation loss, kept because load_best_model_at_end is enabled.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model Facts

  • Model ID: dgwon/e5-small-v2.nfcorpus.meta-llama-Llama-3.2-3B-Instruct.top10pct
  • Model size: 33.4M parameters (F32, stored as Safetensors)