SentenceTransformer based on intfloat/e5-small-v2

This is a sentence-transformers model finetuned from intfloat/e5-small-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-small-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
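The Pooling module above uses mean pooling over token embeddings (`pooling_mode_mean_tokens: True`). A minimal standalone sketch of that step, assuming NumPy and using a toy batch in place of real BertModel outputs (`mean_pool` is a hypothetical helper, not part of the library):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # Average token embeddings, counting only non-padding tokens,
    # which is what pooling_mode_mean_tokens does.
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]
    summed = (np.asarray(token_embeddings, dtype=np.float64) * mask).sum(axis=1)
    counts = mask.sum(axis=1)
    return summed / counts

# Batch of 1 sequence, 3 tokens (the last is padding), hidden size 2.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))
# [[2. 3.]]
```

The padding token is masked out, so only the two real tokens contribute to the average.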

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dgwon/e5-small-v2.nfcorpus.meta-llama-Llama-3.2-3B-Instruct.top10pct")
model = SentenceTransformer("dgwon/e5-small-v2.nfcorpus.meta-llama-Llama-3.2-3B-Instruct.top10pct")
# Run inference
sentences = [
    'query: Do Peppermint Teas Affect Male Fertility?',
    'passage: OBJECTIVES: To justify the effects of Mentha piperita labiatae and Mentha spicata labiatae herbal teas on plasma total testosterone, luteinizing hormone, and follicle-stimulating hormone levels and testicular histologic features. We performed this study because of major complaints in our area from men about the adverse effects of these herbs on male reproductive function. METHODS: The experimental study included 48 male Wistar albino rats (body weight 200 to 250 g). The rats were randomized into four groups of 12 rats each. The control group was given commercial drinking water, and the experimental groups were given 20 g/L M. piperita tea, 20 g/L M. spicata tea, or 40 g/L M. spicata tea. RESULTS: The follicle-stimulating hormone and luteinizing hormone levels had increased and total testosterone levels had decreased in the experimental groups compared with the control group; the differences were statistically significant. Also, the Johnsen testicular biopsy scores were significantly different statistically between the experimental groups and the control group. Although the mean seminiferous tubular diameter of the experimental groups was relatively greater than in the control group, the difference was not statistically significant. The only effects of M. piperita on testicular tissue was segmental maturation arrest in the seminiferous tubules; however, the effects of M. spicata extended from maturation arrest to diffuse germ cell aplasia in relation to the dose. CONCLUSIONS: Despite the beneficial effects of M. piperita and M. spicata in digestion, we should also be aware of the toxic effects when the herbs are not used in the recommended fashion or at the recommended dose.',
    'passage: We report a series of cases of thyroid dysfunction in adults associated with ingestion of a brand of soy milk manufactured with kombu (seaweed), and a case of hypothyroidism in a neonate whose mother had been drinking this milk. We also report two cases of neonatal hypothyroidism linked to maternal ingestion of seaweed made into soup. These products were found to contain high levels of iodine. Despite increasing awareness of iodine deficiency, the potential for iodine toxicity, particularly from sources such as seaweed, is less well recognised.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5889, 0.0846],
#         [0.5889, 1.0000, 0.1250],
#         [0.0846, 0.1250, 1.0000]])
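`model.similarity` defaults to cosine similarity (the Similarity Function listed above). A self-contained sketch of that computation, assuming NumPy and a toy batch standing in for the real 3 × 384 embeddings (`cosine_similarity_matrix` is a hypothetical helper):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length; the pairwise dot products of
    # unit vectors are exactly the cosine similarities.
    emb = np.asarray(embeddings, dtype=np.float64)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

# Toy 3 x 4 "embeddings" in place of the model's 3 x 384 output.
toy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 4))
# The diagonal is 1.0: every vector is maximally similar to itself,
# matching the 1.0000 diagonal in the tensor above.
```

Note that, as an E5-family model, inputs should carry the `query: ` / `passage: ` prefixes shown in the sentences above.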

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,267 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 8 tokens, mean: 13.5 tokens, max: 27 tokens
    • positive: string; min: 21 tokens, mean: 333.29 tokens, max: 512 tokens
  • Samples:
    • anchor: query: Is a Chest CT Scan Worth the Radiation Risk?
      positive: passage: In the past 3 decades, the total number of CT scans performed has grown exponentially. In 2007, > 70 million CT scans were performed in the United States. CT scan studies of the chest comprise a large portion of the CT scans performed today because the technology has transformed the management of common chest diseases, including pulmonary embolism and coronary artery disease. As the number of studies performed yearly increases, a growing fraction of the population is exposed to low-dose ionizing radiation from CT scan. Data extrapolated from atomic bomb survivors and other populations exposed to low-dose ionizing radiation suggest that CT scan-associated radiation may increase an individual's lifetime risk of developing cancer. This finding, however, is not incontrovertible. Because this topic has recently attracted the attention of both the scientific community and the general public, it has become increasingly important for physicians to understand the cancer risk associated...
    • anchor: query: Does Lowering Cholesterol Improve Blood Flow?
      positive: passage: Current National Cholesterol Education Program guidelines consider desirable total and low-density lipoprotein cholesterol levels to be < 200 and < 160 mg/dl, respectively, for healthy individuals without multiple coronary risk factors. To determine the extent to which these levels affect vascular function, we assessed flow-mediated (endothelium-dependent) brachial artery vasoactivity noninvasively before, during, and after cholesterol lowering (simvastatin 10 mg/day) in 7 healthy middle-aged men with cholesterol levels meeting current recommendations. Flow-mediated brachial artery vasoactivity was measured using 7.5 MHz ultrasound and expressed as percent diameter change from baseline to hyperemic conditions (1 minute following 5 minutes of blood pressure cuff arterial occlusion). Flow-mediated vasoactivity rose from 5.0 +/- 3.6% at baseline to 10.5 +/- 5.6%, 13.3 +/- 4.3%, and 15.7 +/- 4.9% (all p < 0.05) as cholesterol fell from 200 +/- 12 to 161 +/- 18, 169 +/- 16, and 153...
    • anchor: query: Is Gemcitabine Plus Erlotinib a Good Treatment for Pancreatic Cancer?
      positive: passage: Background This study aims to comprehensively summarize the currently available evidences on the efficacy and safety of gemcitabine plus erlotinib for treating advanced pancreatic cancer. Methodology/Principal Findings PubMed, EMBASE, The Cochrane Library and abstracts of recent major conferences were systematically searched to identify relevant publications. Studies that were conducted in advanced pancreatic cancer patients treated with gemcitabine plus erlotinib (with or without comparison with gemcitabine alone) and reporting objective response rate, disease control rate, progression-free survival, time-to-progression, overall survival, 1-year survival rate and/or adverse events were included. Data on objective response rate, disease control rate, 1-year survival rate and adverse events rate, respectively, were combined mainly by using Meta-Analyst software with a random-effects model. Data on progression-free survival, time-to-progression and overall survival were summariz...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
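CachedMultipleNegativesRankingLoss is a memory-efficient (gradient-cached) variant of the in-batch-negatives InfoNCE objective: for each query, its paired passage is the positive and every other passage in the batch is a negative. A rough NumPy sketch of the underlying loss, ignoring the gradient caching and using the `scale` of 20.0 from the parameters above (`mnr_loss` is a hypothetical helper, not the library implementation):

```python
import numpy as np

def mnr_loss(query_emb, passage_emb, scale=20.0):
    # Row i of the logit matrix holds query i's scaled cosine similarity
    # to every passage; the target class for row i is column i.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = scale * (q @ p.T)
    # Row-wise log-softmax, shifted by the max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (each query's true passage).
    return -np.mean(np.diag(log_probs))

# Perfectly matched, mutually orthogonal pairs give a near-zero loss.
ident = np.eye(4)
print(mnr_loss(ident, ident))
```

The cached variant computes the same quantity but processes the batch in `mini_batch_size` chunks (16 here), so the effective batch size of 128 does not have to fit in memory at once.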
    

Evaluation Dataset

Unnamed Dataset

  • Size: 363 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 363 samples:
    • anchor: string; min: 7 tokens, mean: 13.75 tokens, max: 34 tokens
    • positive: string; min: 63 tokens, mean: 329.09 tokens, max: 512 tokens
  • Samples:
    • anchor: query: How Do Intervertebral Discs Age?
      positive: passage: STUDY DESIGN: A histologic study on age-related changes of the human lumbar intervertebral disc was conducted. OBJECTIVES: To investigate comprehensively age-related temporospatial histologic changes in human lumbar intervertebral disc, and to develop a practicable and reliable classification system for age-related histologic disc alteration. SUMMARY OF THE BACKGROUND DATA: No comprehensive microscopic analysis of age-related disc changes is available. There is no conceptual morphologic framework for classifying age-related disc changes as a reference basis for more sophisticated molecular biologic analyses of the causative factors of disc aging or premature aging (degeneration). METHODS: A total of 180 complete sagittal lumbar motion segment slices obtained from 44 deceased individuals (fetal to 88 years of age) were analyzed with regard to 11 histologic variables for the intervertebral disc and endplate, respectively. In addition, 30 surgical specimens (3 regions each) were ...
    • anchor: query: How to Get Kids to Eat Veggies
      positive: passage: Using a repeated measures design, in a nursery setting, a modelling and rewards intervention targeted preschool children's consumption of 8 fruit and 8 vegetables (presented as 4 different food sets, each comprising 2 fruit and 2 vegetables). During the 16-day Baseline 1, and subsequent baselines, the children received a different food set daily, first at snacktime and again at lunchtime; consumption of these foods was not rewarded. In the 32-day fruit intervention phase, Food Set 2 and Food Set 3 were presented on alternate days; rewards were presented only at snacktime, and only for consumption of the fruit components. Following Baseline 2 and Baseline 3, the intervention targeted snack consumption of the vegetable components of Food Sets 1 and 4. Finally, Baseline 4, and 6-month Follow up were conducted. The interventions produced large and significant increases in target fruit and vegetable consumption with smaller, but significant, increases for the paired, opposite categ...
    • anchor: query: Is Moderate Drinking Good For Breast Cancer Survivors?
      positive: passage: Background: Alcohol intake has consistently been associated with increased breast cancer incidence in epidemiological studies. However, the relation between alcohol and survival after breast cancer diagnosis is less clear. Methods: We investigated whether alcohol intake was associated with survival among 3146 women diagnosed with invasive breast cancer in the Swedish Mammography Cohort. Alcohol consumption was estimated using a food frequency questionnaire. Cox proportional hazard models were used to calculate hazard ratios (HRs) and 95% confidence intervals (95% CIs). Results: From 1987 to 2008 there were 385 breast cancer-specific deaths and 860 total deaths. No significant association was observed between alcohol intake and breast cancer-specific survival. Women who consumed 10 g per day (corresponding to approximately 0.75 to 1 drinks) or more of alcohol had an adjusted HR (95% CI) of breast cancer-specific death of 1.36 (0.82–2.26;ptrend=0.47) compared with non-drinkers. ...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 30
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
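With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps up linearly over the first 10% of optimizer steps, then decays linearly to zero. An illustrative sketch of that schedule (`linear_schedule_lr` is a hypothetical helper; the 26 steps per epoch are taken from the training logs, giving 30 × 26 = 780 total steps):

```python
def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    # Linear warmup for the first warmup_ratio of training, then
    # linear decay to zero for the remaining steps.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 780  # 30 epochs x 26 optimizer steps per epoch in this run
print(linear_schedule_lr(0, total))    # 0.0 (start of warmup)
print(linear_schedule_lr(78, total))   # 2e-05 (end of warmup, peak LR)
print(linear_schedule_lr(780, total))  # 0.0 (end of training)
```

This mirrors the behavior of the transformers linear scheduler in spirit; the library computes warmup steps from `warmup_ratio` when `warmup_steps` is 0, as configured here.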

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 30
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch | Step | Training Loss | Validation Loss |
|:-----:|:----:|:-------------:|:---------------:|
| 1.0 | 26 | 2.9167 | 0.2371 |
| 2.0 | 52 | 0.9199 | 0.0179 |
| 3.0 | 78 | 0.1965 | 0.0069 |
| 4.0 | 104 | 0.1304 | 0.0051 |
| 5.0 | 130 | 0.1139 | 0.0049 |
| 6.0 | 156 | 0.089 | 0.0045 |
| 7.0 | 182 | 0.0756 | 0.0046 |
| **8.0** | **208** | **0.0698** | **0.0042** |
| 9.0 | 234 | 0.07 | 0.0047 |
| 10.0 | 260 | 0.0618 | 0.0045 |
| 11.0 | 286 | 0.0606 | 0.0048 |

  • The bold row denotes the saved checkpoint: the epoch with the lowest validation loss, kept because load_best_model_at_end is enabled.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model Facts

  • Model ID: dgwon/e5-small-v2.nfcorpus.meta-llama-Llama-3.2-3B-Instruct.top10pct
  • Model size: 33.4M parameters (F32, stored as Safetensors)