This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Maximum Sequence Length: 350 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
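Because only `pooling_mode_mean_tokens` is enabled in the Pooling module, the sentence embedding is the average of DistilBERT's per-token output vectors, with padding positions excluded through the attention mask. The plain-Python sketch below illustrates that averaging on made-up 4-dimensional vectors; the function name `mean_pool` is hypothetical, and the library itself does this on batched 768-dimensional tensors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding positions are skipped."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Avoid division by zero if every position is padding
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input: 3 token positions with 4-dim vectors; the last position is padding (mask 0)
tokens = [[1.0, 2.0, 3.0, 4.0],
          [3.0, 2.0, 1.0, 0.0],
          [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 2.0, 2.0, 2.0]
```

The padded row of 9s does not affect the result, which is the point of masking before averaging.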

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'how is walnuts antioxidant',
    "Several studies suggest that regular consumption of nuts, mostly walnuts, may have beneficial effects against oxidative stress mediated diseases such as cardiovascular disease and cancer. Walnuts contain several phenolic compounds which are thought to contribute to their biological properties. The present study reports the total phenolic contents and antioxidant properties of methanolic and petroleum ether extracts obtained from walnut (Juglans regia L.) seed, green husk and leaf. The total phenolic contents were determined by the Folin-Ciocalteu method and the antioxidant activities assessed by the ability to quench the stable free radical 2,2'-diphenyl-1-picrylhydrazyl (DPPH) and to inhibit the 2,2'-azobis(2-amidinopropane) dihydrochloride (AAPH)-induced oxidative hemolysis of human erythrocytes. Methanolic seed extract presented the highest total phenolic content (116 mg GAE/g of extract) and DPPH scavenging activity (EC(50) of 0.143 mg/mL), followed by leaf and green husk. In petroleum ether extracts, antioxidant action was much lower or absent. Under the oxidative action of AAPH, all methanolic extracts significantly protected the erythrocyte membrane from hemolysis in a time- and concentration-dependent manner, although leaf extract inhibitory efficiency was much stronger (IC(50) of 0.060 mg/mL) than that observed for green husks and seeds (IC(50) of 0.127 and 0.121 mg/mL, respectively). Walnut methanolic extracts were also assayed for their antiproliferative effectiveness using human renal cancer cell lines A-498 and 769-P and the colon cancer cell line Caco-2. All extracts showed concentration-dependent growth inhibition toward human kidney and colon cancer cells. Concerning A-498 renal cancer cells, all extracts exhibited similar growth inhibition activity (IC(50) values between 0.226 and 0.291 mg/mL), while for both 769-P renal and Caco-2 colon cancer cells, walnut leaf extract showed a higher antiproliferative efficiency (IC(50) values of 0.352 and 0.229 mg/mL, respectively) than green husk or seed extracts. The results obtained herein strongly indicate that walnut tree constitute an excellent source of effective natural antioxidants and chemopreventive agents. Copyright 2009 Elsevier Ltd. All rights reserved.",
    'High postprandial serum lipid concentrations are associated with increased oxidative stress which, in turn, increases the risk of atherosclerosis. Epidemiological studies correlate lower incidence of cardiovascular disease with adherence to the Mediterranean diet. The aim of this study was to evaluate changes in inflammatory (TXB(2) and LTB(4)) and oxidative stress markers (urinary hydrogen peroxide levels and serum antioxidant capacity), in addition to classic lipid parameters, after a fat-rich meal administered to 12 normolipemic, healthy subjects. Following a Latin square design, subjects were divided into three groups, each one receiving a different kind of oil (extra virgin olive oil; EVOO, olive oil; OO or corn oil; CO, together with 150g of potatoes), with 2-week washout periods between treatments. Blood samples were drawn at baseline and after 1, 2, and 6h after the meal. A significant decrease in inflammatory markers, namely TXB(2) and LTB(4), after 2 and 6h after EVOO (but not OO or CO) consumption and a concomitant increase of serum antioxidant capacity were recorded. These data reinforce the notion that the Mediterranean diet reduces the incidence of coronary heart disease partially due to the protective role of its phenolic components, including those of extra virgin olive oil.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
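`model.similarity` applies the cosine similarity listed under Model Details, so entry [i][j] of the 3 x 3 result is the cosine between embedding i and embedding j, and row 0 scores the query against both passages. A minimal sketch of the same computation on hypothetical 2-dimensional toy vectors (not real model output), followed by ranking the two passages for the query:

```python
import math
from typing import List


def cosine_similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity: dot product divided by the product of the vector norms."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in embeddings]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


# Row 0 plays the query, rows 1 and 2 play the passages (made-up vectors)
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = cosine_similarity_matrix(toy)

# Rank passage indices for the query, highest cosine first
ranking = sorted([1, 2], key=lambda j: sims[0][j], reverse=True)
print(ranking)  # [1, 2]
```

With the real model, the rows of `embeddings` from the snippet above would take the place of `toy`.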

Training Details

Training Dataset

Unnamed Dataset

Size: 221,065 training samples
Columns: sentence_0, sentence_1, and sentence_2

Approximate statistics based on the first 1000 samples:

Column	Type	Min tokens	Mean tokens	Max tokens
sentence_0	string	5	10.66	28
sentence_1	string	23	294.9	350
sentence_2	string	19	299.77	350
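The maximum of exactly 350 tokens for sentence_1 and sentence_2 matches the model's Maximum Sequence Length, which suggests (an assumption, not stated in the card) that lengths were counted after truncation. A small sketch of min/mean/max statistics computed that way; `length_stats` is a hypothetical helper and the token counts are made up:

```python
from statistics import mean
from typing import Dict, List

MAX_SEQ_LENGTH = 350  # the model's maximum sequence length


def length_stats(token_counts: List[int], cap: int = MAX_SEQ_LENGTH) -> Dict[str, float]:
    """Min/mean/max of token counts after truncating each count to `cap`."""
    capped = [min(n, cap) for n in token_counts]
    return {"min": min(capped), "mean": round(mean(capped), 2), "max": max(capped)}


# Hypothetical raw lengths; 512 and 400 exceed the limit and are counted as 350
print(length_stats([23, 300, 512, 400]))  # {'min': 23, 'mean': 255.75, 'max': 350}
```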

Samples:

sentence_0	sentence_1	sentence_2
`does meat promote tma`	Intestinal microbiota metabolism of choline/phosphatidylcholine produces trimethylamine (TMA), which is further metabolized to a proatherogenic species, trimethylamine-N-oxide (TMAO). Herein we demonstrate that intestinal microbiota metabolism of dietary L-carnitine, a trimethylamine abundant in red meat, also produces TMAO and accelerates atherosclerosis. Omnivorous subjects are shown to produce significantly more TMAO than vegans/vegetarians following ingestion of L-carnitine through a microbiota-dependent mechanism. Specific bacterial taxa in human feces are shown to associate with both plasma TMAO and dietary status. Plasma L-carnitine levels in subjects undergoing cardiac evaluation (n = 2,595) predict increased risks for both prevalent cardiovascular disease (CVD) and incident major adverse cardiac events (MI, stroke or death), but only among subjects with concurrently high TMAO levels. Chronic dietary L-carnitine supplementation in mice significantly altered cecal microbial comp...	Background: The evidence for meat intake and renal cell carcinoma (RCC) risk is inconsistent. Mutagens related to meat cooking and processing, and variation by RCC subtype may be important to consider. Objective: In a large US cohort, we prospectively investigated intake of meat and meat-related compounds in relation to risk of RCC, as well as clear cell and papillary RCC histologic subtypes. Design: Study participants (492,186) completed a detailed dietary assessment linked to a database of heme iron, heterocyclic amines (HCA), polycyclic aromatic hydrocarbons (PAHs), nitrate, and nitrite concentrations in cooked and processed meats. Over 9 (mean) y of follow-up, we identified 1814 cases of RCC (498 clear cell and 115 papillary adenocarcinomas). HRs and 95% CIs were estimated within quintiles by using multivariable Cox proportional hazards regression. Results: Red meat intake [62.7 g (quintile 5) compared with 9.8 g (quintile 1) per 1000 kcal (median)] was associated with a tendency t...
`what is protein-bound homocysteine`	We investigated total, free and protein-bound plasma homocysteine, cysteine and cysteinylglycine in 13 subjects aged 24-29 y after a breakfast at 0900 h containing 15-18 g of protein and a dinner at 1500 h containing approximately 50 g of protein. Twelve subjects had normal fasting homocysteine (mean +/- SD, 7.6 +/- 1.1 mumol/L) and methionine concentrations (22.7 +/- 3.5 mumol/L) and were included in the statistical analyses. Breakfast caused a small but significant increase in plasma methionine (22.2 +/- 20.6%) and a brief, nonsignificant increase followed by a significant decline in free homocysteine. However, changes in total and bound homocysteine were small. After dinner, there was a marked increase in plasma methionine by 16.7 +/- 8.9 mumol/L (87.9 +/- 49%), which was associated with a rapid and marked increase in free homocysteine (33.7 +/- 19.6%, 4 h after dinner) and a moderate and slow increase in total (13.5 +/- 7.5%, 8 h) and protein-bound (12.6 +/- 9.4%, 8 h) homocysteine...	The response to arterial wall injury is an inflammatory process, which over time becomes integral to the development of atherosclerosis and subsequent plaque instability. However, the underlying injurious agent, critical to this process, has not received much attention. In this review, a model of plaque rupture is hypothesized with two stages of inflammatory activity. In stage I (cholesterol crystal-induced cell injury and apoptosis), intracellular cholesterol crystals induce foam cell apoptosis, setting up a vicious cycle by signaling more macrophages, resulting in accumulation of extra cellular lipids. This local inflammation eventually leads to the formation of a semi-liquid, lipid-rich necrotic core of a vulnerable plaque. In stage II (cholesterol crystal-induced arterial wall injury), the saturated lipid core is now primed for crystallization, which can manifest as a clinical syndrome with a systemic inflammation response. Cholesterol crystallization is the trigger that causes cor...
`what is a selectivity drug`	Background The word selectivity describes a drug's ability to affect a particular cell population in preference to others. As part of the current state of art in the search for new therapeutic agents, the property of selectivity is a mode of action thought to have a high degree of desirability. Consequently there is a growing activity in this area of research. Selectivity is generally a worthy property in a drug because a drug having high selectivity may have a dramatic effect when there is a single agent that can be targeted against the appropriate molecular-driver involved in the pathogenesis of a disease. An example is chronic myeloid leukemia (CML). CML has a specific chromosomal abnormality, the Philadelphia chromosome, that results in a single gene that produces an abnormal protein Discussion There is a burgeoning understanding of the cellular mechanisms that control the etiology and pathogeneses of diseases. This understanding both enables and motivates the development of drugs ...	The regular occurrence of a peak due to an unidentified substance (X) in the gas chromatographic traces obtained from phenolic extracts of urine from human pregnant and non-pregnant females has been reported. The biphasic excretion of X with maxima in the luteal phase of the ovulatory cycle and relatively high levels in the first trimester of pregnancy were noteworthy and suggested that the substance may have a biological significance. Close similarities between the excretory pattern, the chemical and chromatographic properties of X and of those of the known phenolic steroids suggested initially that this compound was steroidal in nature. The same, or a similar, substance seems to be excreted in the vervet monkey (Cercopithecus aethiops pygerythrus). We now report the excretory pattern of X in more detail, the isolation of the pure compound from pooled pregnancy urine and the chemical structure. The structure determined by mass spectrometry, IR spectroscopy and NMR spectrometry is: tra...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
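MultipleNegativesRankingLoss treats the other texts in the batch as negatives. Assuming the three columns are used in order as (anchor, positive, hard negative), which is the usual convention for this loss, each anchor is scored against every positive and every hard negative in the batch, the cosine scores are multiplied by `scale` (20.0 here), and cross-entropy pushes each anchor toward its own positive. A plain-Python sketch of that computation; `cos_sim` and `mnr_loss` are hypothetical helper names, not the library's implementation:

```python
import math
from typing import List, Sequence


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors: List[List[float]], positives: List[List[float]],
             negatives: List[List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i must pick candidate i (its positive)
    out of all positives plus all hard negatives in the batch."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp, then subtract the correct logit
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_norm - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
print(mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives))  # close to 0: positives aligned
print(mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives))  # about 20: positives swapped
```

The large scale sharpens the softmax, so even a modest cosine gap between the positive and the best negative dominates the loss.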

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 32
per_device_eval_batch_size: 32
multi_dataset_batch_sampler: round_robin
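With this loss the batch size also determines how many negatives each anchor sees. Assuming training ran on a single device and every non-anchor column contributes one candidate per row, a batch of 32 rows gives each anchor 64 candidates (32 positives and 32 hard negatives), of which 63 act as negatives. The helper below is hypothetical:

```python
from typing import Dict


def in_batch_candidates(batch_size: int, columns: int = 3) -> Dict[str, int]:
    """Candidates each anchor is scored against when all non-anchor columns are pooled."""
    candidates = batch_size * (columns - 1)
    # Exactly one candidate is the anchor's own positive; the rest are negatives
    return {"candidates": candidates, "positives": 1, "negatives": candidates - 1}


print(in_batch_candidates(32))  # {'candidates': 64, 'positives': 1, 'negatives': 63}
```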

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
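With lr_scheduler_type: linear, warmup_steps: 0 and warmup_ratio: 0.0, the learning rate should decay linearly from 5e-05 to 0 over the run. The sketch below mirrors the shape of the linear-with-warmup schedule used by Hugging Face Transformers; `linear_lr` is a hypothetical name, and the total step count assumes a single device and is derived from the dataset size and batch size rather than read from the logs:

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# 221,065 samples / batch size 32, keeping the last partial batch (dataloader_drop_last: False)
steps_per_epoch = math.ceil(221_065 / 32)  # 6909
total_steps = 3 * steps_per_epoch          # 20727 for num_train_epochs: 3

print(linear_lr(0, total_steps))            # 5e-05
print(linear_lr(total_steps, total_steps))  # 0.0
```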

Training Logs

Epoch	Step	Training Loss
0.0724	500	0.2092
0.1447	1000	0.1661
0.2171	1500	0.1457
0.2895	2000	0.1411
0.3618	2500	0.1296
0.4342	3000	0.1193
0.5066	3500	0.1183
0.5790	4000	0.113
0.6513	4500	0.1094
0.7237	5000	0.1067
0.7961	5500	0.1086
0.8684	6000	0.1062
0.9408	6500	0.1039
1.0132	7000	0.0991
1.0855	7500	0.079
1.1579	8000	0.0822
1.2303	8500	0.0801
1.3026	9000	0.0796
1.3750	9500	0.078
1.4474	10000	0.0786
1.5198	10500	0.0758
1.5921	11000	0.0826
1.6645	11500	0.0813
1.7369	12000	0.0777
1.8092	12500	0.0796
1.8816	13000	0.0773
1.9540	13500	0.0732
2.0263	14000	0.0698
2.0987	14500	0.0638
2.1711	15000	0.0647
2.2435	15500	0.0656
2.3158	16000	0.0626
2.3882	16500	0.065
2.4606	17000	0.0683
2.5329	17500	0.058
2.6053	18000	0.0601
2.6777	18500	0.0637
2.7500	19000	0.062
2.8224	19500	0.062
2.8948	20000	0.0631
2.9671	20500	0.0654
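The Epoch column is the step count divided by the number of batches per epoch. Assuming a single device, 221,065 samples at a batch size of 32 with dataloader_drop_last: False give ceil(221065 / 32) = 6909 batches per epoch, which reproduces the logged values. The helper `epoch_at` is hypothetical:

```python
import math

# Number of batches per epoch; the final partial batch is kept
STEPS_PER_EPOCH = math.ceil(221_065 / 32)  # 6909


def epoch_at(step: int) -> float:
    """Fractional epoch at a given optimizer step, rounded like the log table."""
    return round(step / STEPS_PER_EPOCH, 4)


for step in (500, 7000, 20500):
    print(step, epoch_at(step))
# 500 0.0724
# 7000 1.0132
# 20500 2.9671
```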

Framework Versions

Python: 3.10.14
Sentence Transformers: 3.3.0
Transformers: 4.46.2
PyTorch: 2.5.1+cu124
Accelerate: 1.2.1
Datasets: 3.2.0
Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}