This is a sentence-transformers model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 350 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
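The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal NumPy sketch of that idea, not the library's own implementation: the function name `mean_pool` and the toy shapes are assumptions for illustration, and it assumes padding positions are excluded through an attention mask.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim) array of per-token vectors
    attention_mask:   (batch, seq_len) array, 1 for real tokens and 0 for padding
    """
    # Broadcast the mask over the embedding dimension
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy batch: 2 sequences, 3 token slots, 4 dimensions (the model uses 768)
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # first sequence has one padded slot
pooled = mean_pool(tokens, mask)
print(pooled.shape)
```

Each row of `pooled` is one fixed-size sentence vector, regardless of how many real tokens the input had.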
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'how is walnuts antioxidant',
"Several studies suggest that regular consumption of nuts, mostly walnuts, may have beneficial effects against oxidative stress mediated diseases such as cardiovascular disease and cancer. Walnuts contain several phenolic compounds which are thought to contribute to their biological properties. The present study reports the total phenolic contents and antioxidant properties of methanolic and petroleum ether extracts obtained from walnut (Juglans regia L.) seed, green husk and leaf. The total phenolic contents were determined by the Folin-Ciocalteu method and the antioxidant activities assessed by the ability to quench the stable free radical 2,2'-diphenyl-1-picrylhydrazyl (DPPH) and to inhibit the 2,2'-azobis(2-amidinopropane) dihydrochloride (AAPH)-induced oxidative hemolysis of human erythrocytes. Methanolic seed extract presented the highest total phenolic content (116 mg GAE/g of extract) and DPPH scavenging activity (EC(50) of 0.143 mg/mL), followed by leaf and green husk. In petroleum ether extracts, antioxidant action was much lower or absent. Under the oxidative action of AAPH, all methanolic extracts significantly protected the erythrocyte membrane from hemolysis in a time- and concentration-dependent manner, although leaf extract inhibitory efficiency was much stronger (IC(50) of 0.060 mg/mL) than that observed for green husks and seeds (IC(50) of 0.127 and 0.121 mg/mL, respectively). Walnut methanolic extracts were also assayed for their antiproliferative effectiveness using human renal cancer cell lines A-498 and 769-P and the colon cancer cell line Caco-2. All extracts showed concentration-dependent growth inhibition toward human kidney and colon cancer cells. Concerning A-498 renal cancer cells, all extracts exhibited similar growth inhibition activity (IC(50) values between 0.226 and 0.291 mg/mL), while for both 769-P renal and Caco-2 colon cancer cells, walnut leaf extract showed a higher antiproliferative efficiency (IC(50) values of 0.352 and 0.229 mg/mL, respectively) than green husk or seed extracts. The results obtained herein strongly indicate that walnut tree constitute an excellent source of effective natural antioxidants and chemopreventive agents. Copyright 2009 Elsevier Ltd. All rights reserved.",
'High postprandial serum lipid concentrations are associated with increased oxidative stress which, in turn, increases the risk of atherosclerosis. Epidemiological studies correlate lower incidence of cardiovascular disease with adherence to the Mediterranean diet. The aim of this study was to evaluate changes in inflammatory (TXB(2) and LTB(4)) and oxidative stress markers (urinary hydrogen peroxide levels and serum antioxidant capacity), in addition to classic lipid parameters, after a fat-rich meal administered to 12 normolipemic, healthy subjects. Following a Latin square design, subjects were divided into three groups, each one receiving a different kind of oil (extra virgin olive oil; EVOO, olive oil; OO or corn oil; CO, together with 150g of potatoes), with 2-week washout periods between treatments. Blood samples were drawn at baseline and after 1, 2, and 6h after the meal. A significant decrease in inflammatory markers, namely TXB(2) and LTB(4), after 2 and 6h after EVOO (but not OO or CO) consumption and a concomitant increase of serum antioxidant capacity were recorded. These data reinforce the notion that the Mediterranean diet reduces the incidence of coronary heart disease partially due to the protective role of its phenolic components, including those of extra virgin olive oil.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
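For semantic search, a common next step is to rank passages against a query by cosine similarity, the similarity function listed for this model. The sketch below uses plain NumPy and toy 3-dimensional vectors in place of real 768-dimensional embeddings; the helper names `cosine_similarity_matrix` and `rank_passages` are hypothetical and not part of the Sentence Transformers API.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def rank_passages(query_embedding: np.ndarray, passage_embeddings: np.ndarray):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = cosine_similarity_matrix(query_embedding[None, :], passage_embeddings)[0]
    order = np.argsort(-scores)  # descending by score
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for embeddings of one query and three passages
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partially aligned
])
ranking = rank_passages(query, passages)
print(ranking)
```

With real data, `query` and `passages` would be rows of the array returned by `model.encode`. Because cosine similarity ignores vector length, the second toy passage ranks first even though its norm differs from the query's.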
Training Details
Training Dataset
Unnamed Dataset
- Size: 221,065 training samples
- Columns: sentence_0, sentence_1, and sentence_2
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| details | min: 5 tokens, mean: 10.66 tokens, max: 28 tokens | min: 23 tokens, mean: 294.9 tokens, max: 350 tokens | min: 19 tokens, mean: 299.77 tokens, max: 350 tokens |
- Samples:
| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| does meat promote tma | Intestinal microbiota metabolism of choline/phosphatidylcholine produces trimethylamine (TMA), which is further metabolized to a proatherogenic species, trimethylamine-N-oxide (TMAO). Herein we demonstrate that intestinal microbiota metabolism of dietary L-carnitine, a trimethylamine abundant in red meat, also produces TMAO and accelerates atherosclerosis. Omnivorous subjects are shown to produce significantly more TMAO than vegans/vegetarians following ingestion of L-carnitine through a microbiota-dependent mechanism. Specific bacterial taxa in human feces are shown to associate with both plasma TMAO and dietary status. Plasma L-carnitine levels in subjects undergoing cardiac evaluation (n = 2,595) predict increased risks for both prevalent cardiovascular disease (CVD) and incident major adverse cardiac events (MI, stroke or death), but only among subjects with concurrently high TMAO levels. Chronic dietary L-carnitine supplementation in mice significantly altered cecal microbial comp... | Background: The evidence for meat intake and renal cell carcinoma (RCC) risk is inconsistent. Mutagens related to meat cooking and processing, and variation by RCC subtype may be important to consider. Objective: In a large US cohort, we prospectively investigated intake of meat and meat-related compounds in relation to risk of RCC, as well as clear cell and papillary RCC histologic subtypes. Design: Study participants (492,186) completed a detailed dietary assessment linked to a database of heme iron, heterocyclic amines (HCA), polycyclic aromatic hydrocarbons (PAHs), nitrate, and nitrite concentrations in cooked and processed meats. Over 9 (mean) y of follow-up, we identified 1814 cases of RCC (498 clear cell and 115 papillary adenocarcinomas). HRs and 95% CIs were estimated within quintiles by using multivariable Cox proportional hazards regression. Results: Red meat intake [62.7 g (quintile 5) compared with 9.8 g (quintile 1) per 1000 kcal (median)] was associated with a tendency t... |
| what is protein-bound homocysteine | We investigated total, free and protein-bound plasma homocysteine, cysteine and cysteinylglycine in 13 subjects aged 24-29 y after a breakfast at 0900 h containing 15-18 g of protein and a dinner at 1500 h containing approximately 50 g of protein. Twelve subjects had normal fasting homocysteine (mean +/- SD, 7.6 +/- 1.1 mumol/L) and methionine concentrations (22.7 +/- 3.5 mumol/L) and were included in the statistical analyses. Breakfast caused a small but significant increase in plasma methionine (22.2 +/- 20.6%) and a brief, nonsignificant increase followed by a significant decline in free homocysteine. However, changes in total and bound homocysteine were small. After dinner, there was a marked increase in plasma methionine by 16.7 +/- 8.9 mumol/L (87.9 +/- 49%), which was associated with a rapid and marked increase in free homocysteine (33.7 +/- 19.6%, 4 h after dinner) and a moderate and slow increase in total (13.5 +/- 7.5%, 8 h) and protein-bound (12.6 +/- 9.4%, 8 h) homocysteine... | The response to arterial wall injury is an inflammatory process, which over time becomes integral to the development of atherosclerosis and subsequent plaque instability. However, the underlying injurious agent, critical to this process, has not received much attention. In this review, a model of plaque rupture is hypothesized with two stages of inflammatory activity. In stage I (cholesterol crystal-induced cell injury and apoptosis), intracellular cholesterol crystals induce foam cell apoptosis, setting up a vicious cycle by signaling more macrophages, resulting in accumulation of extra cellular lipids. This local inflammation eventually leads to the formation of a semi-liquid, lipid-rich necrotic core of a vulnerable plaque. In stage II (cholesterol crystal-induced arterial wall injury), the saturated lipid core is now primed for crystallization, which can manifest as a clinical syndrome with a systemic inflammation response. Cholesterol crystallization is the trigger that causes cor... |
| what is a selectivity drug | Background The word selectivity describes a drug's ability to affect a particular cell population in preference to others. As part of the current state of art in the search for new therapeutic agents, the property of selectivity is a mode of action thought to have a high degree of desirability. Consequently there is a growing activity in this area of research. Selectivity is generally a worthy property in a drug because a drug having high selectivity may have a dramatic effect when there is a single agent that can be targeted against the appropriate molecular-driver involved in the pathogenesis of a disease. An example is chronic myeloid leukemia (CML). CML has a specific chromosomal abnormality, the Philadelphia chromosome, that results in a single gene that produces an abnormal protein Discussion There is a burgeoning understanding of the cellular mechanisms that control the etiology and pathogeneses of diseases. This understanding both enables and motivates the development of drugs ... | The regular occurrence of a peak due to an unidentified substance (X) in the gas chromatographic traces obtained from phenolic extracts of urine from human pregnant and non-pregnant females has been reported. The biphasic excretion of X with maxima in the luteal phase of the ovulatory cycle and relatively high levels in the first trimester of pregnancy were noteworthy and suggested that the substance may have a biological significance. Close similarities between the excretory pattern, the chemical and chromatographic properties of X and of those of the known phenolic steroids suggested initially that this compound was steroidal in nature. The same, or a similar, substance seems to be excreted in the vervet monkey (Cercopithecus aethiops pygerythrus). We now report the excretory pattern of X in more detail, the isolation of the pure compound from pooled pregnancy urine and the chemical structure. The structure determined by mass spectrometry, IR spectroscopy and NMR spectrometry is: tra... |

- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
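To make the loss parameters concrete: with `similarity_fct` set to `cos_sim` and `scale` set to 20.0, the objective can be read as a cross-entropy over scaled cosine similarities, where each anchor's own positive is the correct class and every other candidate in the batch acts as a negative. The NumPy sketch below illustrates that reading under stated assumptions. It is not the library's code, the function name is hypothetical, and treating `sentence_2` as an extra pool of hard negatives is an assumption about how the third column is used.

```python
import numpy as np


def multiple_negatives_ranking_loss(anchors, positives, negatives=None, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives.

    anchors, positives: (batch, dim) arrays; row i of positives matches row i of anchors.
    negatives: optional (n, dim) array of extra (hard) negatives shared by all anchors.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    candidates = positives if negatives is None else np.concatenate([positives, negatives], axis=0)
    # (batch, n_candidates) matrix of cosine similarities, multiplied by the scale
    logits = scale * (normalize(anchors) @ normalize(candidates).T)
    # Numerically stable log-softmax over the candidates
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct candidate for anchor i sits at column i
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())


# Toy check: perfectly aligned pairs give a loss near zero,
# while mismatched pairs are penalised heavily.
anchors = np.eye(3)
aligned = multiple_negatives_ranking_loss(anchors, np.eye(3))
mismatched = multiple_negatives_ranking_loss(anchors, np.roll(np.eye(3), 1, axis=0))
print(aligned, mismatched)
```

The scale sharpens the softmax: cosine similarity is bounded to [-1, 1], so without the multiplier the logits would differ too little to produce a strong training signal.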
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0724 | 500 | 0.2092 |
| 0.1447 | 1000 | 0.1661 |
| 0.2171 | 1500 | 0.1457 |
| 0.2895 | 2000 | 0.1411 |
| 0.3618 | 2500 | 0.1296 |
| 0.4342 | 3000 | 0.1193 |
| 0.5066 | 3500 | 0.1183 |
| 0.5790 | 4000 | 0.113 |
| 0.6513 | 4500 | 0.1094 |
| 0.7237 | 5000 | 0.1067 |
| 0.7961 | 5500 | 0.1086 |
| 0.8684 | 6000 | 0.1062 |
| 0.9408 | 6500 | 0.1039 |
| 1.0132 | 7000 | 0.0991 |
| 1.0855 | 7500 | 0.079 |
| 1.1579 | 8000 | 0.0822 |
| 1.2303 | 8500 | 0.0801 |
| 1.3026 | 9000 | 0.0796 |
| 1.3750 | 9500 | 0.078 |
| 1.4474 | 10000 | 0.0786 |
| 1.5198 | 10500 | 0.0758 |
| 1.5921 | 11000 | 0.0826 |
| 1.6645 | 11500 | 0.0813 |
| 1.7369 | 12000 | 0.0777 |
| 1.8092 | 12500 | 0.0796 |
| 1.8816 | 13000 | 0.0773 |
| 1.9540 | 13500 | 0.0732 |
| 2.0263 | 14000 | 0.0698 |
| 2.0987 | 14500 | 0.0638 |
| 2.1711 | 15000 | 0.0647 |
| 2.2435 | 15500 | 0.0656 |
| 2.3158 | 16000 | 0.0626 |
| 2.3882 | 16500 | 0.065 |
| 2.4606 | 17000 | 0.0683 |
| 2.5329 | 17500 | 0.058 |
| 2.6053 | 18000 | 0.0601 |
| 2.6777 | 18500 | 0.0637 |
| 2.7500 | 19000 | 0.062 |
| 2.8224 | 19500 | 0.062 |
| 2.8948 | 20000 | 0.0631 |
| 2.9671 | 20500 | 0.0654 |
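The Epoch column in the table above can be reproduced from the figures reported elsewhere in this card: 221,065 training samples, a per-device batch size of 32, `gradient_accumulation_steps: 1`, and `dataloader_drop_last: False`. The short calculation below assumes training ran on a single device, which the card does not state but which is consistent with the logged values.

```python
import math

num_samples = 221_065  # training set size reported in this card
batch_size = 32        # per_device_train_batch_size
num_epochs = 3         # num_train_epochs

# Assumption: one device and no gradient accumulation. The last partial
# batch is kept because dataloader_drop_last is False, hence the ceiling.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 6909
print(total_steps)      # 20727
print(epoch_at(500))    # 0.0724, matching the first logged row
print(epoch_at(20500))  # 2.9671, matching the last logged row
```

Under these assumptions the run ends at step 20,727, so the final logged row at step 20,500 is the last 500-step checkpoint before training finished.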
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.3.0
- Transformers: 4.46.2
- PyTorch: 2.5.1+cu124
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.20.3
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Model tree for chungimungi/distilbert
Base model
distilbert/distilbert-base-uncased