orionweller committed
Commit eb43613 · verified · 1 Parent(s): 5231cf2

Update README.md

Files changed (1): README.md (+28 −1)
README.md CHANGED
@@ -4,12 +4,29 @@ language:
 - en
 pipeline_tag: fill-mask
 ---
-# Ettin: an Open Suite of Paired Encoders and Decoders
+# Ettin: Open Suite of Paired Encoders and Decoders
+
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Paper](https://img.shields.io/badge/Paper-arXiv-red)](https://arxiv.org/abs/XXXX.XXXXX)
+[![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-blue)](https://huggingface.co/jhu-clsp)
+[![Data](https://img.shields.io/badge/🤗%20Hugging%20Face-Data-green)](https://huggingface.co/datasets/jhu-clsp)
 
 📄 [Paper](https://arxiv.org/abs/XXXX.XXXXX) | 🚀 [GitHub Repository](https://github.com/jhu-clsp/ettin-encoder-vs-decoder)
 
 This model is part of the Ettin suite - the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.
 
+## Table of Contents
+- [Model Description](#model-description)
+- [Training Data](#training-data)
+- [Model Family](#model-family)
+- [Encoder Models](#encoder-models)
+- [Decoder Models](#decoder-models)
+- [Cross-Objective Models](#cross-objective-models)
+- [Usage](#usage)
+- [Training Details](#training-details)
+- [Model Architecture](#model-architecture)
+- [Citation](#citation)
+
 ## Model Description
 
 Ettin models are designed to provide a foundation for comparing encoder-only and decoder-only architectures. Unlike previous comparisons that were limited by different training data, architectures, and recipes, Ettin models use:
@@ -77,6 +94,9 @@ The training data is publicly available and split across different phases:
 
 ## Usage
 
+<details>
+<summary>🚀 <strong>Click to expand encoder usage examples</strong></summary>
+
 ### Encoder Models (Classification/Retrieval/MLM)
 
 ```python
@@ -128,6 +148,11 @@ predictions = predict_masked_token(masked_text)
 print(f"Predictions: {predictions}")
 ```
 
+</details>
+
+<details>
+<summary>🚀 <strong>Click to expand decoder usage examples</strong></summary>
+
 ### Decoder Models (Text Generation)
 
 ```python
@@ -164,6 +189,8 @@ generated = generate_text(prompt)
 print(generated)
 ```
 
+</details>
+
 ## Training Details
 
 **Data:** High-quality mixture including DCLM, Dolma v1.7, scientific papers, code, and curated sources totaling 2T+ tokens
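The diff elides the body of the README's encoder usage block, but its hunk context names a `predict_masked_token` helper. As a hedged sketch only, such a helper could be built on the `transformers` fill-mask pipeline; the checkpoint name `jhu-clsp/ettin-encoder-150m` below is an assumed placeholder, not taken from the diff:

```python
# Hedged sketch of the elided encoder (fill-mask) example.
# The checkpoint name is an assumed placeholder, not taken from the diff.
from transformers import pipeline


def predict_masked_token(masked_text: str,
                         model_name: str = "jhu-clsp/ettin-encoder-150m") -> list[str]:
    """Return the top token predictions for the [MASK] slot in masked_text."""
    fill = pipeline("fill-mask", model=model_name)
    return [pred["token_str"] for pred in fill(masked_text)]


# Example call (downloads the model on first use):
# predictions = predict_masked_token("The capital of France is [MASK].")
# print(f"Predictions: {predictions}")
```

The pipeline loads the model lazily inside the function, so importing this module stays cheap; the actual mask token depends on the model's tokenizer.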
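Likewise, the decoder hunk context references a `generate_text` helper whose body the diff does not show. A minimal sketch under the assumption that the decoders load as standard causal LMs (the checkpoint name `jhu-clsp/ettin-decoder-150m` is an assumption, not from the diff):

```python
# Hedged sketch of the elided decoder (text generation) example.
# The checkpoint name is an assumed placeholder, not taken from the diff.
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_text(prompt: str,
                  model_name: str = "jhu-clsp/ettin-decoder-150m",
                  max_new_tokens: int = 50) -> str:
    """Greedily continue `prompt` with a causal decoder model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# generated = generate_text("Encoders and decoders differ in that")
# print(generated)
```

`do_sample=False` makes the sketch deterministic (greedy decoding); swap in sampling parameters for more varied output.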