---
language:
- en
pipeline_tag: fill-mask
---

# Ettin: Open Suite of Paired Encoders and Decoders

[License: MIT](https://opensource.org/licenses/MIT) | [Paper](https://arxiv.org/abs/XXXX.XXXXX) | [Models](https://huggingface.co/jhu-clsp) | [Data](https://huggingface.co/datasets/jhu-clsp)

[Paper](https://arxiv.org/abs/XXXX.XXXXX) | [GitHub Repository](https://github.com/jhu-clsp/ettin-encoder-vs-decoder)

This model is part of the Ettin suite, the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.

## Table of Contents

- [Model Description](#model-description)
- [Training Data](#training-data)
- [Model Family](#model-family)
  - [Encoder Models](#encoder-models)
  - [Decoder Models](#decoder-models)
  - [Cross-Objective Models](#cross-objective-models)
- [Usage](#usage)
- [Training Details](#training-details)
- [Model Architecture](#model-architecture)
- [Citation](#citation)

## Model Description

Ettin models are designed to provide a foundation for comparing encoder-only and decoder-only architectures. Unlike previous comparisons that were limited by different training data, architectures, and recipes, each Ettin encoder-decoder pair uses the same data, the same architecture, and the same training recipe, so the two modeling objectives can be compared directly.

## Usage

<details>
<summary><strong>Click to expand encoder usage examples</strong></summary>

### Encoder Models (Classification/Retrieval/MLM)

A minimal masked-language-modeling sketch (the checkpoint id is illustrative; substitute the Ettin encoder you want to use):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative checkpoint id; see https://huggingface.co/jhu-clsp for the released Ettin encoders
model_id = "jhu-clsp/ettin-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

def predict_masked_token(text, top_k=5):
    """Return the top_k predictions for the first mask token in `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the first mask token in the input sequence
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1][0]
    top_ids = logits[0, mask_pos].topk(top_k).indices
    return [tokenizer.decode(int(i)).strip() for i in top_ids]

masked_text = f"The capital of France is {tokenizer.mask_token}."
predictions = predict_masked_token(masked_text)
print(f"Predictions: {predictions}")
```
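
For retrieval or classification, a common pattern (not specific to Ettin) is to mean-pool the encoder's hidden states into sentence embeddings; a sketch, reusing the same illustrative checkpoint id:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "jhu-clsp/ettin-encoder"  # illustrative; substitute a released Ettin encoder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def embed(texts):
    # Mean-pool token states over non-padding positions to get one vector per text
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

docs = ["Ettin pairs encoder and decoder models.", "Paris is the capital of France."]
scores = torch.nn.functional.cosine_similarity(embed(["Which suite pairs encoders and decoders?"]), embed(docs))
print(scores)
```

For classification, the same encoder can instead be loaded with `AutoModelForSequenceClassification` and fine-tuned as usual.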

</details>

<details>
<summary><strong>Click to expand decoder usage examples</strong></summary>

### Decoder Models (Text Generation)

A minimal text-generation sketch (the checkpoint id is illustrative; substitute the Ettin decoder you want to use):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative checkpoint id; see https://huggingface.co/jhu-clsp for the released Ettin decoders
model_id = "jhu-clsp/ettin-decoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def generate_text(prompt, max_new_tokens=50):
    # Encode the prompt, sample a continuation, and decode it back to text
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.95)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

prompt = "The key difference between encoders and decoders is"
generated = generate_text(prompt)
print(generated)
```
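
The same checkpoint can also be driven through the `transformers` text-generation pipeline; a one-liner sketch under the same illustrative id:

```python
from transformers import pipeline

# Illustrative checkpoint id; substitute a released Ettin decoder
generator = pipeline("text-generation", model="jhu-clsp/ettin-decoder")
print(generator("The key difference between encoders and decoders is", max_new_tokens=50)[0]["generated_text"])
```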

</details>

## Training Details

**Data:** High-quality mixture including DCLM, Dolma v1.7, scientific papers, code, and curated sources totaling 2T+ tokens
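
The training data is released under the jhu-clsp organization on the Hugging Face Hub (see the Data link above); a small sketch for listing the available datasets, assuming only that `huggingface_hub` is installed:

```python
from huggingface_hub import list_datasets

# Enumerate the public datasets released under the jhu-clsp organization
for info in list_datasets(author="jhu-clsp"):
    print(info.id)
```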