Update README.md
Add technical report link
README.md CHANGED

@@ -113,8 +113,8 @@ by [TensorOpera AI](https://tensoropera.ai/). The model was trained with a 3-sta
 tokens of text and code data in 8K sequence length. Fox-1 uses Grouped Query Attention (GQA) with 4 key-value heads and
 16 attention heads for faster inference.

-For the full details of this model please read
-
+For the full details of this model please read [Fox-1 technical report](https://arxiv.org/abs/2411.05281)
+and [release blog post](https://blog.tensoropera.ai/tensoropera-unveils-fox-foundation-model-a-pioneering-open-source-slm-leading-the-way-against-tech-giants).

 ## Benchmarks

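The GQA detail in the diff context above is concrete enough to sketch: with 16 query heads and 4 key-value heads, each group of 4 query heads attends against one shared K/V head, shrinking the KV cache roughly 4x. Below is a minimal PyTorch sketch of that head sharing; the head counts come from the README text, while `head_dim`, `seq_len`, and the `repeat_interleave`-based expansion are illustrative assumptions, not Fox-1's actual implementation.

```python
import torch

# Head counts from the README: 16 query heads, 4 key-value heads,
# so each group of 4 query heads shares one KV head.
# head_dim and seq_len are hypothetical, chosen only for the demo.
num_q_heads, num_kv_heads, head_dim, seq_len = 16, 4, 64, 8
group_size = num_q_heads // num_kv_heads  # 4 query heads per KV head

q = torch.randn(1, num_q_heads, seq_len, head_dim)   # one projection per query head
k = torch.randn(1, num_kv_heads, seq_len, head_dim)  # only 4 KV heads are stored/cached
v = torch.randn(1, num_kv_heads, seq_len, head_dim)

# Expand K/V so every query head sees its group's shared KV head
# (the "repeat_kv" step common in GQA implementations).
k = k.repeat_interleave(group_size, dim=1)  # -> (1, 16, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)

# Standard scaled dot-product attention over the expanded heads.
attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1) @ v
print(attn.shape)  # torch.Size([1, 16, 8, 64])
```

The inference speedup comes from the smaller tensors before expansion: only the 4-head `k` and `v` need to be kept in the KV cache, while attention quality is largely preserved by still computing 16 distinct query projections.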