Update README.md
README.md
ADDED
@@ -0,0 +1,188 @@
---
language: itc
tags:
- translation

license: apache-2.0
---

### itc-itc

* source languages: itc
* target languages: itc
* OPUS readme: [itc-itc](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/itc-itc/README.md)

* dataset: opus
* model: transformer
* source language(s): arg ast bjn cat cos egl fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lat_Grek lat_Latn lij lld_Latn lmo mwl oci pap pcd pms por roh ron scn spa srd vec wln zsm_Latn
* target language(s): arg ast bjn cat cos egl fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lat_Grek lat_Latn lij lld_Latn lmo mwl oci pap pcd pms por roh ron scn spa srd vec wln zsm_Latn
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* a sentence-initial language token is required in the form `>>id<<`, where `id` is a valid target language ID
* download original weights: [opus-2020-07-07.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/itc-itc/opus-2020-07-07.zip)
* test set translations: [opus-2020-07-07.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/itc-itc/opus-2020-07-07.test.txt)
* test set scores: [opus-2020-07-07.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/itc-itc/opus-2020-07-07.eval.txt)

## Benchmarks

| testset | BLEU | chr-F |
|-----------------------|-------|-------|
| Tatoeba-test.arg-fra.arg.fra | 40.8 | 0.501 |
| Tatoeba-test.arg-spa.arg.spa | 59.9 | 0.739 |
| Tatoeba-test.ast-fra.ast.fra | 45.4 | 0.628 |
| Tatoeba-test.ast-por.ast.por | 100.0 | 1.000 |
| Tatoeba-test.ast-spa.ast.spa | 46.8 | 0.636 |
| Tatoeba-test.cat-fra.cat.fra | 51.6 | 0.689 |
| Tatoeba-test.cat-ita.cat.ita | 49.2 | 0.699 |
| Tatoeba-test.cat-por.cat.por | 48.0 | 0.688 |
| Tatoeba-test.cat-ron.cat.ron | 35.4 | 0.719 |
| Tatoeba-test.cat-spa.cat.spa | 69.0 | 0.826 |
| Tatoeba-test.cos-fra.cos.fra | 22.3 | 0.383 |
| Tatoeba-test.cos-pms.cos.pms | 3.4 | 0.199 |
| Tatoeba-test.egl-fra.egl.fra | 9.5 | 0.283 |
| Tatoeba-test.egl-ita.egl.ita | 3.0 | 0.206 |
| Tatoeba-test.egl-spa.egl.spa | 3.7 | 0.194 |
| Tatoeba-test.fra-arg.fra.arg | 3.8 | 0.090 |
| Tatoeba-test.fra-ast.fra.ast | 25.9 | 0.457 |
| Tatoeba-test.fra-cat.fra.cat | 42.2 | 0.637 |
| Tatoeba-test.fra-cos.fra.cos | 3.3 | 0.185 |
| Tatoeba-test.fra-egl.fra.egl | 2.2 | 0.120 |
| Tatoeba-test.fra-frm.fra.frm | 1.0 | 0.191 |
| Tatoeba-test.fra-gcf.fra.gcf | 0.2 | 0.099 |
| Tatoeba-test.fra-glg.fra.glg | 40.5 | 0.625 |
| Tatoeba-test.fra-hat.fra.hat | 22.6 | 0.472 |
| Tatoeba-test.fra-ita.fra.ita | 46.7 | 0.679 |
| Tatoeba-test.fra-lad.fra.lad | 15.9 | 0.345 |
| Tatoeba-test.fra-lat.fra.lat | 2.9 | 0.247 |
| Tatoeba-test.fra-lij.fra.lij | 1.0 | 0.201 |
| Tatoeba-test.fra-lld.fra.lld | 1.1 | 0.257 |
| Tatoeba-test.fra-lmo.fra.lmo | 1.2 | 0.241 |
| Tatoeba-test.fra-msa.fra.msa | 0.4 | 0.111 |
| Tatoeba-test.fra-oci.fra.oci | 7.3 | 0.322 |
| Tatoeba-test.fra-pap.fra.pap | 69.8 | 0.912 |
| Tatoeba-test.fra-pcd.fra.pcd | 0.6 | 0.144 |
| Tatoeba-test.fra-pms.fra.pms | 1.0 | 0.181 |
| Tatoeba-test.fra-por.fra.por | 39.7 | 0.619 |
| Tatoeba-test.fra-roh.fra.roh | 5.7 | 0.286 |
| Tatoeba-test.fra-ron.fra.ron | 36.4 | 0.591 |
| Tatoeba-test.fra-scn.fra.scn | 2.1 | 0.101 |
| Tatoeba-test.fra-spa.fra.spa | 47.5 | 0.670 |
| Tatoeba-test.fra-srd.fra.srd | 2.8 | 0.306 |
| Tatoeba-test.fra-vec.fra.vec | 3.0 | 0.345 |
| Tatoeba-test.fra-wln.fra.wln | 3.5 | 0.212 |
| Tatoeba-test.frm-fra.frm.fra | 11.4 | 0.472 |
| Tatoeba-test.gcf-fra.gcf.fra | 7.1 | 0.267 |
| Tatoeba-test.gcf-lad.gcf.lad | 0.0 | 0.170 |
| Tatoeba-test.gcf-por.gcf.por | 0.0 | 0.230 |
| Tatoeba-test.gcf-spa.gcf.spa | 13.4 | 0.314 |
| Tatoeba-test.glg-fra.glg.fra | 54.7 | 0.702 |
| Tatoeba-test.glg-ita.glg.ita | 40.1 | 0.661 |
| Tatoeba-test.glg-por.glg.por | 57.6 | 0.748 |
| Tatoeba-test.glg-spa.glg.spa | 70.0 | 0.817 |
| Tatoeba-test.hat-fra.hat.fra | 14.2 | 0.419 |
| Tatoeba-test.hat-spa.hat.spa | 17.9 | 0.449 |
| Tatoeba-test.ita-cat.ita.cat | 51.0 | 0.693 |
| Tatoeba-test.ita-egl.ita.egl | 1.1 | 0.114 |
| Tatoeba-test.ita-fra.ita.fra | 58.2 | 0.727 |
| Tatoeba-test.ita-glg.ita.glg | 41.7 | 0.652 |
| Tatoeba-test.ita-lad.ita.lad | 17.5 | 0.419 |
| Tatoeba-test.ita-lat.ita.lat | 7.1 | 0.294 |
| Tatoeba-test.ita-lij.ita.lij | 1.0 | 0.208 |
| Tatoeba-test.ita-msa.ita.msa | 0.9 | 0.115 |
| Tatoeba-test.ita-oci.ita.oci | 12.3 | 0.378 |
| Tatoeba-test.ita-pms.ita.pms | 1.6 | 0.182 |
| Tatoeba-test.ita-por.ita.por | 44.8 | 0.665 |
| Tatoeba-test.ita-ron.ita.ron | 43.3 | 0.653 |
| Tatoeba-test.ita-spa.ita.spa | 56.6 | 0.733 |
| Tatoeba-test.ita-vec.ita.vec | 2.0 | 0.187 |
| Tatoeba-test.lad-fra.lad.fra | 30.4 | 0.458 |
| Tatoeba-test.lad-gcf.lad.gcf | 0.0 | 0.163 |
| Tatoeba-test.lad-ita.lad.ita | 12.3 | 0.426 |
| Tatoeba-test.lad-lat.lad.lat | 1.6 | 0.178 |
| Tatoeba-test.lad-por.lad.por | 8.8 | 0.394 |
| Tatoeba-test.lad-ron.lad.ron | 78.3 | 0.717 |
| Tatoeba-test.lad-spa.lad.spa | 28.3 | 0.531 |
| Tatoeba-test.lat-fra.lat.fra | 9.4 | 0.300 |
| Tatoeba-test.lat-ita.lat.ita | 20.0 | 0.421 |
| Tatoeba-test.lat-lad.lat.lad | 3.8 | 0.173 |
| Tatoeba-test.lat-por.lat.por | 13.0 | 0.354 |
| Tatoeba-test.lat-ron.lat.ron | 14.0 | 0.358 |
| Tatoeba-test.lat-spa.lat.spa | 21.8 | 0.436 |
| Tatoeba-test.lij-fra.lij.fra | 13.8 | 0.346 |
| Tatoeba-test.lij-ita.lij.ita | 14.7 | 0.442 |
| Tatoeba-test.lld-fra.lld.fra | 18.8 | 0.428 |
| Tatoeba-test.lld-spa.lld.spa | 11.1 | 0.377 |
| Tatoeba-test.lmo-fra.lmo.fra | 11.0 | 0.329 |
| Tatoeba-test.msa-fra.msa.fra | 0.8 | 0.129 |
| Tatoeba-test.msa-ita.msa.ita | 1.1 | 0.138 |
| Tatoeba-test.msa-msa.msa.msa | 19.1 | 0.453 |
| Tatoeba-test.msa-pap.msa.pap | 0.0 | 0.037 |
| Tatoeba-test.msa-por.msa.por | 2.4 | 0.155 |
| Tatoeba-test.msa-ron.msa.ron | 1.2 | 0.129 |
| Tatoeba-test.msa-spa.msa.spa | 1.0 | 0.139 |
| Tatoeba-test.multi.multi | 40.8 | 0.599 |
| Tatoeba-test.mwl-por.mwl.por | 35.4 | 0.561 |
| Tatoeba-test.oci-fra.oci.fra | 24.5 | 0.467 |
| Tatoeba-test.oci-ita.oci.ita | 23.3 | 0.493 |
| Tatoeba-test.oci-spa.oci.spa | 26.1 | 0.505 |
| Tatoeba-test.pap-fra.pap.fra | 31.0 | 0.629 |
| Tatoeba-test.pap-msa.pap.msa | 0.0 | 0.051 |
| Tatoeba-test.pcd-fra.pcd.fra | 13.8 | 0.381 |
| Tatoeba-test.pcd-spa.pcd.spa | 2.6 | 0.227 |
| Tatoeba-test.pms-cos.pms.cos | 3.4 | 0.217 |
| Tatoeba-test.pms-fra.pms.fra | 13.4 | 0.347 |
| Tatoeba-test.pms-ita.pms.ita | 13.0 | 0.373 |
| Tatoeba-test.pms-spa.pms.spa | 13.1 | 0.374 |
| Tatoeba-test.por-ast.por.ast | 100.0 | 1.000 |
| Tatoeba-test.por-cat.por.cat | 45.1 | 0.673 |
| Tatoeba-test.por-fra.por.fra | 52.5 | 0.698 |
| Tatoeba-test.por-gcf.por.gcf | 16.0 | 0.128 |
| Tatoeba-test.por-glg.por.glg | 57.5 | 0.750 |
| Tatoeba-test.por-ita.por.ita | 50.1 | 0.710 |
| Tatoeba-test.por-lad.por.lad | 15.7 | 0.341 |
| Tatoeba-test.por-lat.por.lat | 11.1 | 0.362 |
| Tatoeba-test.por-msa.por.msa | 2.4 | 0.136 |
| Tatoeba-test.por-mwl.por.mwl | 30.5 | 0.559 |
| Tatoeba-test.por-roh.por.roh | 0.0 | 0.132 |
| Tatoeba-test.por-ron.por.ron | 40.0 | 0.632 |
| Tatoeba-test.por-spa.por.spa | 58.6 | 0.756 |
| Tatoeba-test.roh-fra.roh.fra | 23.1 | 0.564 |
| Tatoeba-test.roh-por.roh.por | 21.4 | 0.347 |
| Tatoeba-test.roh-spa.roh.spa | 19.8 | 0.489 |
| Tatoeba-test.ron-cat.ron.cat | 59.5 | 0.854 |
| Tatoeba-test.ron-fra.ron.fra | 47.4 | 0.647 |
| Tatoeba-test.ron-ita.ron.ita | 45.7 | 0.683 |
| Tatoeba-test.ron-lad.ron.lad | 44.2 | 0.712 |
| Tatoeba-test.ron-lat.ron.lat | 14.8 | 0.449 |
| Tatoeba-test.ron-msa.ron.msa | 1.2 | 0.098 |
| Tatoeba-test.ron-por.ron.por | 42.7 | 0.650 |
| Tatoeba-test.ron-spa.ron.spa | 50.4 | 0.686 |
| Tatoeba-test.scn-fra.scn.fra | 2.4 | 0.180 |
| Tatoeba-test.scn-spa.scn.spa | 5.1 | 0.212 |
| Tatoeba-test.spa-arg.spa.arg | 10.8 | 0.267 |
| Tatoeba-test.spa-ast.spa.ast | 24.6 | 0.514 |
| Tatoeba-test.spa-cat.spa.cat | 61.6 | 0.783 |
| Tatoeba-test.spa-egl.spa.egl | 2.2 | 0.106 |
| Tatoeba-test.spa-fra.spa.fra | 51.1 | 0.683 |
| Tatoeba-test.spa-gcf.spa.gcf | 7.8 | 0.067 |
| Tatoeba-test.spa-glg.spa.glg | 62.8 | 0.776 |
| Tatoeba-test.spa-hat.spa.hat | 16.6 | 0.398 |
| Tatoeba-test.spa-ita.spa.ita | 51.8 | 0.718 |
| Tatoeba-test.spa-lad.spa.lad | 14.6 | 0.393 |
| Tatoeba-test.spa-lat.spa.lat | 21.5 | 0.486 |
| Tatoeba-test.spa-lld.spa.lld | 2.0 | 0.222 |
| Tatoeba-test.spa-msa.spa.msa | 0.8 | 0.113 |
| Tatoeba-test.spa-oci.spa.oci | 10.3 | 0.377 |
| Tatoeba-test.spa-pcd.spa.pcd | 0.9 | 0.115 |
| Tatoeba-test.spa-pms.spa.pms | 1.5 | 0.194 |
| Tatoeba-test.spa-por.spa.por | 49.4 | 0.698 |
| Tatoeba-test.spa-roh.spa.roh | 4.6 | 0.261 |
| Tatoeba-test.spa-ron.spa.ron | 39.1 | 0.618 |
| Tatoeba-test.spa-scn.spa.scn | 2.0 | 0.113 |
| Tatoeba-test.spa-wln.spa.wln | 8.7 | 0.295 |
| Tatoeba-test.srd-fra.srd.fra | 6.7 | 0.369 |
| Tatoeba-test.vec-fra.vec.fra | 59.9 | 0.608 |
| Tatoeba-test.vec-ita.vec.ita | 14.2 | 0.405 |
| Tatoeba-test.wln-fra.wln.fra | 8.9 | 0.344 |
| Tatoeba-test.wln-spa.wln.spa | 9.6 | 0.298 |

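With this many language pairs, it can help to read the benchmark rows programmatically. The sketch below assumes the test set names follow the pattern `Tatoeba-test.<src>-<tgt>.<src>.<tgt>` seen in the table, with `Tatoeba-test.multi.multi` as the aggregate row. The names `Score`, `parse_row`, and `pairs_at_least` are hypothetical and not part of any released tooling.

```python
from typing import Iterable, List, NamedTuple, Optional


class Score(NamedTuple):
    source: str
    target: str
    bleu: float
    chrf: float


def parse_row(row: str) -> Optional[Score]:
    """Parse one markdown table row; return None for header/separator rows."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    if len(cells) != 3:
        return None
    name, bleu, chrf = cells
    parts = name.split(".")
    if len(parts) == 4:      # Tatoeba-test.<src>-<tgt>.<src>.<tgt>
        source, target = parts[2], parts[3]
    elif len(parts) == 3:    # aggregate row: Tatoeba-test.multi.multi
        source, target = parts[1], parts[2]
    else:
        return None
    try:
        return Score(source, target, float(bleu), float(chrf))
    except ValueError:
        return None


def pairs_at_least(rows: Iterable[str], min_bleu: float) -> List[Score]:
    """Return parsed rows whose BLEU is at least `min_bleu`, best first."""
    scores = [s for s in map(parse_row, rows) if s is not None]
    return sorted((s for s in scores if s.bleu >= min_bleu),
                  key=lambda s: s.bleu, reverse=True)


if __name__ == "__main__":
    sample = [
        "| testset | BLEU | chr-F |",
        "|-----------------------|-------|-------|",
        "| Tatoeba-test.cat-spa.cat.spa | 69.0 | 0.826 |",
        "| Tatoeba-test.fra-gcf.fra.gcf | 0.2 | 0.099 |",
        "| Tatoeba-test.multi.multi | 40.8 | 0.599 |",
    ]
    for s in pairs_at_least(sample, 40.0):
        print(s.source, s.target, s.bleu, s.chrf)
```

Sorting by BLEU alone is a simplification. Scores such as 100.0 for `ast-por` most likely come from very small test sets, so the test set sizes in the linked eval file are worth checking before drawing conclusions.
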