Commit 4ae43fa (verified) by Xenova (HF Staff), parent 6de5363: Update README.md

Files changed (1): README.md (+198, -2)
README.md before this commit:

---
library_name: transformers.js
base_model:
- google-bert/bert-base-multilingual-uncased
---

# bert-base-multilingual-uncased (ONNX)

This is an ONNX version of [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased). It was automatically converted and uploaded using [this space](https://huggingface.co/spaces/onnx-community/convert-to-onnx).
README.md after this commit:

---
language:
- multilingual
- af
- sq
- ar
- an
- hy
- ast
- az
- ba
- eu
- bar
- be
- bn
- inc
- bs
- br
- bg
- my
- ca
- ceb
- ce
- zh
- cv
- hr
- cs
- da
- nl
- en
- et
- fi
- fr
- gl
- ka
- de
- el
- gu
- ht
- he
- hi
- hu
- is
- io
- id
- ga
- it
- ja
- jv
- kn
- kk
- ky
- ko
- la
- lv
- lt
- roa
- nds
- lm
- mk
- mg
- ms
- ml
- mr
- min
- ne
- new
- nb
- nn
- oc
- fa
- pms
- pl
- pt
- pa
- ro
- ru
- sco
- sr
- hr
- scn
- sk
- sl
- aze
- es
- su
- sw
- sv
- tl
- tg
- ta
- tt
- te
- tr
- uk
- ud
- uz
- vi
- vo
- war
- cy
- fry
- pnb
- yo
license: apache-2.0
datasets:
- wikipedia
library_name: transformers.js
base_model:
- google-bert/bert-base-multilingual-uncased
---
 
# BERT multilingual base model (uncased)

Model pretrained on the 102 languages with the largest Wikipedias using a masked language modeling (MLM) objective.
It was introduced in [this paper](https://arxiv.org/abs/1810.04805) and first released in
[this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
between english and English.

Disclaimer: The team releasing BERT did not write a model card for this model, so this model card has been written by
the Hugging Face team.

## Model description

BERT is a transformers model pretrained on a large corpus of multilingual data in a self-supervised fashion. This means
it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it
was pretrained with two objectives:

- Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs
the entire masked sentence through the model and has to predict the masked words. This is different from traditional
recurrent neural networks (RNNs), which usually see the words one after the other, or from autoregressive models like
GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the
sentence (a minimal sketch of this masking step follows the list).
- Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
predict whether the two sentences were following each other or not.
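
The following is only a rough illustration of the MLM masking step, not the actual pretraining code: real BERT pretraining operates on WordPiece tokens and, for the selected 15%, replaces 80% of them with `[MASK]`, 10% with a random token, and leaves 10% unchanged.

```js
// Illustrative sketch only: randomly replace ~15% of whitespace-separated words with [MASK].
function maskWords(sentence, maskProb = 0.15, maskToken = "[MASK]") {
  return sentence
    .split(/\s+/)
    .map((word) => (Math.random() < maskProb ? maskToken : word))
    .join(" ");
}

console.log(maskWords("the quick brown fox jumps over the lazy dog"));
// e.g. "the quick [MASK] fox jumps over the [MASK] dog" (output varies per run)
```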

This way, the model learns an inner representation of the languages in the training set that can then be used to
extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a
standard classifier using the features produced by the BERT model as inputs.
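
With Transformers.js, one way to get such features is the `feature-extraction` pipeline. The following is a minimal sketch, assuming the ONNX export in this repository exposes the encoder's hidden states; the `pooling` and `normalize` arguments are standard options of this pipeline, not something specific to this model:

```js
import { pipeline } from "@huggingface/transformers";

// Load a feature-extraction pipeline backed by this ONNX model.
const extractor = await pipeline(
  "feature-extraction",
  "onnx-community/bert-base-multilingual-uncased-ONNX",
);

// Mean-pool the token embeddings into one fixed-size vector per sentence;
// these vectors can then be fed to any downstream classifier.
const embeddings = await extractor(
  ["I love this movie.", "Ce film est excellent."],
  { pooling: "mean", normalize: true },
);
console.log(embeddings.dims); // e.g. [2, 768]
```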

## Intended uses & limitations

You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for
fine-tuned versions on a task that interests you.

Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
generation, you should look at models like GPT-2.

### How to use

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```

You can then use this model directly with a pipeline for masked language modeling:
```js
import { pipeline } from "@huggingface/transformers";

const unmasker = await pipeline("fill-mask", "onnx-community/bert-base-multilingual-uncased-ONNX");
const result = await unmasker("The capital of France is [MASK].", { top_k: 3 });
console.log(result);
// [
//   { score: 0.2571190595626831, token: 10718, token_str: 'paris', sequence: 'the capital of france is paris.' },
//   { score: 0.15583284199237823, token: 18254, token_str: 'lyon', sequence: 'the capital of france is lyon.' },
//   { score: 0.06897224485874176, token: 25091, token_str: 'bordeaux', sequence: 'the capital of france is bordeaux.' }
// ]
```
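
With Transformers.js v3 you can also pass loading options when creating the pipeline, for example to request quantized weights or a specific execution device. Which `dtype` variants are actually available depends on the ONNX weight files present in this repository, so treat this as a sketch:

```js
// Optional (sketch): smaller download via 8-bit quantized weights, if that variant exists,
// and WebGPU acceleration in supported browsers.
const unmasker = await pipeline("fill-mask", "onnx-community/bert-base-multilingual-uncased-ONNX", {
  dtype: "q8",
  device: "webgpu",
});
```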

### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
predictions. This bias will also affect all fine-tuned versions of this model.
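
A quick, informal way to probe for such bias is to compare fill-mask predictions for prompts that differ only in a demographic term. The sketch below reuses the `unmasker` pipeline created in the example above; the prompts are illustrative, and no particular outputs are claimed here:

```js
// Compare top predictions for minimally different prompts and inspect them for stereotypes.
const prompts = [
  "The man worked as a [MASK].",
  "The woman worked as a [MASK].",
];
for (const prompt of prompts) {
  const predictions = await unmasker(prompt, { top_k: 5 });
  console.log(prompt, predictions.map((p) => p.token_str));
}
```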

## Training data

The BERT model was pretrained on the 102 languages with the largest Wikipedias. You can find the complete list
[here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages).

### BibTeX entry and citation info

```bibtex
@article{DBLP:journals/corr/abs-1810-04805,
  author    = {Jacob Devlin and
               Ming{-}Wei Chang and
               Kenton Lee and
               Kristina Toutanova},
  title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
               Understanding},
  journal   = {CoRR},
  volume    = {abs/1810.04805},
  year      = {2018},
  url       = {http://arxiv.org/abs/1810.04805},
  archivePrefix = {arXiv},
  eprint    = {1810.04805},
  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```