mrscoopers committed · verified
Commit a267ad7 · 1 Parent(s): ebc6496

Update README.md

Files changed (1): README.md (+13 -4)
README.md CHANGED
@@ -8,16 +8,25 @@ pipeline_tag: sentence-similarity
 
 # MiniCOIL v1
 
-MiniCOIL - is a sparse contextualized per-token embeddings.
-Read more about it in [the article](https://qdrant.tech/articles/minicoil).
+MiniCOIL is a sparse neural embedding model for textual retrieval.
+
+It creates 4-dimensional embeddings for each word stem, capturing the word's meaning.
+These meaning embeddings are combined into a bag-of-words (BoW) representation of the input text.
+The final sparse representation is calculated by weighting each word using the BM25 scoring formula.
+
+<img src="https://storage.googleapis.com/qdrant-examples/miniCOIL_inference.png" alt="miniCOIL inference" width="600"/>
+
+If a word is absent from the miniCOIL vocabulary, its weight in the sparse representation is based purely on its BM25 score.
+
+Read more about miniCOIL in [the article](https://qdrant.tech/articles/minicoil).
 
 
 ## Usage
 
-This model is designed to be used with [FastEmbed](https://github.com/qdrant/fastembed) library.
+This model is designed to be used with the [FastEmbed](https://github.com/qdrant/fastembed) library.
 
 > Note:
-This model is supposed to be used with Qdrant. Vectors have to be configured with [Modifier.IDF](https://qdrant.tech/documentation/concepts/indexing/?q=modifier#idf-modifier).
+This model was designed with Qdrant's specifics in mind; miniCOIL sparse vectors in Qdrant have to be configured with [Modifier.IDF](https://qdrant.tech/documentation/concepts/indexing/?q=modifier#idf-modifier). Otherwise, you'll have to manually calculate and scale the produced sparse representations by the IDF part of the BM25 formula.
 
 ```py
 from fastembed import SparseTextEmbedding
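
Outside the diff itself, here is a minimal sketch of the workflow the updated Usage section describes: embedding text with FastEmbed and storing the result in a Qdrant collection whose sparse vectors are configured with Modifier.IDF. The model identifier `Qdrant/minicoil-v1`, the collection name `minicoil-demo`, and the sparse vector name `minicoil` are illustrative assumptions, not taken from the commit.

```py
# A hedged end-to-end sketch (editor's addition, not part of the commit).
from fastembed import SparseTextEmbedding
from qdrant_client import QdrantClient, models

# Load miniCOIL via FastEmbed; the model identifier is an assumption,
# check FastEmbed's supported sparse models for the exact name.
model = SparseTextEmbedding(model_name="Qdrant/minicoil-v1")

documents = [
    "Sparse vectors keep exact word matching.",
    "MiniCOIL weighs words by meaning and BM25.",
]
# embed() yields one SparseEmbedding (indices + values) per document.
embeddings = list(model.embed(documents))

client = QdrantClient(url="http://localhost:6333")

# Sparse vectors have to be configured with the IDF modifier, per the Note above.
client.create_collection(
    collection_name="minicoil-demo",
    vectors_config={},  # no dense vectors in this sketch
    sparse_vectors_config={
        "minicoil": models.SparseVectorParams(modifier=models.Modifier.IDF)
    },
)

client.upsert(
    collection_name="minicoil-demo",
    points=[
        models.PointStruct(
            id=idx,
            payload={"text": doc},
            vector={
                "minicoil": models.SparseVector(
                    indices=emb.indices.tolist(),
                    values=emb.values.tolist(),
                )
            },
        )
        for idx, (doc, emb) in enumerate(zip(documents, embeddings))
    ],
)
```

With Modifier.IDF set, Qdrant applies the IDF part of the BM25 formula at query time, so the client only supplies the sparse values produced by the model instead of scaling them itself.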