Fill-Mask · Transformers · PyTorch · bert
yyu committed on Commit 9b2a48c · 1 Parent(s): c26ee12

Update README.md

Files changed (1): README.md (+31 -0)
README.md CHANGED
@@ -7,5 +7,36 @@ This model has been first pretrained on the BEIR corpus and fine-tuned on MS MARCO
This model is trained with BERT-base as the backbone with 110M parameters.

+ ## Usage
+
+ Pre-trained models can be loaded through the HuggingFace transformers library:
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ model = AutoModel.from_pretrained("OpenMatch/cocodr-base-msmarco")
+ tokenizer = AutoTokenizer.from_pretrained("OpenMatch/cocodr-base-msmarco")
+ ```
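+
+ As a quick sanity check (not from the model card), the parameter count of the loaded backbone can be confirmed against the ~110M figure quoted above; a minimal sketch:
+
+ ```python
+ # count all trainable and non-trainable parameters of the loaded model
+ num_params = sum(p.numel() for p in model.parameters())
+ print(f"{num_params / 1e6:.0f}M parameters")  # roughly 110M for a BERT-base backbone
+ ```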
+
+ Embeddings for different sentences can then be obtained as follows:
+
+ ```python
+ sentences = [
+     "Where was Marie Curie born?",
+     "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
+     "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
+ ]
+
+ inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
+ # the embedding of the [CLS] token after the final layer
+ embeddings = model(**inputs, output_hidden_states=True, return_dict=True).hidden_states[-1][:, 0]
+ ```
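+
+ No gradients are needed at inference time, so the forward pass above can be wrapped in `torch.no_grad()` to save memory; a minimal sketch of the same computation (the `outputs` name is illustrative):
+
+ ```python
+ import torch
+
+ with torch.no_grad():
+     outputs = model(**inputs, output_hidden_states=True, return_dict=True)
+
+ # [CLS] embedding from the final layer, shape (batch_size, hidden_size)
+ embeddings = outputs.hidden_states[-1][:, 0]
+ ```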
+
+ Similarity scores between the different sentences can then be obtained with a dot product between the embeddings:
+
+ ```python
+ score01 = embeddings[0] @ embeddings[1]  # 216.9792
+ score02 = embeddings[0] @ embeddings[2]  # 216.6684
+ ```
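+
+ The same dot product extends to scoring one query against many passages at once; a minimal sketch, with illustrative variable names, that ranks the two passages above for the query:
+
+ ```python
+ import torch
+
+ query_emb = embeddings[0]      # query embedding, shape (hidden_size,)
+ passage_embs = embeddings[1:]  # passage embeddings, shape (num_passages, hidden_size)
+
+ scores = passage_embs @ query_emb                 # one dot product per passage
+ ranking = torch.argsort(scores, descending=True)  # passage indices, best match first
+ ```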