Commit 8cabd9a by mpjan · 1 parent: 2e47f8b

Update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -1,17 +1,23 @@
 ---
 pipeline_tag: sentence-similarity
+language:
+- 'pt'
 tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 - transformers
+datasets:
+- 'unicamp-dl/mmarco'
 
 ---
 
-# {MODEL_NAME}
+# mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k
 
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
+It is a fine-tuning of [sentence-transformers/msmarco-distilbert-base-tas-b](https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b) on the first 300k triplets of the Portuguese subset in [unicamp-dl/mmarco](https://huggingface.co/datasets/unicamp-dl/mmarco).
+
 <!--- Describe your model here -->
 
 ## Usage (Sentence-Transformers)
@@ -28,7 +34,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
 
-model = SentenceTransformer('{MODEL_NAME}')
+model = SentenceTransformer('mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -51,8 +57,8 @@ def cls_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
+tokenizer = AutoTokenizer.from_pretrained('mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k')
+model = AutoModel.from_pretrained('mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k')
 
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
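The last hunk sits inside the README's Transformers example, which pools token embeddings with `cls_pooling` before the 768-dimensional vectors are used for semantic search. The pure-Python sketch below illustrates those two steps under stated assumptions that this diff does not confirm: that `cls_pooling` keeps only the first ([CLS]) token vector, as in the usual sentence-transformers model-card template, and that passages are scored by dot product, the similarity commonly paired with TAS-B models. The 4-dimensional vectors are made-up stand-ins for real model output, and `dot`, `rank`, and this simplified `cls_pooling` (which takes token embeddings directly rather than the README's `(model_output, attention_mask)`) are hypothetical helpers.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[List[Vector]]) -> List[Vector]:
    """Keep only the first ([CLS]) token vector of each sentence.

    Hypothetical simplification: the README's version takes
    (model_output, attention_mask) and works on a torch tensor;
    CLS pooling ignores the mask, so it is omitted here.
    Shape of token_embeddings: [batch][seq_len][hidden].
    """
    return [sentence[0] for sentence in token_embeddings]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity (assumed scoring function)."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: Vector, passages: List[Vector]) -> List[int]:
    """Return passage indices from most to least similar to the query."""
    scores = [dot(query, p) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Made-up "model output": 3 sentences x 2 tokens x 4 dims.
# The second token is all 9s to show it has no effect under CLS pooling.
toy_output = [
    [[1.0, 0.0, 0.5, 0.0], [9.0, 9.0, 9.0, 9.0]],  # query
    [[0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # passage A
    [[0.9, 0.1, 0.4, 0.0], [9.0, 9.0, 9.0, 9.0]],  # passage B
]
embeddings = cls_pooling(toy_output)
query, passages = embeddings[0], embeddings[1:]
print(rank(query, passages))  # prints [1, 0]: passage B outranks passage A
```

With the real model, the vectors would instead come from `model.encode(sentences)` or from pooling the PyTorch output; the ranking logic itself would not change.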