PeppoCola committed on
Commit
9edbac9
1 Parent(s): 6f42cdd

Update README.md

  1. README.md +9 -5
README.md CHANGED
@@ -8,11 +8,15 @@ tags:
 
 ---
 
-# {MODEL_NAME}
+# GitHub Issues Preprocessed MPNet Sentence Transformer (10 Epochs)
 
-This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
+This is a [sentence-transformers](https://www.SBERT.net) model, specific for GitHub Issue data.
+
+## Dataset
+
+For training, we used the [NLBSE22 dataset](https://nlbse2022.github.io/tools/), after removing issues with empty body and duplicates.
+Similarity between title and body was used to train the sentence embedding model.
 
-<!--- Describe your model here -->
 
 ## Usage (Sentence-Transformers)
 
@@ -54,8 +58,8 @@ def mean_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
+tokenizer = AutoTokenizer.from_pretrained('Collab-uniba/github-issues-preprocessed-mpnet-st-e10')
+model = AutoModel.from_pretrained('Collab-uniba/github-issues-preprocessed-mpnet-st-e10')
 
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
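
The second hunk sits inside the README's `mean_pooling(model_output, attention_mask)` example, but the function body is not part of this diff. As a rough illustration only, the sketch below shows what attention-mask-weighted mean pooling typically computes, written in plain Python so it runs without PyTorch. The list-based signature, the toy values, and the `1e-9` floor on the divisor are assumptions made for illustration, not the README's actual code.

```python
from typing import List


def mean_pooling(token_embeddings: List[List[List[float]]],
                 attention_mask: List[List[int]]) -> List[List[float]]:
    """Average token vectors per sentence, ignoring padded positions.

    Illustrative stand-in: takes nested lists instead of the
    (model_output, attention_mask) tensors used in the README.
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        dim = len(vectors[0])
        total = [0.0] * dim
        for vec, keep in zip(vectors, mask):
            if keep:  # skip padding tokens (mask value 0)
                for i, value in enumerate(vec):
                    total[i] += value
        # Floor the divisor so an all-padding row cannot divide by zero.
        count = max(sum(mask), 1e-9)
        pooled.append([t / count for t in total])
    return pooled


# Hypothetical toy batch: 2 sentences, 3 token slots, 2-dimensional vectors.
token_embeddings = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third slot is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

sentence_embeddings = mean_pooling(token_embeddings, attention_mask)
print(sentence_embeddings)
```

With real model output, the same averaging would be applied to the token embeddings the model returns for `encoded_input`, using the `attention_mask` produced by the tokenizer when `padding=True`.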