seongwoon committed on
Commit 39ff6ff · 1 Parent(s): 5cf249c
Files changed (1):
  1. README.md +74 -31

README.md CHANGED
@@ -1,51 +1,94 @@
  ---
- license: apache-2.0
  tags:
- - generated_from_trainer
- model-index:
- - name: paper_embedding
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # paper_embedding

- This model is a fine-tuned version of [allenai/specter2](https://huggingface.co/allenai/specter2) on the None dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 10

- ### Training results

- ### Framework versions

- - Transformers 4.24.0
- - Pytorch 2.0.0+cu118
- - Datasets 2.8.0
- - Tokenizers 0.13.2

  ---
+ pipeline_tag: sentence-similarity
  tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+
  ---

+ # {MODEL_NAME}
+
+ This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+ <!--- Describe your model here -->
+
+ ## Usage (Sentence-Transformers)
+
+ Using this model is easy when you have [sentence-transformers](https://www.SBERT.net) installed:
+
+ ```
+ pip install -U sentence-transformers
+ ```
+
+ Then you can use the model like this:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ sentences = ["This is an example sentence", "Each sentence is converted"]
+
+ model = SentenceTransformer('{MODEL_NAME}')
+ embeddings = model.encode(sentences)
+ print(embeddings)
+ ```
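+
+ `model.encode` returns one 768-dimensional vector per input sentence. As a minimal follow-up sketch (using the stock `util.cos_sim` helper that ships with sentence-transformers; `{MODEL_NAME}` is the placeholder from above), these vectors can be compared directly:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer('{MODEL_NAME}')
+ embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
+
+ # Cosine similarity between the two sentence embeddings (a 1x1 tensor)
+ print(util.cos_sim(embeddings[0], embeddings[1]))
+ ```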
+
+ ## Usage (HuggingFace Transformers)
+
+ Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you apply the right pooling operation on top of the contextualized word embeddings.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+
+ # Mean pooling - take the attention mask into account for correct averaging
+ def mean_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+ # Sentences we want sentence embeddings for
+ sentences = ['This is an example sentence', 'Each sentence is converted']
+
+ # Load model from HuggingFace Hub
+ tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+ model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # Perform pooling. In this case, mean pooling.
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
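+
+ Continuing from the snippet above, a minimal sketch for turning the pooled vectors into similarity scores (assuming cosine similarity is what you want) is to L2-normalize them, after which dot products equal cosine similarities:
+
+ ```python
+ import torch.nn.functional as F
+
+ # L2-normalize so that dot products equal cosine similarity
+ sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
+ print(sentence_embeddings[0] @ sentence_embeddings[1])  # similarity of the two example sentences
+ ```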
+
+ ## Evaluation Results
+
+ <!--- Describe how your model was evaluated -->
+
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
+
+ ## Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
+ )
+ ```
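+
+ `pooling_mode_mean_tokens: True` means the Pooling module averages token embeddings, matching the `mean_pooling` helper above. As a quick sketch for verifying these settings on the loaded model (again assuming the `{MODEL_NAME}` placeholder):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer('{MODEL_NAME}')
+ print(model.max_seq_length)                      # 512, from the Transformer module
+ print(model.get_sentence_embedding_dimension())  # 768, from the Pooling module
+ ```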
+
+ ## Citing & Authors
+
+ <!--- Describe where people can find more information -->