Tom Aarsen committed
Commit 682e518 · 1 Parent(s): 9705163

Add Sentence Transformers support

1_Pooling/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false
+ }
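
The config above switches on only `pooling_mode_mean_tokens`, which in Sentence Transformers corresponds to averaging the token embeddings over non-padding positions. The following is a minimal sketch of that idea, assuming plain Python lists in place of tensors to stay dependency-free; the helper names `enabled_pooling_modes` and `mean_pool` are illustrative and not part of the library.

```python
import json

# Copy of the pooling config added in this commit.
POOLING_CONFIG = json.loads("""
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
""")


def enabled_pooling_modes(config):
    """Return the names of the pooling modes switched on in the config."""
    return [key for key, value in config.items()
            if key.startswith("pooling_mode_") and value is True]


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


modes = enabled_pooling_modes(POOLING_CONFIG)
print(modes)  # ['pooling_mode_mean_tokens']

# Toy input: three tokens of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

With the real model, each token vector would have 1024 entries, so the pooled sentence embedding would also have 1024 entries, matching `word_embedding_dimension`.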
README.md CHANGED
@@ -3,6 +3,8 @@ license: apache-2.0
  datasets:
  - bigcode/the-stack-dedup
  library_name: transformers
+ tags:
+ - sentence-transformers
  language:
  - code
  ---
@@ -24,6 +26,8 @@ This checkpoint is first trained on code data via masked language modeling (MLM)
  ### How to use
  This checkpoint consists of an encoder (356M model), which can be used to extract code embeddings of 1024 dimension. It can be easily loaded using the AutoModel functionality and employs the Starcoder tokenizer (https://arxiv.org/pdf/2305.06161.pdf).

+ ### Transformers
+
  ```
  from transformers import AutoModel, AutoTokenizer

@@ -39,6 +43,21 @@ print(f'Dimension of the embedding: {embedding[0].size()}')
  # Dimension of the embedding: torch.Size([13, 1024])
  ```

+ ### Sentence Transformers
+
+ ```
+ from sentence_transformers import SentenceTransformer
+
+ checkpoint = "codesage/codesage-base"
+ device = "cuda" # for GPU usage or "cpu" for CPU usage
+
+ model = SentenceTransformer(checkpoint, device=device, trust_remote_code=True)
+
+ embedding = model.encode("def print_hello_world():\tprint('Hello World!')")
+ print(f'Dimension of the embedding: {embedding.size}')
+ # Dimension of the embedding: 1024
+ ```
+
  ### BibTeX entry and citation info
  ```
  @inproceedings{
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.4.0.dev0",
+     "transformers": "4.37.0",
+     "pytorch": "2.1.0+cu121"
+   }
+ }
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
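
The `modules.json` file above lists the pipeline stages in order: a `Transformer` module stored at the repository root (empty `path`), followed by a `Pooling` module stored under `1_Pooling`. As a rough illustration of how such a file can be read, the sketch below parses it with the standard library and reports each stage. The function `describe_pipeline` is a hypothetical helper written for this example, not a Sentence Transformers API, and the repository name is taken from the README snippet in this commit.

```python
import json
from pathlib import PurePosixPath

# Copy of the modules.json added in this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "",
   "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling",
   "type": "sentence_transformers.models.Pooling"}
]
"""


def describe_pipeline(modules_json, repo_root="codesage/codesage-base"):
    """Return (idx, class name, location) for each module, ordered by idx."""
    modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])
    steps = []
    for module in modules:
        # Keep only the final component of the dotted type, e.g. "Pooling".
        class_name = module["type"].rsplit(".", 1)[-1]
        # An empty "path" leaves the location at the repository root.
        location = str(PurePosixPath(repo_root) / module["path"])
        steps.append((module["idx"], class_name, location))
    return steps


steps = describe_pipeline(MODULES_JSON)
for idx, class_name, location in steps:
    print(f"{idx}: {class_name} <- {location}")
# 0: Transformer <- codesage/codesage-base
# 1: Pooling <- codesage/codesage-base/1_Pooling
```

Read this way, the encoder produces per-token embeddings and the pooling stage configured in `1_Pooling/config.json` reduces them to a single vector.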