Feature Extraction
Transformers
Safetensors
ModularStarEncoder
custom_code
andreagurioli1995 committed · verified
Commit 4bc5dc5 · 1 parent: 354756f

Update README.md

Files changed (1):
  1. README.md +42 -5
README.md CHANGED
@@ -1,8 +1,9 @@
----
-library_name: transformers
-datasets:
-- bigcode/the-stack-v2
----
+---
+library_name: transformers
+datasets:
+- bigcode/the-stack-v2
+license: bigcode-openrail-m
+---
 
 # Model Card for Model ID
 
@@ -22,8 +23,44 @@ Input should take this format when tokenized:
 
 f"{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
 
+### How to use
+```python
+from transformers import AutoModel
+from transformers import AutoTokenizer
+
+# Import the model (custom code is required for this architecture)
+model = AutoModel.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned", trust_remote_code=True)
+
+# Import the tokenizer
+tokenizer = AutoTokenizer.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned")
+
+
+language = "yourlanguagelowercased"
+
+# Instruction for embedding a snippet in a given code language
+instruction_code = f"Represent this {language} code snippet for retrieval:"
+
+# Instruction for embedding a query in English
+instruction_natural_language = "Represent this code description for retrieving supporting snippets of code:"
+
+code_snippet = "your code to embed here"
+
+# Follow this pattern to embed a snippet of code or a natural-language query
+sentence = f"{tokenizer.sep_token}{instruction_code}{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
+
+# Tokenize the sentence
+tokenized_sentence = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=2048)
+
+# Embed the tokenized sentence
+embedded_sentence = model(**tokenized_sentence)
+```
+
+You will get three elements as output:
+
+- projected_pooled_normalized: a list of the projected, pooled, and normalized embeddings from the five exit points;
+- raw_hidden_states: the raw representations from all hidden states of the model, without pooling, normalization, or projection;
+- attentions: the attention scores from the encoder.
+
 ### Model Description
 
 <!-- Provide a longer summary of what this model is. -->
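The input-formatting pattern added in this commit can be sketched with plain strings, without downloading the model. Note that `"<sep>"` and `"<cls>"` below are placeholder values; the real strings come from the loaded tokenizer's `sep_token` and `cls_token` attributes:

```python
# Sketch of the sep + instruction + sep + snippet + cls pattern from the README.
# "<sep>" and "<cls>" are placeholders for tokenizer.sep_token / tokenizer.cls_token.
sep_token = "<sep>"
cls_token = "<cls>"

language = "python"
instruction_code = f"Represent this {language} code snippet for retrieval:"
code_snippet = "def add(a, b):\n    return a + b"

# Same structure the model card prescribes for queries and snippets
sentence = f"{sep_token}{instruction_code}{sep_token}{code_snippet}{cls_token}"
print(sentence)
```

The same pattern applies to natural-language queries by substituting `instruction_natural_language` for `instruction_code`.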