andreagurioli1995 committed
Commit 13ccddb · verified · 1 Parent(s): 4bc5dc5

Update README.md

Files changed (1): README.md +16 -20
README.md CHANGED
@@ -1,9 +1,9 @@
----
-library_name: transformers
-datasets:
-- bigcode/the-stack-v2
-license: bigcode-openrail-m
----
+---
+library_name: transformers
+datasets:
+- bigcode/the-stack-v2
+license: bigcode-openrail-m
+---
 
 # Model Card for Model ID
 
@@ -29,24 +29,17 @@ from transformers import AutoModel
 from transformers import AutoTokenizer
 
 #import the model
-model = AutoModel.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned", trust_remote_code=True)
+model = AutoModel.from_pretrained("andreagurioli1995/ModularStarEncoder", trust_remote_code=True)
 
 #import the tokenizer
-tokenizer = AutoTokenizer.from_pretrained("andreagurioli1995/ModularStarEncoder-finetuned")
+tokenizer = AutoTokenizer.from_pretrained("andreagurioli1995/ModularStarEncoder")
 
 
-language = "yourlanguagelowercased"
-
-#instruction in case of code embedding in a code language
-instruction_code = f"Represent this {language} code snippet for retrieval:"
-
-#instruction in case of code embedding in English
-instruction_natural_language = "Represent this code description for retrieving supporting snippets of code:"
 
 code_snippet = "your code to embed here"
 
-#You should follow this pattern to embed a snippet of code or natural language queries
-sentence = f"{tokenizer.sep_token}{instruction_code}{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
+#You should follow this pattern to embed a snippet of code
+sentence = f"{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
 
 #Tokenizing your sentence
 tokenized_sentence = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=2048)
@@ -55,10 +48,13 @@ tokenized_sentence = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=2048)
 embedded_sentence = model(**tokenized_sentence)
 ```
 
-You will get as an output three elements:
+You will get six elements as output:
 
-- projected_pooled_normalized: a list of the projected, pooled, and normalized embeddings from the five exit points;
-- raw_hidden_states: raw representation from all the hidden states of the model, without pooling, normalization, or projection
+- last_hidden_state: the representation from the last hidden state of the model;
+- hidden_states: raw representations from all the hidden states of the model, without pooling, normalization, or projection;
+- loss: the loss value if a ground truth is given (None at inference);
+- prediction_logits: prediction scores from the masked-language-modeling head;
+- seq_relationship_scores: prediction scores for the in-context loss (concatenate multiple samples with the separator token to get a meaningful score);
 - attentions: attention scores from the encoder
 
 ### Model Description
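
As a usage note on the updated snippet: the sketch below shows one way the returned fields might be turned into a single embedding vector, and how the seq_relationship score might be probed. It is a minimal sketch under assumptions, not behavior documented by the model card: the pooling position (the final CLS slot in this input pattern), attribute-style access to the output fields, and the two-sample pair format are illustrative guesses; only the checkpoint name and input pattern come from the diff above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint and input pattern as in the updated README
model = AutoModel.from_pretrained("andreagurioli1995/ModularStarEncoder", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("andreagurioli1995/ModularStarEncoder")

code_snippet = "def add(a, b):\n    return a + b"  # example input
sentence = f"{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
tokenized_sentence = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=2048)

with torch.no_grad():  # pure inference, so the returned `loss` stays None
    outputs = model(**tokenized_sentence)

# Assumption: pool by taking the hidden state at the last position, where the
# CLS token sits in this input pattern; verify the intended pooling before use.
embedding = outputs.last_hidden_state[:, -1, :]               # shape (1, hidden_size)
embedding = torch.nn.functional.normalize(embedding, dim=-1)  # unit norm, cosine-ready

# Assumption: seq_relationship_scores becomes meaningful when two samples are
# concatenated with the separator token, per the note in the README.
pair = f"{tokenizer.sep_token}{code_snippet}{tokenizer.sep_token}{code_snippet}{tokenizer.cls_token}"
pair_inputs = tokenizer(pair, return_tensors="pt", truncation=True, max_length=2048)
with torch.no_grad():
    pair_scores = model(**pair_inputs).seq_relationship_scores
```

If the custom forward returns a plain dict rather than a ModelOutput, index the fields instead (e.g. outputs["last_hidden_state"]).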