sagard21 committed on
Commit efd03f1 · 1 parent: 633ef02

Update README.md


Update model description

Files changed (1)
  1. README.md +45 -10
README.md CHANGED
@@ -3,13 +3,15 @@ tags:
  - autotrain
  - summarization
  language:
- - unk
  widget:
- - text: "I love AutoTrain 🤗"
  datasets:
  - sagard21/autotrain-data-code-explainer
  co2_eq_emissions:
  emissions: 5.393079045128973
  ---

  # Model Trained Using AutoTrain
@@ -18,6 +20,47 @@ co2_eq_emissions:
  - Model ID: 2745581349
  - CO2 Emissions (in grams): 5.3931

  ## Validation Metrics

  - Loss: 2.156
@@ -26,11 +69,3 @@ co2_eq_emissions:
  - RougeL: 25.445
  - RougeLsum: 28.084
  - Gen Len: 19.000
-
- ## Usage
-
- You can use cURL to access this model:
-
- ```
- $ curl -X POST -H "Authorization: Bearer YOUR_HUGGINGFACE_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/sagard21/autotrain-code-explainer-2745581349
- ```
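The cURL request removed in this commit can be reproduced with Python's standard library. The sketch below only builds the request so its method, headers, and payload can be inspected offline; `build_request` and the placeholder key are illustrative names, not part of any Hugging Face client. The endpoint is copied verbatim from the removed line and is unverified: Hugging Face's hosted Inference API documentation uses URLs of the form `https://api-inference.huggingface.co/models/<repo_id>`, so check the path before sending anything.

```python
import json
import urllib.request

# Endpoint copied verbatim from the removed cURL example; the path is unverified.
API_URL = "https://api-inference.huggingface.co/sagard21/autotrain-code-explainer-2745581349"
# Placeholder token, as in the cURL example; substitute a real key before sending.
API_KEY = "YOUR_HUGGINGFACE_API_KEY"


def build_request(text: str, api_key: str = API_KEY) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST request the cURL call makes."""
    body = json.dumps({"inputs": text}).encode("utf-8")  # same JSON payload as -d
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # same as the first -H flag
            "Content-Type": "application/json",    # same as the second -H flag
        },
        method="POST",  # same as -X POST
    )


req = build_request("I love AutoTrain")

# Sending it requires network access and a valid key:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Keeping request construction separate from `urlopen` makes the payload and headers easy to check without network access.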
 
  - autotrain
  - summarization
  language:
+ - en
  widget:
+ - text: I love AutoTrain 🤗
  datasets:
  - sagard21/autotrain-data-code-explainer
  co2_eq_emissions:
  emissions: 5.393079045128973
+ license: mit
+ pipeline_tag: summarization
  ---

  # Model Trained Using AutoTrain
 
  - Model ID: 2745581349
  - CO2 Emissions (in grams): 5.3931

+ # Model Description
+
+ This model is an attempt to simplify code understanding by generating line-by-line explanations of source code. It was fine-tuned from the Salesforce/codet5-large model and is currently trained on a small subset of Python snippets.
+
+ # Model Usage
+
+ ```py
+ from transformers import AutoTokenizer, T5ForConditionalGeneration, SummarizationPipeline
+ import torch
+
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+ pipeline = SummarizationPipeline(
+     model=T5ForConditionalGeneration.from_pretrained("sagard21/python-code-explainer"),
+     tokenizer=AutoTokenizer.from_pretrained("sagard21/python-code-explainer", skip_special_tokens=True),
+     device=device
+ )
+
+ raw_code = """
+ def preprocess(text: str) -> str:
+     text = str(text)
+     text = text.replace("\n", " ")
+     tokenized_text = text.split(" ")
+     preprocessed_text = " ".join([token for token in tokenized_text if token])
+
+     return preprocessed_text
+ """
+ pipeline([raw_code])
+
+ ```
+
+ ### Expected JSON Output
+
+ ```
+ [
+ {
+ "summary_text": "Create a function preprocess that will take the text as an argument and return the preprocessed text.\n1. In this case, the text will be converted to a string.\n2. At first, we will replace all \"\\n\" with \" \" and then split the text by \" \".\n3. Then we will call the tokenize function on the text and tokenize the text using the split() method.\n4. Next step is to create a list of all the tokens in the string and join them together.\n5. Then the function will return the string preprocessed_text.\n"
+ }
+ ]
+ ```
+
  ## Validation Metrics

  - Loss: 2.156

  - RougeL: 25.445
  - RougeLsum: 28.084
  - Gen Len: 19.000
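The `raw_code` sample passed to the pipeline in the new Model Usage section is a small whitespace normalizer. Running it directly, with comments added and no model involved, is a quick way to compare the generated explanation with what the function actually does: it splits on a single space with `str.split(" ")` and filters out empty tokens, with no separate `tokenize` call. The test input below is illustrative, not taken from the README.

```python
def preprocess(text: str) -> str:
    """Collapse newlines and runs of spaces into single spaces.

    Logic copied from the README's `raw_code` sample; comments added.
    """
    text = str(text)
    # Newlines become spaces, so every token boundary is a plain space.
    text = text.replace("\n", " ")
    # Splitting on a single space yields "" wherever spaces repeat.
    tokenized_text = text.split(" ")
    # Drop the empty strings and rejoin with exactly one space.
    preprocessed_text = " ".join([token for token in tokenized_text if token])
    return preprocessed_text


print(preprocess("def  f():\n    return 1"))  # prints: def f(): return 1
```

Tabs and other whitespace are left untouched, since only `"\n"` and `" "` are handled.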