---
license: wtfpl
datasets:
- cakiki/rosetta-code
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- code
- programming-language
- code-classification
base_model: huggingface/CodeBERTa-small-v1
---
This model is a fine-tuned version of *huggingface/CodeBERTa-small-v1* on the *cakiki/rosetta-code* dataset, covering the 26 programming languages listed below.
## Training Details
The model was trained for 25 epochs on Azure, on nearly 26,000 datapoints covering the 26 programming languages listed below, extracted from a dataset spanning 1,006 programming languages in total (a sketch of this filtering step follows the list).
### Programming Languages This Model Can Detect
<ol>
<li>ARM Assembly</li>
<li>AppleScript</li>
<li>C</li>
<li>C#</li>
<li>C++</li>
<li>COBOL</li>
<li>Erlang</li>
<li>Fortran</li>
<li>Go</li>
<li>Java</li>
<li>JavaScript</li>
<li>Kotlin</li>
<li>Lua</li>
<li>Mathematica/Wolfram Language</li>
<li>PHP</li>
<li>Pascal</li>
<li>Perl</li>
<li>PowerShell</li>
<li>Python</li>
<li>R</li>
<li>Ruby</li>
<li>Rust</li>
<li>Scala</li>
<li>Swift</li>
<li>Visual Basic .NET</li>
<li>jq</li>
</ol>
<br>
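The exact preprocessing pipeline is not published; the following is a minimal sketch of how such a training subset could be selected from *cakiki/rosetta-code* with the 🤗 Datasets API. The `language_name` column name is an assumption based on the dataset card.
```python
from datasets import load_dataset

# Hypothetical preprocessing sketch -- not the exact pipeline used for training.
TARGET_LANGUAGES = {
    "ARM Assembly", "AppleScript", "C", "C#", "C++", "COBOL", "Erlang",
    "Fortran", "Go", "Java", "JavaScript", "Kotlin", "Lua",
    "Mathematica/Wolfram Language", "PHP", "Pascal", "Perl", "PowerShell",
    "Python", "R", "Ruby", "Rust", "Scala", "Swift", "Visual Basic .NET", "jq",
}

ds = load_dataset("cakiki/rosetta-code", split="train")
# Keep only the 26 target languages (column name assumed from the dataset card).
ds = ds.filter(lambda row: row["language_name"] in TARGET_LANGUAGES)
print(len(ds))  # roughly 26,000 datapoints, per the training description above
```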
## Training Results for 25 Epochs
<ul>
<li>Training computer configuration: <ul>
<li>GPU: 1x NVIDIA Tesla T4</li>
<li>VRAM: 16 GB</li>
<li>RAM: 112 GB</li>
<li>CPU cores: 6</li>
</ul></li>
<li>Training time: 7 hours for 25 epochs</li>
<li>Training hyper-parameters: shown in the screenshots below, followed by a hypothetical fine-tuning sketch</li>
</ul>
![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/YIYl1XZk0zpi3DCvn3D80.png)
![training detail.png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/Oi9TuJ8nEjtt6Z_W56myn.png)
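For orientation, here is a minimal fine-tuning sketch with the 🤗 `Trainer`. Apart from the 25 epochs, all values are illustrative placeholders; the actual hyper-parameters are the ones in the screenshots above.
```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("huggingface/CodeBERTa-small-v1")
model = AutoModelForSequenceClassification.from_pretrained(
    "huggingface/CodeBERTa-small-v1", num_labels=26  # one label per language
)

args = TrainingArguments(
    output_dir="pli-checkpoints",
    num_train_epochs=25,             # from the model card
    per_device_train_batch_size=16,  # illustrative, not the actual value
    learning_rate=5e-5,              # illustrative, not the actual value
)

# With tokenized train/eval splits prepared from the filtered dataset above:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=tokenized_train, eval_dataset=tokenized_eval)
# trainer.train()
```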
## Inference Code
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "philomath-1209/programming-language-identification"
loaded_tokenizer = AutoTokenizer.from_pretrained(model_name)
loaded_model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Run on GPU when available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loaded_model.to(device)

text = """
PROGRAM Triangle
IMPLICIT NONE
REAL :: a, b, c, Area
PRINT *, 'Welcome, please enter the&
&lengths of the 3 sides.'
READ *, a, b, c
PRINT *, 'Triangle''s area: ', Area(a,b,c)
END PROGRAM Triangle
FUNCTION Area(x,y,z)
IMPLICIT NONE
REAL :: Area ! function type
REAL, INTENT( IN ) :: x, y, z
REAL :: theta, height
theta = ACOS((x**2+y**2-z**2)/(2.0*x*y))
height = x*SIN(theta); Area = 0.5*y*height
END FUNCTION Area
"""

# Truncate inputs longer than the model's context window.
inputs = loaded_tokenizer(text, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    logits = loaded_model(**inputs).logits
predicted_class_id = logits.argmax().item()
print(loaded_model.config.id2label[predicted_class_id])  # Fortran
```
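If you prefer a one-liner, the same checkpoint also works with the 🤗 `pipeline` helper, which handles tokenization and label mapping for you (the printed score below is illustrative):
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="philomath-1209/programming-language-identification",
)
# truncation=True guards against inputs longer than the model's context window.
print(classifier("print('hello world')", truncation=True))
# e.g. [{'label': 'Python', 'score': 0.99}]
```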
### Optimum with ONNX inference
Loading the model this way requires the 🤗 Optimum library to be installed:
```shell
pip install transformers optimum[onnxruntime]
```
```python
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_path = "philomath-1209/programming-language-identification"

# The ONNX export is stored in the "onnx" subfolder of the model repository.
tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder="onnx")
model = ORTModelForSequenceClassification.from_pretrained(model_path, export=False, subfolder="onnx")

text = """
PROGRAM Triangle
IMPLICIT NONE
REAL :: a, b, c, Area
PRINT *, 'Welcome, please enter the&
&lengths of the 3 sides.'
READ *, a, b, c
PRINT *, 'Triangle''s area: ', Area(a,b,c)
END PROGRAM Triangle
FUNCTION Area(x,y,z)
IMPLICIT NONE
REAL :: Area ! function type
REAL, INTENT( IN ) :: x, y, z
REAL :: theta, height
theta = ACOS((x**2+y**2-z**2)/(2.0*x*y))
height = x*SIN(theta); Area = 0.5*y*height
END FUNCTION Area
"""

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])  # Fortran
``` |
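The raw logits from either snippet above can also be turned into per-language confidence scores. A minimal sketch, reusing `logits` and `model` from the preceding block (the top-3 presentation is a suggestion, not part of the model card):
```python
import torch

# Softmax converts logits into a probability over the 26 languages.
probs = torch.softmax(logits, dim=-1)[0]
top = torch.topk(probs, k=3)
for score, idx in zip(top.values, top.indices):
    print(f"{model.config.id2label[idx.item()]}: {score.item():.3f}")
```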