---
language:
- en
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
---
<!-- Provide a quick summary of what the model is/does. -->

A finetuned Llama 3.1 Instruct model, trained with knowledge distillation for expertise in AMD technologies and Python coding.

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** David Silverstein
- **Language(s) (NLP):** English, Python
- **License:** Free to use under Llama 3.1 licensing terms without warranty
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B-Instruct

### Model Sources [optional]
<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The model can be used as a development assistant for AMD technologies and Python, including in on-premises environments.

## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model:

~~~
import torch  
from transformers import AutoTokenizer, AutoModelForCausalLM  

model_name = 'davidsi/Llama3_1-8B-Instruct-AMD-python'  
tokenizer = AutoTokenizer.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

query = "How do I check which ROCm version is installed?"  # example prompt; replace with your own

messages = [
    {"role": "system", "content": "You are a helpful assistant for AMD technologies and python."},
    {"role": "user", "content": query}
]

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    eos_token_id=terminators,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
~~~


## Training Details

Torchtune was used for full finetuning, for 5 epochs on a single AMD Instinct MI210 GPU.
The training set consisted of 1,658 question/answer pairs in Alpaca format.
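
For reference, an Alpaca-format record is a JSON object with `instruction`, `input`, and `output` fields. The actual dataset is not published; the record below is a purely hypothetical illustration of the format in this model's domain:

~~~python
import json

# Hypothetical example of one Alpaca-format training record
# (field values are illustrative, not taken from the real dataset).
record = {
    "instruction": "Explain what the rocm-smi command-line tool is used for.",
    "input": "",  # optional extra context; empty for a plain Q/A pair
    "output": (
        "rocm-smi is AMD's system management interface for ROCm. It reports "
        "GPU utilization, memory usage, temperature, and clock speeds."
    ),
}

print(json.dumps(record, indent=2))
~~~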

### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

#### Training Hyperparameters
- **Training regime:** [bf16 non-mixed precision] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
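
As a rough sanity check (my back-of-envelope arithmetic, not a figure from the training run), bf16 stores each parameter in 2 bytes, so the 8B-parameter model's weights alone occupy roughly 15 GiB, which fits comfortably in the MI210's 64 GB of HBM2e:

~~~python
# Back-of-envelope memory estimate for bf16 weights (illustrative only;
# gradients and optimizer states add substantially more during training).
params = 8_030_261_248          # approximate parameter count of Llama 3.1 8B
bytes_per_param = 2             # bfloat16 = 16 bits = 2 bytes
weights_gib = params * bytes_per_param / 2**30
print(f"~{weights_gib:.1f} GiB for weights in bf16")
~~~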

## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data
<!-- This should link to a Dataset Card if possible. -->

### Model Architecture and Objective

This model is a finetuned version of Llama 3.1, an auto-regressive language model that uses an optimized transformer architecture.