A Llama 3.1 8B Instruct model finetuned with knowledge distillation
for expertise in AMD technologies and Python coding.

Model Description

This is the model card of a 🤗 transformers model that has been
pushed to the Hub.

  • Developed by: David Silverstein
  • Language(s) (NLP): English, Python
  • License: Free to use under the Llama 3.1 license terms, without warranty
  • Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

Model Sources

  • Repository: [More Information Needed]
  • Demo: [More Information Needed]

Uses

The model can serve as a development assistant for AMD technologies and Python,
including in on-premises environments.

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and
limitations of the model. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'davidsi/Llama3_1-8B-Instruct-AMD-python'
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model in bf16 on the GPU if one is available (CUDA or ROCm), else on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Replace with your own question.
query = "How do I verify that PyTorch can see my AMD GPU?"

messages = [
    {"role": "system", "content": "You are a helpful assistant for AMD technologies and python."},
    {"role": "user", "content": query}
]

# Stop generation at either the tokenizer's EOS token or Llama 3.1's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Render the conversation with the Llama 3.1 chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    eos_token_id=terminators,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
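
The sampling settings above (temperature 0.6, top-p 0.9) match those in the generation
example of Meta's Llama 3.1 Instruct model card; for more deterministic, code-oriented
answers, lower the temperature or pass do_sample=False.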

Training Details

Full finetuning was performed with torchtune, for 5 epochs on a single AMD Instinct
MI210 GPU. The training set consisted of 1,658 question/answer pairs in Alpaca format.
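
For reference, an Alpaca-format record pairs an instruction (and an optional input)
with a target output. The record below is purely illustrative and is not drawn from
the actual training set:

# A hypothetical Alpaca-format training record (illustrative only).
example = {
    "instruction": "Write Python code that checks whether PyTorch can use an AMD GPU.",
    "input": "",
    "output": "import torch\nprint(torch.cuda.is_available())"
}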

Training Data

[More Information Needed]

Training Hyperparameters

  • Training regime: bf16 non-mixed precision (see the launch sketch below)
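
A run like the one described above is typically launched with torchtune's
single-device full-finetune recipe. The commands below are a minimal sketch assuming
the stock Llama 3.1 8B config; the actual overrides used for this model have not
been published:

# Hypothetical launch sketch; exact settings for this run are unpublished.
tune download meta-llama/Meta-Llama-3.1-8B-Instruct
tune run full_finetune_single_device --config llama3_1/8B_full_single_device \
    epochs=5 dtype=bf16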

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Model Architecture and Objective

This model is a finetuned version of Llama 3.1, an autoregressive language model
that uses an optimized transformer architecture.
