	---
	language:
	- en
	library_name: transformers
	license: llama3.1
	pipeline_tag: text-generation
	---
	<!-- Provide a quick summary of what the model is/does. -->

	A Llama 3.1 8B Instruct model finetuned with knowledge distillation
	for expertise in AMD technologies and Python coding.

	### Model Description

	<!-- Provide a longer summary of what this model is. -->

	This is the model card of a 🤗 transformers model that has been
	pushed to the Hub.

	- Developed by: David Silverstein
	- Language(s) (NLP): English, Python
	- License: Free to use under the Llama 3.1 license terms, without warranty
	- Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

	### Model Sources [optional]
	<!-- Provide the basic links for the model. -->

	- Repository: [More Information Needed]
	- Demo [optional]: [More Information Needed]

	## Uses
	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
	The model can be used as a development assistant for work with AMD technologies
	and Python in on-premises environments.

	## Bias, Risks, and Limitations
	<!-- This section is meant to convey both technical and sociotechnical limitations. -->

	[More Information Needed]

	### Recommendations

	<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

	Users (both direct and downstream) should be made aware of the risks, biases and
	limitations of the model. More information needed for further recommendations.

	## How to Get Started with the Model

	Use the code below to get started with the model:

	~~~
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = 'davidsi/Llama3_1-8B-Instruct-AMD-python'
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

	# Example question; replace it with your own
	query = "How do I check that PyTorch can see my AMD GPU?"

	messages = [
	    {"role": "system", "content": "You are a helpful assistant for AMD technologies and python."},
	    {"role": "user", "content": query},
	]

	# Stop at either the end-of-sequence token or the end-of-turn token
	terminators = [
	    tokenizer.eos_token_id,
	    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
	]

	# Render the chat into the model's prompt format and tokenize it
	input_ids = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	).to(device)

	outputs = model.generate(
	    input_ids,
	    max_new_tokens=8192,
	    eos_token_id=terminators,
	    pad_token_id=tokenizer.eos_token_id,
	    do_sample=True,
	    temperature=0.6,
	    top_p=0.9,
	)
	# Keep only the newly generated tokens, dropping the prompt
	response = outputs[0][input_ids.shape[-1]:]
	print(tokenizer.decode(response, skip_special_tokens=True))
	~~~
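
The `temperature=0.6` and `top_p=0.9` settings above control how each next token is drawn. The following is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering. It is meant as an illustration of the idea, not the transformers implementation; the helper names and the toy logits are hypothetical.

```python
import math
import random


def top_p_filter(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.6, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(logits, temperature, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for a 4-token vocabulary.
toy_logits = [2.0, 1.0, 0.5, -1.0]
nucleus = top_p_filter(toy_logits)
token = sample(toy_logits)
```

With these toy logits, only the two highest-scoring tokens survive the filter, so the two unlikely tokens can never be sampled. Lowering `temperature` or `top_p` makes the output more deterministic.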


	## Training Details

	The model was fully finetuned with torchtune for 5 epochs on a single AMD Instinct MI210 GPU.
	The training set consisted of 1,658 question/answer pairs in Alpaca format.
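
For readers unfamiliar with the Alpaca format, the sketch below shows the usual shape of one record (`instruction`, `input`, `output`) and the prompt template published with the original Stanford Alpaca dataset. The record contents are hypothetical, since the training data is not published here, and it is an assumption that the question was stored in `instruction` with an empty `input`. The finetuning run may also have applied a different prompt template.

```python
# Prompt templates from the original Stanford Alpaca dataset.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_alpaca(record):
    """Build the prompt text for one Alpaca-style record."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(**record)
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Hypothetical question/answer pair, for illustration only.
record = {
    "instruction": "How do I list the GPUs that PyTorch can see?",
    "input": "",
    "output": "(answer text)",
}

prompt = format_alpaca(record)
# During supervised finetuning the target is the "output" field,
# which follows the prompt.
training_text = prompt + record["output"]
```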

	### Training Data
	<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

	[More Information Needed]

	#### Training Hyperparameters
	- Training regime: bf16 non-mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
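
As a rough back-of-the-envelope check of what bf16 non-mixed precision means for memory, the sketch below estimates the size of the weights and gradients. It assumes about 8.03 billion parameters for the 8B model and that "non-mixed" means weights and gradients are both held in bf16, with no fp32 master copy. Optimizer state and activations are left out because this card does not name the optimizer or the batch size.

```python
PARAMS = 8.03e9       # approximate parameter count (assumption)
BYTES_BF16 = 2        # bf16 stores each value in 2 bytes
BYTES_FP32 = 4        # fp32 stores each value in 4 bytes


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


weights_bf16 = gib(PARAMS * BYTES_BF16)   # roughly 15 GiB
grads_bf16 = weights_bf16                 # one gradient per weight, same dtype
weights_fp32 = gib(PARAMS * BYTES_FP32)   # roughly 30 GiB, for comparison

# Total training examples processed: 1658 pairs for 5 epochs.
examples_seen = 1658 * 5

print(f"bf16 weights + gradients: {weights_bf16 + grads_bf16:.1f} GiB")
print(f"fp32 weights alone:       {weights_fp32:.1f} GiB")
print(f"examples seen:            {examples_seen}")
```

Under these assumptions, keeping everything in bf16 halves the footprint of weights and gradients compared with fp32, which helps a full finetune of an 8B model fit on a single accelerator.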

	## Evaluation
	<!-- This section describes the evaluation protocols and provides the results. -->

	### Testing Data, Factors & Metrics

	#### Testing Data
	<!-- This should link to a Dataset Card if possible. -->

	### Model Architecture and Objective

	This model is a finetuned version of Llama 3.1, which is an auto-regressive language
	model that uses an optimized transformer architecture.
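
"Auto-regressive" means the model produces one token at a time, with each new token conditioned on everything generated so far. A toy sketch of that loop follows. A hypothetical lookup table stands in for the transformer, and `generate`, `TABLE`, and `toy_next_token` are illustrative names, not library APIs.

```python
def generate(next_token, prompt, terminators, max_new_tokens):
    """Append tokens one at a time until a terminator or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)   # the "model" sees the full sequence so far
        tokens.append(token)
        if token in terminators:     # stop on an end-of-sequence style token
            break
    return tokens[len(prompt):]      # return only the newly generated tokens


# Hypothetical stand-in for a language model: maps the last token to the next one.
TABLE = {1: 2, 2: 3, 3: 0}


def toy_next_token(tokens):
    return TABLE.get(tokens[-1], 0)


new_tokens = generate(toy_next_token, prompt=[1], terminators={0}, max_new_tokens=10)
print(new_tokens)  # [2, 3, 0]
```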