Uploaded model

  • Developed by: tykiww
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.


Setting up and testing your own Endpoint Handler

Sources:

Setup Environment

Install the packages needed to set up and test the endpoint handler.

# install git-lfs to interact with the repository
sudo apt-get update
sudo apt-get install git-lfs
# install transformers (not needed for inference since it is installed by default in the container)
pip install "transformers[sklearn,sentencepiece,audio,vision]"

Clone the model repository, including its weights.

git lfs install
git clone https://huggingface.co/tykiww/llama3-8b-quantized

Log in to Hugging Face.

# set up the CLI with your access token
huggingface-cli login
git config --global credential.helper store

Confirm that you are logged in if you are unsure.

huggingface-cli whoami

Navigate to the cloned repo and create a handler.py file.

cd llama3-8b-quantized && touch handler.py
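
What goes into handler.py is not spelled out here, so the following is only a minimal sketch. It assumes the usual Inference Endpoints convention of a class named EndpointHandler with an __init__(path) and a __call__(data) method, and a request payload with an "inputs" string plus optional "parameters". The loader argument, the max_seq_length of 2048, and the default of 64 new tokens are illustrative choices and do not come from this repository.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def load_with_unsloth(path: str) -> Tuple[Any, Any]:
    """Load the 4-bit model and tokenizer with Unsloth (requires a CUDA GPU)."""
    # Imported lazily so this module can be imported where Unsloth is absent.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=path,
        max_seq_length=2048,  # assumed value, adjust to your use case
        dtype=None,           # let Unsloth pick a dtype for the GPU
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)  # switch to inference mode
    return model, tokenizer


class EndpointHandler:
    def __init__(self, path: str = "", loader: Optional[Callable] = None):
        # `loader` is a hypothetical hook so the handler can be tested
        # with stand-in objects; the endpoint itself only passes `path`.
        self.model, self.tokenizer = (loader or load_with_unsloth)(path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        prompt = data["inputs"]
        params = dict(data.get("parameters") or {})
        params.setdefault("max_new_tokens", 64)  # assumed default

        # Tokenize, move the tensors to the GPU, and generate.
        batch = self.tokenizer(prompt, return_tensors="pt").to("cuda")
        output_ids = self.model.generate(**batch, **params)

        # The decoded text contains the prompt followed by the completion.
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```

To try it locally on a GPU machine, instantiate EndpointHandler(path=".") from inside the cloned repo and call it with a payload such as {"inputs": "Hello", "parameters": {"max_new_tokens": 32}}.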

Create a requirements.txt file with the following items:

huggingface_hub
unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git
xformers
trl<0.9.0
peft==0.11.1
bitsandbytes
transformers==4.41.2  # this exact version is required
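
Because two of these entries are exact pins, it can help to confirm what is actually installed before testing the handler. The snippet below is a small sketch using only the standard library; the PINNED dictionary and the find_mismatches helper are illustrative names, and only the two exact pins from the list above are checked.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Exact pins copied from the requirements.txt listing.
PINNED: Dict[str, str] = {"peft": "0.11.1", "transformers": "4.41.2"}


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, str]:
    """Return {package: found_version} for every package that misses its pin."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```

An empty result from find_mismatches(PINNED) means both pinned packages match.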

A GPU compatible with Unsloth is required.
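
One way to check the GPU ahead of time is to read its CUDA compute capability through PyTorch. The sketch below assumes a minimum capability of 7.0, which is the figure the Unsloth README has stated; treat that threshold as an assumption and verify it against the Unsloth version you install. The function names are illustrative.

```python
from typing import Optional, Tuple

# Assumption: minimum CUDA compute capability taken from the Unsloth README.
MIN_CAPABILITY: Tuple[int, int] = (7, 0)


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of GPU 0, or None if no usable GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def capability_ok(capability: Optional[Tuple[int, int]]) -> bool:
    """True when a GPU was found and it meets the assumed minimum."""
    return capability is not None and capability >= MIN_CAPABILITY


if __name__ == "__main__":
    cap = detect_capability()
    print("GPU capability:", cap, "| meets assumed minimum:", capability_ok(cap))
```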
