CRIA v1.3

💡 Article | 💻 GitHub | 📔 Colab 1,2

What is CRIA?

krē-ə, plural crias: a baby llama, alpaca, vicuña, or guanaco.

Or, as ChatGPT suggests, "Crafting a Rapid prototype of an Intelligent llm App using open source resources".

The initial objective of the CRIA project is to develop a comprehensive end-to-end chatbot system, starting from the instruction-tuning of a large language model and extending to its deployment on the web using frameworks such as Next.js.

Specifically, we have fine-tuned the llama-2-7b-chat-hf model with QLoRA (4-bit precision) using the mlabonne/CodeLlama-2-20k dataset. This fine-tuned model serves as the backbone for the CRIA chat platform.

📦 Model Release

CRIA v1.3 comes with several variants.

This model was converted from the q4_0 GGML version of CRIA v1.3 using llama.cpp's convert-llama-ggml-to-gguf.py script.

🔧 Training

The model was trained in a Google Colab notebook with a T4 GPU and the high-RAM runtime.
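
As a rough sanity check on why 4-bit loading suits a T4, the weight memory can be estimated from the parameter count. The sketch below assumes the 6.74B parameter count reported on the model page and a nominal 16 GB of T4 memory. It counts weights only and ignores quantization constants, activations, LoRA adapters, and optimizer state, so real usage will be higher.

```python
# Rough weight-only memory estimate; all figures are approximations.
PARAMS = 6.74e9      # assumption: parameter count reported on the model page
T4_MEMORY_GB = 16    # assumption: nominal memory of a T4 GPU


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights, in GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 4):
    gb = weight_memory_gb(PARAMS, bits)
    print(f"{bits}-bit weights: ~{gb:.2f} GB of {T4_MEMORY_GB} GB")
```

Under these assumptions, float16 weights alone take about 13.5 GB, while 4-bit weights take about 3.4 GB, which leaves room for the rest of the training state.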

Training procedure

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float16
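
For readers who want to reproduce this setup, the settings above map directly onto the keyword arguments of `BitsAndBytesConfig` in transformers. The following is a minimal sketch, not the project's actual training script. The names `BNB_SETTINGS` and `build_bnb_config` are hypothetical, and the import is deferred so the settings can be inspected without transformers installed.

```python
# Quantization settings copied from the list above.
BNB_SETTINGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "float16",  # a dtype name; torch.float16 also works
}


def build_bnb_config(settings=BNB_SETTINGS):
    """Build a BitsAndBytesConfig (hypothetical helper; needs transformers and torch)."""
    from transformers import BitsAndBytesConfig  # deferred import

    return BitsAndBytesConfig(**settings)
```

The resulting object would typically be passed as `quantization_config=build_bnb_config()` to `AutoModelForCausalLM.from_pretrained`, which also requires the bitsandbytes package and a CUDA GPU.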

Framework versions

  • PEFT 0.4.0

💻 Usage

# pip install transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "davzoku/cria-llama2-7b-v1.3"
prompt = "What is a cria?"

# Load the tokenizer and build a half-precision text-generation pipeline.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",  # let accelerate place the weights on available devices
)

# Wrap the prompt in Llama 2 instruction tags and sample one completion.
sequences = pipeline(
    f'<s>[INST] {prompt} [/INST]',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,  # total length in tokens, prompt included
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
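
The usage snippet above wraps the prompt in `[INST] ... [/INST]` tags by hand. A small helper can make that formatting reusable. This is a sketch only: `build_prompt` is a hypothetical name, and the optional `<<SYS>>` block follows the general Llama 2 chat convention, which this card does not confirm CRIA was trained with.

```python
from typing import Optional


def build_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Wrap a message in Llama 2 style instruction tags (hypothetical helper)."""
    if system_prompt is None:
        # Same format as the usage snippet above.
        return f"<s>[INST] {user_message} [/INST]"
    # Assumption: the system prompt uses the Llama 2 chat <<SYS>> convention.
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


print(build_prompt("What is a cria?"))
```

The returned string can be passed to the pipeline in place of the inline f-string.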

References

We'd like to thank:

  • mlabonne for his article and resources on implementing instruction tuning
  • TheBloke for his LLM quantization script

Format: GGUF
Model size: 6.74B params
Architecture: llama