T-Llama / README.md
metadata
license: apache-2.0
language:
  - vi
  - en


Model Details

  • Developed by: Tuan Pham (FPTU HCM Student)
  • Model type: Llama2-7B Decoder-only
  • Finetuned from model:
    • meta-llama/Llama-2-7b
    • bkai-foundation-models/vietnamese-llama2-7b-120GB
    • yeen214/llama2_7b_merge_orcafamily
  • Bilingual support: English and Vietnamese

Model Description

This model is a proof of effort showing that one person can fine-tune their own model to reach state-of-the-art (SOTA) performance.

Model Sources

Uses

Prompt template

[SYSTEM_PROMPT]

 ####### Instruction:
[INPUT]

 %%%%%%% Response:
[RESPONSE]
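
As a minimal sketch of how the template above might be filled in, the helper below joins the three fields with the markers shown. The function name `build_prompt` is hypothetical and not part of the model's own code, and the whitespace around the markers (including the leading space) is copied from the template as displayed, so it may need adjusting.

```python
def build_prompt(system_prompt: str, instruction: str, response: str = "") -> str:
    """Fill the prompt template from this card (hypothetical helper).

    Leave `response` empty at inference time so the model continues
    the text after the response marker.
    """
    return (
        f"{system_prompt}\n\n"
        " ####### Instruction:\n"  # marker copied from the card, leading space included
        f"{instruction}\n\n"
        " %%%%%%% Response:\n"
        f"{response}"
    )


if __name__ == "__main__":
    # Example values are illustrative only.
    print(build_prompt("You are a helpful assistant.", "Translate 'xin chào' into English."))
```

The returned string can be passed as the input text to the generation pipeline in place of a bare prompt.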

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from torch.cuda.amp import autocast
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_name = "1TuanPham/T-Llama"
# Load the weights in bfloat16 and keep the KV cache enabled for generation.
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.bfloat16,
                                             use_cache=True,
                                             )
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# Print decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_special_tokens=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, streamer=streamer)

with autocast():
  # Pad with the tokenizer's own EOS id rather than a hard-coded value.
  output_default = pipe("Phạm Nhật Vượng là ", pad_token_id=tokenizer.eos_token_id, max_new_tokens=128)

Training Details

Hardware Type:

  • GPU: NVIDIA Tesla P100 16GB
  • SYSTEM RAM: 29GB

Hours used: approximately 47.5

Training Data

  • BactrianX
  • OpenOrca_translated
  • WizardLM_70k_translated
  • TigerLabMathInstruct_translated_vi
  • GradeSchoolMathInstruct_translated
  • vilm_lima-vi
  • MTEngVietnamese
  • databricks_dolly15k_translated
  • AlpacaCleaned_translated
  • databricks_dolly15k
  • OpenOrca
  • GradeSchoolMathInstruct
  • AlpacaCleaned
  • WebglmQA

Training Procedure

  • Learning rate: 2e-5 with a cosine schedule

  • Optimizer: PagedLion8bit

  • QLoRA: rank 64, 4-bit quantization

    • 250k examples (70% Vietnamese, 30% English) for 3.37 epochs
    • 350k examples (60% Vietnamese, 40% English) for 1.4 epochs
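
To make the figures above concrete, here is a rough sketch of what a cosine learning-rate schedule and the two data mixes imply numerically. It is not the author's training code: the function names are hypothetical, and the schedule assumes no warmup and a decay to zero, neither of which is stated in this card.

```python
import math

BASE_LR = 2e-5  # peak learning rate stated in this card


def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine decay from base_lr to 0 (no warmup and a floor of 0 are assumptions)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# (dataset size, Vietnamese share, epochs) for the two runs listed in this card
RUNS = [(250_000, 0.70, 3.37), (350_000, 0.60, 1.4)]


def examples_seen(size: int, vi_share: float, epochs: float) -> dict:
    """Approximate number of example passes, split by language share."""
    total = size * epochs
    return {"total": total, "vi": total * vi_share, "en": total * (1.0 - vi_share)}


if __name__ == "__main__":
    for size, vi_share, epochs in RUNS:
        print(size, epochs, examples_seen(size, vi_share, epochs))
    # Learning rate at the start, midpoint, and end of a hypothetical 1000-step run
    print([cosine_lr(s, 1000) for s in (0, 500, 1000)])
```

Under these assumptions the learning rate starts at 2e-5, falls to about half that value at the midpoint, and reaches zero at the final step.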

Training loss

[Image: training loss curve]

Evaluation

Results

[More Information Needed]

Technical Specifications

Model Architecture and Objective

[More Information Needed]

Citation

Model Card Authors

Model Card Contact

[More Information Needed]