Introduction

GemSUra-edu is a large language model based on GemSUra 2B, a pre-trained model developed by the URA research group at Ho Chi Minh City University of Technology (HCMUT). It is fine-tuned on a dataset of HCMUT FAQs.

Inference (with Unsloth for higher speed)

from unsloth import FastLanguageModel
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IAmSkyDra/GemSUra-edu",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True
)

FastLanguageModel.for_inference(model)

query_template = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"

while True:
    query = input("Query: ")
    if query.lower() == "exit":
        break

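    query = query_template.format(query=query)
    # Move the input tensors to the model's device before generating
    inputs = tokenizer(query, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=4096, use_cache=True)
    # Decode only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    print(answer)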
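
The prompt template and the reply extraction can be isolated into small helpers that are testable without loading the model. The sketch below is a minimal illustration: `build_prompt` and `extract_reply` are hypothetical names, not part of Unsloth or Transformers, and the extraction assumes the decoded text has had its special tokens stripped (as with `skip_special_tokens=True`).

```python
# Illustrative helpers (hypothetical names) for the turn-based prompt format.
QUERY_TEMPLATE = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"


def build_prompt(query: str) -> str:
    """Wrap a user query in the turn markers used by the template."""
    return QUERY_TEMPLATE.format(query=query.strip())


def extract_reply(decoded: str) -> str:
    """Return the text that follows the first model-turn marker.

    Assumes special tokens were stripped during decoding, so the text
    consists of the user turn, a line reading "model", and then the reply.
    """
    # maxsplit=1 keeps the reply intact even if it contains the marker itself.
    parts = decoded.split("model\n", 1)
    if len(parts) < 2:
        # No marker found: treat the whole string as the reply.
        return decoded.strip()
    return parts[1].strip()
```

Calling `build_prompt("What is HCMUT?")` yields a string ready to pass to the tokenizer, and `extract_reply` can be applied to decoded output that still includes the prompt.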

Inference (with Transformers)

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

pipeline_kwargs = {
    "temperature": 0.1,
    "max_new_tokens": 4096,
    "do_sample": True
}

if __name__ == "__main__":
    # Load model
    model = AutoModelForCausalLM.from_pretrained(
        "IAmSkyDra/GemSUra-edu",
        device_map="auto"
    )
    model.eval()

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        "IAmSkyDra/GemSUra-edu",
        trust_remote_code=True
    )

    pipeline = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        task='text-generation',
        **pipeline_kwargs
    )

    query_template = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"

    while True:
        query = input("Query: ")
        if query.lower() == "exit":
            break

        query = query_template.format(query=query)
        # return_full_text=False means the output holds only the model's reply,
        # so there is no prompt to split off
        answer = pipeline(query)[0]["generated_text"].strip()
        print(answer)
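
The `pipeline_kwargs` above enable sampling with `temperature=0.1`. A temperature this low concentrates almost all probability on the most likely token, so the output behaves close to greedy decoding. The sketch below illustrates the effect with a plain softmax over made-up logits. The logit values and the function name `softmax_with_temperature` are illustrative assumptions, not taken from the model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens (illustrative only).
toy_logits = [2.0, 1.0, 0.5]

probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_low = softmax_with_temperature(toy_logits, 0.1)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_low])
```

Raising the temperature toward 1.0 spreads probability across more tokens, which gives more varied but less predictable answers.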

Note

If you plan to quantize the model for deployment on local devices, use at least 8-bit precision.
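
A rough estimate of weight memory at different bit widths can help when choosing a quantization level. The sketch below assumes the 2.51B parameter count listed for this model and counts weights only. Activations, the KV cache, and any quantization metadata are ignored, so real usage will be higher. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 2.51e9  # parameter count listed for this model

# Compare the stored BF16 weights against 8-bit and 4-bit quantization.
for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```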

Model size: 2.51B params (Safetensors, tensor type BF16)