---
language:
- en
- vi
license: llama2
datasets:
- timdettmers/openassistant-guanaco
model_name: LacDa2 7B
inference: true
model_creator: Will Nguyen
model_link: https://huggingface.co/willnguyen/lacda-2-7B-chat-v0.1
model_type: llama
base_model: meta-llama/llama-2-7b-hf
---

# LacDa2 Model Card

## Model Information

**Model Name:** LacDa2 7B

**Description:** LacDa2 is a specialized language model fine-tuned from Llama 2. It is designed to provide natural language processing capabilities, in English and Vietnamese, for specific domains or applications.

**Fine-tuned from:** Llama 2 (meta-llama/llama-2-7b-hf)

## Instruction format

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    LlamaTokenizer,
    TextStreamer,
    StoppingCriteria,
    StoppingCriteriaList,
)


class StopTokenCriteria(StoppingCriteria):
    """Stop generation once any stop token appears in the newly generated text."""

    def __init__(self, stop_tokens, tokenizer, prompt_length):
        self.stop_tokens = stop_tokens
        # Treat the tokenizer's special tokens as stop tokens as well,
        # skipping any that are unset.
        for tok in (tokenizer.pad_token, tokenizer.bos_token, tokenizer.eos_token):
            if tok is not None and tok not in self.stop_tokens:
                self.stop_tokens.append(tok)
        self.tokenizer = tokenizer
        self.prompt_length = prompt_length

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Decode everything generated so far and drop the prompt prefix,
        # so stop tokens inside the prompt itself are ignored.
        text = self.tokenizer.decode(input_ids[0])[self.prompt_length:]
        return any(st in text for st in self.stop_tokens)


model_name = "willnguyen/lacda-2-7B-chat-v0.1"

tokenizer = LlamaTokenizer.from_pretrained(
    model_name,
    use_fast=False,
    padding_side="right",
    tokenizer_type='llama',
)
tokenizer.pad_token_id = 0

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Single-turn Llama 2 instruction format: the user message wrapped in [INST] ... [/INST].
prompt = " [INST] who is Hồ Chí Minh [/INST]"

stopping_criteria = StoppingCriteriaList(
    [StopTokenCriteria(["[INST]", "[/INST]"], tokenizer, len(prompt))]
)

with torch.inference_mode():
    input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
    streamer = TextStreamer(tokenizer)
    _ = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=False,  # greedy decoding; the sampling parameters below have no effect
        temperature=1.0,
        top_p=1.0,
        top_k=50,
        repetition_penalty=1.0,
        use_cache=True,
        streamer=streamer,
        stopping_criteria=stopping_criteria,
    )
```
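
The prompt above follows the single-turn Llama 2 instruction format: the user message wrapped in `[INST] ... [/INST]` markers, with a leading space. A minimal sketch of a prompt-building helper (the name `build_prompt` is illustrative, not part of the released model or card):

```python
def build_prompt(user_message: str) -> str:
    # Wrap a single user turn in the "[INST] ... [/INST]" markers,
    # matching the example prompt above (including the leading space).
    return f" [INST] {user_message} [/INST]"

prompt = build_prompt("who is Hồ Chí Minh")  # == " [INST] who is Hồ Chí Minh [/INST]"
```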
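
`TextStreamer` only prints tokens to stdout as they are generated. To capture the completion as a string instead, one option (a sketch assuming `model`, `tokenizer`, `prompt`, and `stopping_criteria` are defined as above) is to decode the returned ids and trim at the first stop marker:

```python
with torch.inference_mode():
    input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        do_sample=False,
        stopping_criteria=stopping_criteria,
    )

# Drop the prompt tokens, then cut at the first stop marker if one was emitted.
completion = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
for marker in ("[INST]", "[/INST]"):
    completion = completion.split(marker)[0]
print(completion.strip())
```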