🔥 Gemma-2B-Hinglish-LORA-v1.0 model

🚀 Visit this HF Space to try out this model's inference: https://huggingface.co/spaces/kirankunapuli/Gemma-2B-Hinglish-Model-Inference-v1.0

  • Developed by: Kiran Kunapuli
  • License: apache-2.0
  • Finetuned from model : unsloth/gemma-2b-bnb-4bit
  • Model usage: Use the below code in Python
      import re
      import torch
      from transformers import AutoTokenizer, AutoModelForCausalLM
      
      tokenizer = AutoTokenizer.from_pretrained("kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0")
      model = AutoModelForCausalLM.from_pretrained("kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0")
    
      device = "cuda:0" if torch.cuda.is_available() else "cpu"
      model = model.to(device)
    
      alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
    
      ### Instruction:
      {}
      
      ### Input:
      {}
      
      ### Response:
      {}"""
    
      # Example 1
      inputs = tokenizer(
      [
          alpaca_prompt.format(
              "Please answer the following sentence as requested", # instruction
              "ऐतिहासिक स्मारक India Gate कहाँ स्थित है?", # input
              "", # output - leave this blank for generation!
          )
      ], return_tensors = "pt").to(device)
      
      outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
      output = tokenizer.batch_decode(outputs)[0]
      response_start = output.find("### Response:") + len("### Response:")
      response_end = output.find("<eos>", response_start)
      response = output[response_start:response_end].strip()
      print(response)
      
      # Example 2
      inputs = tokenizer(
      [
          alpaca_prompt.format(
              "Please answer the following sentence as requested", # instruction
              "ऐतिहासिक स्मारक इंडिया गेट कहाँ स्थित है? मुझे अंग्रेजी में बताओ", # input
              "", # output - leave this blank for generation!
          )
      ], return_tensors = "pt").to(device)
      
      outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
      output = tokenizer.batch_decode(outputs)[0]
      response_pattern = re.compile(r'### Response:\n(.*?)<eos>', re.DOTALL)
      response_match = response_pattern.search(output)
    
      if response_match:
          response = response_match.group(1).strip()
          return response
      else:
          return "Response not found"
    
  • Model config:
      model = FastLanguageModel.get_peft_model(
      model,
      r = 16, 
      target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj",],
      lora_alpha = 32,
      lora_dropout = 0, 
      bias = "none",   
      use_gradient_checkpointing = True, 
      random_state = 42,
      use_rslora = True,  
      loftq_config = None, 
      )
    
  • Training parameters:
      trainer = SFTTrainer(
      model = model,
      tokenizer = tokenizer,
      train_dataset = dataset,
      dataset_text_field = "text",
      max_seq_length = max_seq_length,
      dataset_num_proc = 2,
      packing = True,
      args = TrainingArguments(
          per_device_train_batch_size = 2,
          gradient_accumulation_steps = 4,
          warmup_steps = 5,
          max_steps = 120,
          learning_rate = 2e-4,
          fp16 = not torch.cuda.is_bf16_supported(),
          bf16 = torch.cuda.is_bf16_supported(),
          logging_steps = 1,
          optim = "adamw_8bit",
          weight_decay = 0.01,
          lr_scheduler_type = "linear",
          seed = 42,
          output_dir = "outputs",
          report_to = "wandb",
        ),
      )
    
  • Training details:
    ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
       \\   /|    Num examples = 14,343 | Num Epochs = 1
    O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
    \        /    Total batch size = 8 | Total steps = 120
     "-____-"     Number of trainable parameters = 19,611,648
    
    GPU = Tesla T4. Max memory = 14.748 GB.
    2118.7553 seconds used for training.
    35.31 minutes used for training.
    Peak reserved memory = 9.172 GB.
    Peak reserved memory for training = 6.758 GB.
    Peak reserved memory % of max memory = 62.191 %.
    Peak reserved memory for training % of max memory = 45.823 %.
    

This gemma model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month
117
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0

Finetuned
(118)
this model
Quantizations
1 model

Datasets used to train kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0

Space using kirankunapuli/Gemma-2B-Hinglish-LORA-v1.0 1