@@ -1,199 +1,164 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+tags:
+- hindi
+- bilingual
+license: llama2
+datasets:
+- sarvamai/samvaad-hi-v1
+language:
+- hi
+- en
 ---
+# LLama3-Gaja-Hindi-8B-v0.1
+## Overview
+LLama3-Gaja-Hindi-8B-v0.1 is a bilingual English/Hindi model that extends the Ambari series, developed and released by [Cognitivelab.in](https://www.cognitivelab.in/). It is specialized for natural language understanding tasks, particularly instruction following. It is built on the [Llama 3 8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model and fine-tuned on a curated dataset of translated instruction pairs.
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6442d975ad54813badc1ddf7/G0u9L6RQJFinST0chQmfL.jpeg" width="500px">
+## Generate
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
+tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
+# Existing messages list
+messages = [
+    {"role": "system", "content": " You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
+    {"role": "user", "content": "Who are you"}
+]
+# Apply the chat template and tokenize the conversation in one step
+input_ids = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to("cuda")
+# Sample a reply, stopping at the end-of-turn token
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=256,
+    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
+    do_sample=True,
+    temperature=0.6,
+    top_p=0.9,
+)
+# Keep only the newly generated tokens (drop the prompt) and decode them
+response = outputs[0][input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
+## Multi-turn Chat
+To hold a multi-turn conversation with the LLama3-Gaja-Hindi-8B-v0.1 model, you can follow the example code below:
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers import GenerationConfig, TextStreamer
+model = AutoModelForCausalLM.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", torch_dtype=torch.bfloat16).to("cuda")
+tokenizer = AutoTokenizer.from_pretrained("Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1", trust_remote_code=True)
+# Existing messages list
+messages = [
+    {"role": "system", "content": " You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request."},
+]
+# Function to add user input and generate response
+def process_user_input(user_input):
+    global messages
+    # Add user's input to messages list
+    messages.append({"role": "user", "content": user_input})
+    # Prepare the prompt for generation
+    prompt_formatted_message = tokenizer.apply_chat_template(
+        messages,
+        add_generation_prompt=True,
+        tokenize=False
+    )
+    # Configure generation parameters
+    generation_config = GenerationConfig(
+        repetition_penalty=1.2,
+        max_new_tokens=8000,
+        temperature=0.2,
+        top_p=0.95,
+        top_k=40,
+        bos_token_id=tokenizer.bos_token_id,
+        eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
+        pad_token_id=tokenizer.pad_token_id,
+        do_sample=True,
+        use_cache=True,
+        return_dict_in_generate=True,
+        output_attentions=False,
+        output_hidden_states=False,
+        output_scores=False,
+    )
+    streamer = TextStreamer(tokenizer)
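+    # The Llama 3 chat template already emits the BOS token, so do not add special tokens again
+    batch = tokenizer(prompt_formatted_message.strip(), return_tensors="pt", add_special_tokens=False)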
+    print("\033[32mResponse: \033[0m")  # Print a green "Response:" label before the streamed output
+    # Generate response
+    generated = model.generate(
+        inputs=batch["input_ids"].to("cuda"),
+        generation_config=generation_config,
+        streamer=streamer,
+    )
+    # Decode the full sequence (prompt plus reply), keeping special tokens for parsing
+    assistant_response = tokenizer.decode(generated["sequences"].cpu().tolist()[0])
+    # Find the last assistant header and the last end-of-turn token
+    assistant_header = "<|start_header_id|>assistant<|end_header_id|>"
+    assistant_start_index = assistant_response.rfind(assistant_header)
+    eot_index = assistant_response.rfind("<|eot_id|>")
+    # Extract the text between the last assistant header and the end-of-turn token
+    if assistant_start_index != -1 and eot_index > assistant_start_index:
+        final_response = assistant_response[assistant_start_index + len(assistant_header) : eot_index].strip()
+    else:
+        raise ValueError("Failed to parse the assistant response from the generated text")
+    # Append the extracted response to the messages list so the next turn sees it
+    messages.append({"role": "assistant", "content": final_response})
+# Main interaction loop
+while True:
+    print("=================================================================================")
+    user_input = input("Input: ")  # Prompt user for input
+    # Check if user_input is empty
+    if not user_input.strip():  # .strip() removes any leading or trailing whitespace
+        break  # Break out of the loop if input is empty
+    process_user_input(user_input)  # Process user's input and generate response
+```
+## Prompt format
+system prompt = `You are Gaja, an AI assistant created by Cognitivelab and trained on top of Llama 3 Large language model(LLM), proficient in English and Hindi. You can respond in both languages based on the users request.`
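As a rough illustration of where this system prompt ends up, the sketch below assembles a conversation into the Llama 3 header layout by hand. The `<|start_header_id|>`, `<|end_header_id|>`, and `<|eot_id|>` markers are the ones the multi-turn example searches for; the `<|begin_of_text|>` prefix, the blank line after each header, and the helper name `build_prompt` are assumptions made for illustration, not something this card specifies.

```python
# Hypothetical helper: a hand-rolled approximation of the Llama 3 chat layout.
# The exact template shipped with the model may differ.
SYSTEM_PROMPT = (
    "You are Gaja, an AI assistant created by Cognitivelab and trained on top of "
    "Llama 3 Large language model(LLM), proficient in English and Hindi. "
    "You can respond in both languages based on the users request."
)


def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a single prompt string."""
    parts = ["<|begin_of_text|>"]  # assumed beginning-of-sequence marker
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Who are you"},
]
prompt = build_prompt(messages)
print(prompt)
```

In practice `tokenizer.apply_chat_template` is likely the safer choice, since it applies whatever template ships with the model; the manual version is only meant to make the layout visible.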
+## Benchmarks
+Coming soon.
+## Bilingual Instruct Fine-tuning
+The model was trained with supervised fine-tuning using low-rank adaptation (LoRA) on bilingual instruction data. This stage teaches the model to respond in either English or Hindi, depending on the language specified in the user prompt or instruction.
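To make the low-rank adaptation idea concrete, here is a minimal pure-Python sketch of the standard LoRA formulation: the base weight `W0` stays frozen and only two small matrices `A` and `B` are trained, giving `W0 x + (alpha / r) * B (A x)`. The card does not state the rank, scaling factor, or target modules used for this model, so the sizes and `alpha` below are illustrative assumptions, and this is not the authors' training code.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, w0, a, b, alpha):
    """Compute W0 x + (alpha / r) * B (A x), with W0 treated as frozen."""
    r = len(a)  # rank = number of rows in A
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]


random.seed(0)
d_out, d_in, rank = 8, 8, 2  # illustrative sizes only
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, random init
b = [[0.0] * rank for _ in range(d_out)]                                 # trainable, zero init

x = [1.0] * d_in
# With B at zero the adapter is a no-op, so fine-tuning starts from the base model.
assert lora_forward(x, w0, a, b, alpha=16) == matvec(w0, x)

trainable = rank * (d_in + d_out)  # parameters in A and B
full = d_in * d_out                # parameters in the full weight matrix
print(f"trainable adapter parameters: {trainable} vs full matrix: {full}")
```

The saving grows with layer size: for a fixed rank, the adapter's parameter count scales with `d_in + d_out` rather than `d_in * d_out`.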
+## References
+- [Ambari-7B-Instruct Model](https://huggingface.co/Cognitive-Lab/Ambari-7B-Instruct-v0.1)