---
license: apache-2.0
datasets:
- Open-Orca/SlimOrca
---
# Instruction-Tuned Mamba 2.8B on SlimOrca Dataset

## Overview
This repository contains the [2.8 billion parameter Mamba model](https://huggingface.co/state-spaces/mamba-2.8b), fine-tuned on a 20k-example subset of the [SlimOrca dataset](https://huggingface.co/datasets/Open-Orca/SlimOrca). Big thanks to Justus Mattern from Haven for contributing essential code in the [mamba-chat repository](https://github.com/havenhq/mamba-chat).
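
For context, SlimOrca stores each example as a ShareGPT-style conversation. The snippet below is a minimal sketch of how a 20k subset can be drawn with the `datasets` library; the seed and sampling strategy shown are assumptions, since the exact subset and preprocessing used for this checkpoint are not documented here.

```python
from datasets import load_dataset

# Minimal sketch of drawing a 20k-example subset from SlimOrca.
# The seed and sampling strategy are assumptions; the exact subset
# used to fine-tune this checkpoint may differ.
slim_orca = load_dataset("Open-Orca/SlimOrca", split="train")
subset = slim_orca.shuffle(seed=42).select(range(20_000))

# Each example is a ShareGPT-style conversation: a list of turns
# with "from" ("system" / "human" / "gpt") and "value" fields.
print(subset[0]["conversations"])
```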


## Usage Instructions
To use the fine-tuned model, run the Python snippet below (it requires the `mamba-ssm` and `transformers` packages):

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("Schmadge/mamba-slim-orca")
tokenizer.eos_token = tokenizer.pad_token = "<|endoftext|>"
# Reuse Zephyr's chat template so prompts are rendered with the <|system|>/<|user|>/<|assistant|> tags
tokenizer.chat_template = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").chat_template
model = MambaLMHeadModel.from_pretrained("Schmadge/mamba-slim-orca", device=device, dtype=torch.float16)

def generate_response(system_prompt, user_prompt):
    # Preparing the prompt
    prompt = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    input_ids = tokenizer.apply_chat_template(prompt, return_tensors="pt", add_generation_prompt=True).to(device)

    # Generating the response
    out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.3, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
    decoded = tokenizer.batch_decode(out)

    # Keep only the assistant's reply and strip the end-of-text token
    return decoded[0].split("<|assistant|>\n")[-1].replace('<|endoftext|>','')

system_prompt = "You are an AI assistant. Provide a detailed answer so user don't need to search outside to understand the answer."
user_prompt = "In a room I have only 3 sisters. Anna is reading a book. Alice is playing a match of chess.What the third sister, Amanda is doing ?"
response = generate_response(system_prompt, user_prompt)
print(response)
# Based on the information provided, we can infer that Amanda is playing a match of chess with Alice. Since Anna is reading a book, it is reasonable to assume that Amanda is playing a game of chess with Alice, as this is a common activity for siblings to engage in together.
```
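
Because the chat template encodes the whole message history, the same setup extends naturally to multi-turn conversations. The sketch below reuses the `tokenizer`, `model`, and `device` objects defined above; the `chat` helper, prompts, and sampling parameters are illustrative assumptions rather than recommended settings.

```python
# Multi-turn sketch reusing `tokenizer`, `model`, and `device` from the snippet above.
# Prompts and sampling parameters are illustrative, not tuned values.
conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Name three everyday uses of a lithium-ion battery."},
]

def chat(conversation):
    # Render the full history with the chat template and generate the next assistant turn
    input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt", add_generation_prompt=True).to(device)
    out = model.generate(input_ids=input_ids, max_length=1000, temperature=0.3, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
    return tokenizer.batch_decode(out)[0].split("<|assistant|>\n")[-1].replace('<|endoftext|>', '')

first_reply = chat(conversation)
print(first_reply)

# Append the model's reply and ask a follow-up question in the same conversation
conversation += [
    {"role": "assistant", "content": first_reply},
    {"role": "user", "content": "Which of those uses drains the battery fastest?"},
]
print(chat(conversation))
```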

## References

Mamba Chat:
```bibtex
@misc{haven2023mambachat,
  title        = {Mamba-Chat},
  author       = {Justus Mattern and Konstantin Hohr},
  year         = {2023},
  howpublished = {GitHub},
  url          = {https://github.com/havenhq/mamba-chat}
}
```


Mamba:
```bibtex
@article{mamba,
  title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}
```

SlimOrca:
```bibtex
@misc{SlimOrca,
  title = {SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification},
  author = {Wing Lian and others},
  year = {2023},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Open-Orca/SlimOrca}
}
```