# ibleducation/ibl-neural-edu-content-7B

ibleducation/ibl-neural-edu-content-7B is a model finetuned on top of mistralai/Mistral-7B-v0.1.

The model is finetuned to generate subtitles that can then be used with video generators to create tutorial videos.
A generated response includes full timestamps as well as content, and that content provides a complete tutorial on the topic passed as input.

## Example Conversations

1. Question: Information Theory \
Answer:
```
WEBVTT
Kind: captions
Language: en

00:00:00.320 --> 00:00:01.153
- [Instructor] In this video,

00:00:01.153 --> 00:00:03.270
we're going to talk about information.

00:00:03.270 --> 00:00:05.100
And I know what you're thinking.

00:00:05.100 --> 00:00:07.030
You're thinking, I know what information is.

00:00:07.030 --> 00:00:08.860
I read the newspaper every day.

00:00:08.860 --> 00:00:10.860
I watch TV shows.

00:00:10.860 --> 00:00:12.690
I talk to my friends.

00:00:12.690 --> 00:00:14.520
I know what information is.

00:00:14.520 --> 00:00:16.450
But what we're going to
talk about in this video

00:00:16.450 --> 00:00:18.280
is a very specific definition

00:00:18.280 --> 00:00:20.150
of what information is.

00:00:20.150 --> 00:00:22.150
And it's a very mathematical definition.

00:00:22.150 --> 00:00:24.150
And it's a very specific definition
[.... content shortened for brevity ...]
```
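
Downstream tools usually need the cues as structured data rather than raw text. Below is a minimal sketch of a WEBVTT cue parser using only the Python standard library; `parse_cues` and `CUE_RE` are illustrative names, not part of the model's API, and the regex assumes output shaped like the example above.

```python
import re

# Matches one cue: a "start --> end" timestamp line followed by
# one or more text lines, terminated by a blank line or end of input.
CUE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\n"
    r"((?:.+\n?)+?)(?=\n|\Z)"
)

def parse_cues(vtt: str):
    """Return a list of (start, end, text) tuples from a WEBVTT string."""
    cues = []
    for start, end, text in CUE_RE.findall(vtt):
        # Collapse multi-line cue text into a single normalized line.
        cues.append((start, end, " ".join(text.split())))
    return cues

sample = """WEBVTT
Kind: captions
Language: en

00:00:00.320 --> 00:00:01.153
- [Instructor] In this video,

00:00:01.153 --> 00:00:03.270
we're going to talk about information.
"""
print(parse_cues(sample))
```

The header lines (`WEBVTT`, `Kind:`, `Language:`) are skipped automatically because they never match the timestamp pattern.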

## Model Details

- **Developed by:** [IBL Education](https://ibl.ai)
- **Model type:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Base Model:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Language:** English
- **Finetuned from weights:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Finetuned on data:**
  - [ibleducation/ibl-khanacademy-transcripts](https://huggingface.co/datasets/ibleducation/ibl-khanacademy-transcripts)
- **Model License:** Apache 2.0

## How to Get Started with the Model

### Install the necessary packages

Requires: [transformers](https://pypi.org/project/transformers/) > 4.35.0
```shell
pip install "transformers>4.35.0"
pip install accelerate
```
### You can then try the following example code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
import torch

model_id = "ibleducation/ibl-neural-edu-content-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_flash_attention_2=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
prompt = "<s>[INST]Information Theory[/INST] "

# The pipeline returns a list of dicts, one per generated sequence.
response = pipeline(prompt)
print(response[0]["generated_text"])
```

> In cases where the runtime GPU does not support flash attention, the `use_flash_attention_2` argument can be omitted,
> though at a possible performance cost.

**Important** - Use the prompt template below for ibl-neural-edu-content-7B:
```
<s>[INST]{prompt}[/INST]
```
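
To avoid hand-writing the template for each topic, it can be wrapped in a small helper. `format_prompt` is a hypothetical convenience function, not part of the model's API; it simply applies the template above.

```python
def format_prompt(topic: str) -> str:
    """Wrap a topic in the model's instruction template."""
    return f"<s>[INST]{topic}[/INST]"

print(format_prompt("Information Theory"))  # -> <s>[INST]Information Theory[/INST]
```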