# ibleducation/ibl-neural-edu-content-7B

ibleducation/ibl-neural-edu-content-7B is a model finetuned on top of mistralai/Mistral-7B-v0.1.

The model is finetuned to generate subtitles that can then be used with video generators to create tutorial videos.
A generated response includes full timestamps as well as content, and that content provides a complete tutorial on the topic passed as input.

## Example Conversations

1. Question: Information Theory \
Answer:
```
WEBVTT
Kind: captions
Language: en

00:00:00.320 --> 00:00:01.153
- [Instructor] In this video,

00:00:01.153 --> 00:00:03.270
we're going to talk about information.

00:00:03.270 --> 00:00:05.100
And I know what you're thinking.

00:00:05.100 --> 00:00:07.030
You're thinking, I know what information is.

00:00:07.030 --> 00:00:08.860
I read the newspaper every day.

00:00:08.860 --> 00:00:10.860
I watch TV shows.

00:00:10.860 --> 00:00:12.690
I talk to my friends.

00:00:12.690 --> 00:00:14.520
I know what information is.

00:00:14.520 --> 00:00:16.450
But what we're going to
talk about in this video

00:00:16.450 --> 00:00:18.280
is a very specific definition

00:00:18.280 --> 00:00:20.150
of what information is.

00:00:20.150 --> 00:00:22.150
And it's a very mathematical definition.

00:00:22.150 --> 00:00:24.150
And it's a very specific definition
[.... content shortened for brevity ...]
```
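
Downstream tools usually need the cues as structured data rather than raw text. Below is a minimal sketch of a WEBVTT cue parser using only the Python standard library; `parse_cues` and `CUE_RE` are illustrative names, not part of the model's API, and the regex assumes output shaped like the example above.

```python
import re

# Matches one cue: a "start --> end" timestamp line followed by
# one or more text lines, terminated by a blank line or end of input.
CUE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\n"
    r"((?:.+\n?)+?)(?=\n|\Z)"
)

def parse_cues(vtt: str):
    """Return a list of (start, end, text) tuples from a WEBVTT string."""
    cues = []
    for start, end, text in CUE_RE.findall(vtt):
        # Collapse multi-line cue text into a single normalized line.
        cues.append((start, end, " ".join(text.split())))
    return cues

sample = """WEBVTT
Kind: captions
Language: en

00:00:00.320 --> 00:00:01.153
- [Instructor] In this video,

00:00:01.153 --> 00:00:03.270
we're going to talk about information.
"""
print(parse_cues(sample))
```

The header lines (`WEBVTT`, `Kind:`, `Language:`) are skipped automatically because they never match the timestamp pattern.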

## Model Details

- **Developed by:** [IBL Education](https://ibl.ai)
- **Model type:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Base Model:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Language:** English
- **Finetuned from weights:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Finetuned on data:**
  - [ibleducation/ibl-khanacademy-transcripts](https://huggingface.co/datasets/ibleducation/ibl-khanacademy-transcripts)
- **Model License:** Apache 2.0

## How to Get Started with the Model

### Install the necessary packages

Requires: [transformers](https://pypi.org/project/transformers/) > 4.35.0
```shell
pip install "transformers>4.35.0"
pip install accelerate
```
### You can then try the following example code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
import torch

model_id = "ibleducation/ibl-neural-edu-content-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_flash_attention_2=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
prompt = "<s>[INST]Information Theory[/INST] "

# The pipeline returns a list of dicts, one per generated sequence.
response = pipeline(prompt)
print(response[0]["generated_text"])
```

> In cases where the runtime GPU does not support flash attention, the `use_flash_attention_2` argument can be omitted,
> though at a possible performance cost.

**Important** - Use the prompt template below for ibl-neural-edu-content-7B:
```
<s>[INST]{prompt}[/INST]
```
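
To avoid hand-writing the template for each topic, it can be wrapped in a small helper. `format_prompt` is a hypothetical convenience function, not part of the model's API; it simply applies the template above.

```python
def format_prompt(topic: str) -> str:
    """Wrap a topic in the model's instruction template."""
    return f"<s>[INST]{topic}[/INST]"

print(format_prompt("Information Theory"))  # -> <s>[INST]Information Theory[/INST]
```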