# **Typhoon-Vision Research Preview**

**llama-3-typhoon-v1.5-8b-vision-preview** is a 🇹🇭 Thai *vision-language* model. It supports both text and image input natively, while its output is text. This version (August 2024) is our first vision-language model as part of our multimodal effort, and it is a research *preview*. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).

More details can be found in our [release blog]().

*To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.*

# **Model Description**

Here we provide **Llama3 Typhoon Instruct Vision Preview**, which is built upon [Llama-3-Typhoon-1.5-8B-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct) and [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384).

We base our architecture on [Bunny by BAAI](https://github.com/BAAI-DCAI/Bunny).
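
As a rough mental model, here is a minimal sketch of the Bunny-style wiring: the SigLIP vision tower produces patch features, an MLP projector maps them into the language model's embedding space, and the Llama decoder consumes them as extra tokens. Everything below (class name, two-layer projector, patch count) is illustrative, not the exact implementation:

```python
import torch
import torch.nn as nn

class VisionLanguageSketch(nn.Module):
    """Illustrative only: Bunny-style vision encoder -> projector -> LLM wiring."""

    def __init__(self, vision_dim=1152, llm_dim=4096):  # SigLIP-SO400M / Llama-3-8B widths
        super().__init__()
        self.projector = nn.Sequential(       # maps vision features into the
            nn.Linear(vision_dim, llm_dim),   # LLM embedding space
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features, text_embeds):
        # image_features: (batch, num_patches, vision_dim) from the vision tower
        # text_embeds:    (batch, seq_len, llm_dim) from the LLM embedding table
        image_embeds = self.projector(image_features)
        # In the real model the image tokens replace a placeholder inside the
        # prompt; for simplicity this sketch just prepends them.
        return torch.cat([image_embeds, text_embeds], dim=1)

# Shape check with dummy tensors (patch and sequence counts are arbitrary):
fused = VisionLanguageSketch()(torch.randn(1, 729, 1152), torch.randn(1, 16, 4096))
print(fused.shape)  # torch.Size([1, 745, 4096])
```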

- **Model type**: An 8B instruct decoder-only model with a vision encoder, based on the Llama architecture.
- **Requirement**: transformers 4.38.0 or newer.
- **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧

Before running the snippet, you need to install the following dependencies:

```
pip install torch transformers accelerate pillow
```
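
If you want to confirm the transformers requirement is met before loading the model, a quick check like the following works; this helper is our suggestion, not part of the original snippet:

```python
# Our suggestion (not in the original snippet): verify transformers >= 4.38.0.
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.38.0"), \
    f"transformers >= 4.38.0 is required, found {transformers.__version__}"
```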

```python
import torch
import transformers
# The rest of the import block is elided here; these imports are restored
# because the code below uses them:
from transformers import AutoModelForCausalLM
from PIL import Image
import warnings
import io
import requests

transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

# Set Device
device = 'cuda'  # or 'cpu'
torch.set_default_device(device)
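# Alternative (our suggestion, not in the original snippet): pick the device
# automatically so the code also runs on machines without a GPU.
# device = 'cuda' if torch.cuda.is_available() else 'cpu'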

# Create Model
model = AutoModelForCausalLM.from_pretrained(
    'scb10x/llama-3-typhoon-v1.5-8b-instruct-vision-preview',
    torch_dtype=torch.float16,  # float32 for cpu
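    # Our assumption, not shown here: the elided arguments likely include
    # trust_remote_code=True, since the custom process_images() method used
    # below lives in the repo's own modeling code.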
    # ... (remaining from_pretrained arguments are elided here)
)

# ... (intermediate setup code is elided here)

def prepare_inputs(text, has_image=False, device='cuda'):
    # ... (function body elided here)
    return input_ids, attention_mask

# Example Inputs (try replacing with your own URL)
prompt = 'บอกทุกอย่างที่เห็นในรูป'  # Thai: "Describe everything you see in the picture"
img_url = "https://img.traveltriangle.com/blog/wp-content/uploads/2020/01/cover-for-Thailand-In-May_27th-Jan.jpg"
image = Image.open(io.BytesIO(requests.get(img_url).content))
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)
input_ids, attention_mask = prepare_inputs(prompt, has_image=True, device=device)

# Generate
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    # ... (remaining generation arguments are elided here)
)
```
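
The snippet above is truncated before decoding. Assuming a `tokenizer` was created in the elided setup (e.g. with `AutoTokenizer.from_pretrained` on the same repo), turning `output_ids` back into text would look like this sketch:

```python
# Sketch (assumes `tokenizer` was loaded in the elided setup above). Skip the
# prompt tokens so only the newly generated answer is decoded.
answer = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(answer)
```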