parinzee committed on Commit 2dd284e · verified · 1 Parent(s): 53fe1c1

Update README.md

Files changed (1): README.md (+7, −10)
README.md CHANGED
@@ -12,16 +12,15 @@ license: llama3
 
 # **Typhoon-Vision Research Preview**
 
-This is the research preview of Typhoon Vision.
-
-Typhoon Vision is a family of Vision Language Models (VLMs) built specifically for the 🇹🇭 Thai language and Thai culture.
+**llama-3-typhoon-v1.5-8b-vision-preview** is a 🇹🇭 Thai *vision-language* model. It natively supports both text and image inputs, while its output is text. This version (August 2024) is our first vision-language model as part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).
+
+More details can be found in our [release blog](). To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.
 
+# **Model Description**
 Here we provide **Llama3 Typhoon Instruct Vision Preview**, which is built upon [Llama-3-Typhoon-1.5-8B-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct) and [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384).
 
 We base our architecture on [Bunny by BAAI](https://github.com/BAAI-DCAI/Bunny).
 
-# **Model Description**
-
 - **Model type**: An 8B instruct decoder-only model with a vision encoder, based on the Llama architecture.
 - **Requirement**: transformers 4.38.0 or newer.
 - **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
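Since the bullets above pin `transformers` to 4.38.0 or newer, a quick sanity check before loading the model can save a confusing traceback. This check is our own illustration, not part of the README:

```python
# Our illustration: verify the README's stated requirement (transformers >= 4.38.0).
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse("4.38.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} found; "
        "this model requires 4.38.0 or newer (pip install -U transformers)."
    )
```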
@@ -37,9 +36,6 @@ Before running the snippet, you need to install the following dependencies:
 pip install torch transformers accelerate pillow
 ```
 
-If CUDA memory is sufficient, this snippet runs faster with `CUDA_VISIBLE_DEVICES=0`.
-
-
 ```python
 import torch
 import transformers
@@ -54,11 +50,11 @@ transformers.logging.set_verbosity_error()
 transformers.logging.disable_progress_bar()
 warnings.filterwarnings('ignore')
 
-# set device
+# Set Device
 device = 'cuda' # or cpu
 torch.set_default_device(device)
 
-# create model
+# Create Model
 model = AutoModelForCausalLM.from_pretrained(
     'scb10x/llama-3-typhoon-v1.5-8b-instruct-vision-preview',
     torch_dtype=torch.float16, # float32 for cpu
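The inline comments in the hunk above (`# or cpu`, `# float32 for cpu`) imply a CPU fallback. A minimal sketch of that reading; the `trust_remote_code` flag is our assumption (Bunny-derived repos usually need it), not the README's code:

```python
# CPU fallback implied by the inline comments above; our reading, not the README's code.
import torch
from transformers import AutoModelForCausalLM

device = 'cpu'
torch.set_default_device(device)

model = AutoModelForCausalLM.from_pretrained(
    'scb10x/llama-3-typhoon-v1.5-8b-instruct-vision-preview',
    torch_dtype=torch.float32,  # float16 is impractical on most CPUs
    trust_remote_code=True,     # assumption: custom vision code ships with the repo
)
```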
@@ -94,13 +90,14 @@ def prepare_inputs(text, has_image=False, device='cuda'):
 
     return input_ids, attention_mask
 
+# Example Inputs (try replacing with your own url)
 prompt = 'บอกทุกอย่างที่เห็นในรูป'  # Thai: "Describe everything you see in the picture."
 img_url = "https://img.traveltriangle.com/blog/wp-content/uploads/2020/01/cover-for-Thailand-In-May_27th-Jan.jpg"
 image = Image.open(io.BytesIO(requests.get(img_url).content))
 image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)
 input_ids, attention_mask = prepare_inputs(prompt, has_image=True, device=device)
 
-# generate
+# Generate
 output_ids = model.generate(
     input_ids,
     images=image_tensor,
 
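The diff cuts the README's snippet off inside the `model.generate(...)` call. For readers following along, here is a minimal sketch of how such a snippet typically concludes. Everything past the two arguments shown in the hunk is our assumption: the `attention_mask`, `max_new_tokens`, and `use_cache` values are illustrative, and the `tokenizer` (which the truncated `prepare_inputs` presumably also uses) is loaded here explicitly.

```python
# Sketch only: the diff truncates the README's generate() call; the arguments
# and the decoding step below are our assumptions, not the author's exact code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    'scb10x/llama-3-typhoon-v1.5-8b-instruct-vision-preview',
    trust_remote_code=True,  # assumption: custom code in the repo
)

output_ids = model.generate(
    input_ids,
    images=image_tensor,
    attention_mask=attention_mask,  # assumption
    max_new_tokens=256,             # illustrative value
    use_cache=True,                 # illustrative value
)[0]

# Decode only the newly generated tokens, skipping the prompt.
answer = tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True)
print(answer)
```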