parinzee committed on
Commit 5ebe62e · verified · 1 Parent(s): 8718345

Create README.md

Files changed (1)
  1. README.md +109 -0
README.md ADDED
@@ -0,0 +1,109 @@
+ ---
+ inference: false
+ language:
+ - th
+ - en
+ library_name: transformers
+ tags:
+ - instruct
+ - chat
+ license: llama3
+ ---
+
+ # Typhoon Vision Research Preview
+
+ This is the research preview of Typhoon Vision.
+
+ Typhoon Vision is a family of Vision Language Models (VLMs) built specifically for the 🇹🇭 Thai language and Thai culture.
+
+ Here we provide **Llama3 Typhoon Instruct Vision Preview**, which is built upon [Llama-3-Typhoon-1.5-8B-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct) and [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384).
+
+ # **Model Description**
+
+ - **Model type**: An 8B instruct decoder-only model with a vision encoder, based on the Llama architecture.
+ - **Requirement**: transformers 4.38.0 or newer.
+ - **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
+ - **License**: [Llama 3 Community License](https://llama.meta.com/llama3/license/)
+
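The version requirement listed above can be checked before loading the model. The sketch below is illustrative and not part of the original instructions: `parse_version`, `meets_minimum`, and `installed_transformers_version` are hypothetical helper names, the comparison looks only at the first three numeric components of a version string, and the lookup through `importlib.metadata` assumes the package was installed under the distribution name `transformers`.

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.38.0"  # minimum version stated in the model description


def parse_version(version):
    """Return the first three numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric pieces such as "dev0"
        parts.append(int(digits))
    # Pad so that "4.38" compares equal to "4.38.0".
    return tuple((parts + [0, 0, 0])[:3])


def meets_minimum(installed, required=MIN_TRANSFORMERS):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif meets_minimum(found):
        print(f"transformers {found} satisfies >= {MIN_TRANSFORMERS}")
    else:
        print(f"transformers {found} is older than {MIN_TRANSFORMERS}")
```

Pre-release suffixes such as `rc1` are ignored by this simplified comparison; a full implementation would more likely rely on a dedicated version-parsing library.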
+ # Quickstart
+
+ The following snippet shows how to use the model with transformers.
+
+ Before running the snippet, install the following dependencies:
+
+ ```shell
+ pip install torch transformers accelerate pillow
+ ```
+
+ If a single GPU has enough memory for the model, setting `CUDA_VISIBLE_DEVICES=0` should make the snippet run faster, since the model then stays on one device instead of being spread across several.
+
+
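`CUDA_VISIBLE_DEVICES` is usually set in the shell, but it can also be set from Python, provided this happens before CUDA is initialized (in practice, before `torch` is imported). The following is a minimal sketch under that assumption; `pin_to_single_gpu` is a hypothetical helper name, not something the model or transformers provides.

```python
import os


def pin_to_single_gpu(index=0):
    """Expose only one GPU to this process; call this before importing torch."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


pin_to_single_gpu(0)

# Import torch only after the variable is set, for example:
# import torch
```

Inside the process, the selected GPU is then addressed as device 0 regardless of its physical index.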
+ ```python
+ import torch
+ import transformers
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from PIL import Image
+ import warnings
+ import io
+ import requests
+
+ # disable some warnings
+ transformers.logging.set_verbosity_error()
+ transformers.logging.disable_progress_bar()
+ warnings.filterwarnings('ignore')
+
+ # set device
+ device = 'cuda'  # or cpu
+ torch.set_default_device(device)
+
+ # create model
+ model = AutoModelForCausalLM.from_pretrained(
+     'scb10x/llama-3-typhoon-v1.5-8b-instruct-vision-preview',
+     torch_dtype=torch.float16,  # float32 for cpu
+     device_map='auto',
+     trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(
+     'scb10x/llama-3-typhoon-v1.5-8b-instruct-vision-preview',
+     trust_remote_code=True)
+
+ def prepare_inputs(text, has_image=False, device='cuda'):
+     messages = [
+         {"role": "system", "content": "You are a helpful vision-capable assistant who eagerly converses with the user in their language."},
+     ]
+
+     if has_image:
+         messages.append({"role": "user", "content": "<|image|>\n" + text})
+     else:
+         messages.append({"role": "user", "content": text})
+
+     inputs_formatted = tokenizer.apply_chat_template(
+         messages,
+         add_generation_prompt=True,
+         tokenize=False
+     )
+
+     # tokenize the text on each side of the image placeholder separately
+     text_chunks = [tokenizer(chunk).input_ids for chunk in inputs_formatted.split('<|image|>')]
+     if has_image:
+         # -200 stands in for the image; [1:] skips the first token of the second chunk
+         ids = text_chunks[0] + [-200] + text_chunks[1][1:]
+     else:
+         # no placeholder, so there is only one chunk
+         ids = text_chunks[0]
+     input_ids = torch.tensor(ids, dtype=torch.long).unsqueeze(0).to(device)
+     attention_mask = torch.ones_like(input_ids).to(device)
+
+     return input_ids, attention_mask
+
+ prompt = 'บอกทุกอย่างที่เห็นในรูป'  # "Describe everything you see in the picture"
+ img_url = "https://img.traveltriangle.com/blog/wp-content/uploads/2020/01/cover-for-Thailand-In-May_27th-Jan.jpg"
+ image = Image.open(io.BytesIO(requests.get(img_url).content))
+ image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)
+ input_ids, attention_mask = prepare_inputs(prompt, has_image=True, device=device)
+
+ # generate
+ # note: in the standard transformers generate(), temperature and top_p only
+ # take effect when do_sample=True; this model's remote code may differ
+ output_ids = model.generate(
+     input_ids,
+     images=image_tensor,
+     max_new_tokens=1000,
+     use_cache=True,
+     temperature=0.2,
+     top_p=0.2,
+     repetition_penalty=1.0,  # increase this to avoid chattering
+ )[0]
+
+ # decode only the newly generated tokens, skipping the prompt
+ print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
+ ```
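
The `prepare_inputs` step in the snippet splits the formatted prompt at `<|image|>`, tokenizes each side, and joins the two id lists with the value `-200`. The sketch below reproduces only that splicing logic, using a toy stand-in tokenizer so that it runs without downloading the model. `toy_tokenize`, `splice_image_token`, `BOS_ID`, and the word-to-id mapping are hypothetical. Reading `-200` as the slot where the model's remote code places the image features, and `[1:]` as dropping a beginning-of-sequence token that the tokenizer adds to the second chunk, are assumptions inferred from the snippet rather than documented behaviour.

```python
IMAGE_PLACEHOLDER = "<|image|>"
IMAGE_TOKEN_ID = -200  # value used in the Quickstart snippet
BOS_ID = 1             # hypothetical beginning-of-sequence id


def toy_tokenize(text):
    """Hypothetical tokenizer: BOS followed by one fake id per whitespace-separated word."""
    return [BOS_ID] + [100 + len(word) for word in text.split()]


def splice_image_token(prompt, tokenize=toy_tokenize):
    """Tokenize around the image placeholder and join the halves with IMAGE_TOKEN_ID."""
    chunks = [tokenize(chunk) for chunk in prompt.split(IMAGE_PLACEHOLDER)]
    if len(chunks) == 1:
        return chunks[0]  # text-only prompt, nothing to splice
    if len(chunks) != 2:
        raise ValueError("expected at most one image placeholder")
    # Skip the BOS that the toy tokenizer prepends to the second chunk.
    return chunks[0] + [IMAGE_TOKEN_ID] + chunks[1][1:]


ids = splice_image_token("user: <|image|>\ndescribe the picture")
print(ids)  # [1, 105, -200, 108, 103, 107]
```

A text-only prompt produces a single chunk, which is why the helper returns it unchanged instead of indexing a second chunk.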