aria-dev committed
Commit 5d840d6 · verified · 1 parent: 021f586

Update README.md

Files changed (1)
  1. README.md +72 -72
README.md CHANGED
@@ -1,73 +1,73 @@
- ---
- license: apache-2.0
- language:
- - en
- library_name: transformers
- pipeline_tag: image-text-to-text
- tags:
- - multimodal
- - aria
- ---
- <!-- <p align="center">
- <br>Aria</br>
- </p> -->
-
- This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The only modification is replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
-
- While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster inference speed.
-
-
- ## Quick Start
- ### Installation
- ```
- pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow
- pip install flash-attn --no-build-isolation
- ```
-
- ### Inference
-
- ```python
- import requests
- import torch
- from PIL import Image
- from transformers import AutoModelForCausalLM, AutoProcessor
-
- model_id_or_path = "rhymes-ai/Aria-sequential_mlp"
-
- model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
-
- processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
-
- image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
-
- image = Image.open(requests.get(image_path, stream=True).raw)
-
- messages = [
-     {
-         "role": "user",
-         "content": [
-             {"text": None, "type": "image"},
-             {"text": "what is the image?", "type": "text"},
-         ],
-     }
- ]
-
- text = processor.apply_chat_template(messages, add_generation_prompt=True)
- inputs = processor(text=text, images=image, return_tensors="pt")
- inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
- inputs = {k: v.to(model.device) for k, v in inputs.items()}
-
- with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
-     output = model.generate(
-         **inputs,
-         max_new_tokens=500,
-         stop_strings=["<|im_end|>"],
-         tokenizer=processor.tokenizer,
-         do_sample=True,
-         temperature=0.9,
-     )
-     output_ids = output[0][inputs["input_ids"].shape[1]:]
-     result = processor.decode(output_ids, skip_special_tokens=True)
-
- print(result)
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - multimodal
+ - aria
+ ---
+ <!-- <p align="center">
+ <br>Aria</br>
+ </p> -->
+
+ This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The only modification is replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
+
+ While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster training speed.
+
+
+ ## Quick Start
+ ### Installation
+ ```
+ pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow
+ pip install flash-attn --no-build-isolation
+ ```
+
+ ### Inference
+
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ model_id_or_path = "rhymes-ai/Aria-sequential_mlp"
+
+ model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
+
+ processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
+
+ image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
+
+ image = Image.open(requests.get(image_path, stream=True).raw)
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"text": None, "type": "image"},
+             {"text": "what is the image?", "type": "text"},
+         ],
+     }
+ ]
+
+ text = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(text=text, images=image, return_tensors="pt")
+ inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+ with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
+     output = model.generate(
+         **inputs,
+         max_new_tokens=500,
+         stop_strings=["<|im_end|>"],
+         tokenizer=processor.tokenizer,
+         do_sample=True,
+         temperature=0.9,
+     )
+     output_ids = output[0][inputs["input_ids"].shape[1]:]
+     result = processor.decode(output_ids, skip_special_tokens=True)
+
+ print(result)
  ```
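
The README above describes each expert as a plain `torch.nn.Linear` applied in sequence rather than one fused grouped-GEMM call. The sketch below illustrates that layout; the class and parameter names are hypothetical and simplified (single expert per token, no routing weights), not the fork's actual remote code.

```python
import torch
import torch.nn as nn

class SequentialMLPExperts(nn.Module):
    """Hypothetical sketch: each expert is an ordinary pair of nn.Linear layers, run one after another."""

    def __init__(self, num_experts: int, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Plain nn.Linear layers, which is what most open-source quantization libraries target.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.GELU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, hidden_states: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_tokens, hidden_size); expert_ids: (num_tokens,) routing decisions.
        out = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            if mask.any():
                # One small matmul per expert instead of a single fused grouped GEMM.
                out[mask] = expert(hidden_states[mask])
        return out
```

Looping over the experts like this issues many small GEMMs, which is why the README notes that grouped GEMM is faster; the trade-off is that every expert weight now lives in a standard `nn.Linear` module.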
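Since the stated point of the change is compatibility with quantization libraries built around `nn.Linear`, loading the fork with 4-bit quantization via `BitsAndBytesConfig` might look like the following. This is a sketch rather than a recipe from the model card: it assumes `bitsandbytes` is installed and that the expert layers exposed by the remote code quantize cleanly.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

model_id_or_path = "rhymes-ai/Aria-sequential_mlp"

# Assumption: bitsandbytes is available; NF4 quantization applies to the model's nn.Linear layers,
# which in this fork include the sequential-MLP experts.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
```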