Update README.md
Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

This version of Cephalo, ```lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k```, is based on the ```meta-llama/Llama-3.2-11B-Vision-Instruct``` model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers.

For further details on the base model, see https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct. Technical aspects of the model, its training, and example applications to materials science problems are described in the paper (reference at the bottom).

### Chat Format

The ```lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k``` model is suitable for one or more image inputs, with prompts using the chat format as follows:

```raw
messages=[{'role': 'user',
           'content': [{'type': 'image'},
                       {'type': 'text',
                        'text': 'Consider the stress-strain response under compression. What are the three curves shown? Based on an inspection of the plot, do they show good agreement or are they very different?'}]}]
```

After application of the chat template:

```python
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
```

The raw input text is:

```raw
<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<|image|>Consider the stress-strain response under compression. What are the three curves shown? Based on an inspection of the plot, do they show good agreement or are they very different?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
```
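
Since the model accepts one or more images, a multi-image prompt uses one ```{'type': 'image'}``` entry per image, in the same order as the images later passed to the processor. A minimal sketch (the prompt text here is illustrative, not from the original):

```raw
messages=[{'role': 'user',
           'content': [{'type': 'image'},
                       {'type': 'image'},
                       {'type': 'text',
                        'text': 'Compare the microstructures shown in the two images.'}]}]
```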

### Sample inference code

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# Load model and processor (standard Llama-3.2-Vision loading pattern assumed)
model_id = "lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k"
model = MllamaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Load the example image (stress-strain responses of architected materials)
url = "https://huggingface.co/lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k/resolve/main/architected_stress_strain.png"
image = Image.open(requests.get(url, stream=True).raw)
images = [image]

# Chat-format prompt: one {'type': 'image'} entry per input image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Consider the stress-strain response under compression. What are the three curves shown? Based on an inspection of the plot, do they show good agreement or are they very different?"},
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images, input_text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0]))
```
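
To print only the generated answer rather than the full sequence including the prompt, the input tokens can be sliced off (a common pattern, not part of the original snippet):

```python
response = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```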