Prarabdha committed on
Commit c2164c1 · verified · 1 Parent(s): 2c4cffa

Update README.md

Files changed (1): README.md +97 -1
README.md CHANGED
@@ -5,4 +5,100 @@ base_model:
  library_name: transformers
  tags:
  - text-generation-inference
- ---
+ ---
+ # Pixtral-12B-2409 - Hugging Face Transformers Compatible Weights
+
+ ## Model Overview
+
+ This repository contains Hugging Face Transformers-compatible weights for the Pixtral-12B-2409 multimodal model. The original weights have been converted so the model loads directly with the Transformers library and can be used in your projects without additional conversion steps.
+
+ ## Model Details
+
+ - **Original Model**: Pixtral-12B-2409 by Mistral AI
+ - **Model Type**: Multimodal Language Model
+ - **Parameters**: 12B parameters + 400M-parameter vision encoder
+ - **Sequence Length**: 128k tokens
+ - **License**: Apache 2.0
+
+ ## Key Features
+
+ - Natively multimodal, trained with interleaved image and text data
+ - Supports variable image sizes
+ - Leading performance in its weight class on multimodal tasks
+ - Maintains state-of-the-art performance on text-only benchmarks
+
+ ## Conversion Details
+
+ This repository provides the original Pixtral model weights converted to be fully compatible with the Hugging Face Transformers library. The conversion process ensures the following (a quick sanity check is sketched after this list):
+
+ - Seamless loading with `from_pretrained()`
+ - Full compatibility with the Hugging Face Transformers pipeline
+ - No modifications to the original model weights or architecture
+
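+ As an optional check, the sketch below (the repo id is a placeholder) confirms that the converted checkpoint resolves to the expected Llava-style wrapper around a Mistral text backbone and a Pixtral vision encoder, without downloading the full weights:
+
+ ```python
+ from transformers import AutoConfig
+
+ # Placeholder repo id - replace with the actual repository name
+ config = AutoConfig.from_pretrained("your-username/pixtral-12b-2409")
+
+ # Expected model types for a Pixtral conversion (assumption, not from the original card)
+ print(config.model_type)                # e.g. "llava"
+ print(config.text_config.model_type)    # e.g. "mistral"
+ print(config.vision_config.model_type)  # e.g. "pixtral"
+ ```
+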
+ ## Installation
+
+ You can load the model with the Transformers library (Pixtral support requires a recent Transformers release):
+
+ ```python
+ from transformers import AutoProcessor, LlavaForConditionalGeneration
+ import torch
+
+ # "your-username/pixtral-12b-2409" is a placeholder - replace it with this repo's id
+ model = LlavaForConditionalGeneration.from_pretrained(
+     "your-username/pixtral-12b-2409", torch_dtype=torch.float16, device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained("your-username/pixtral-12b-2409")
+ ```
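+
+ If GPU memory is limited, one option (a minimal sketch, not part of the original card; it assumes the `bitsandbytes` package and a CUDA GPU are available) is to load the weights in 4-bit precision:
+
+ ```python
+ from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration
+ import torch
+
+ # 4-bit quantized load to reduce memory usage (placeholder repo id)
+ bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
+ model = LlavaForConditionalGeneration.from_pretrained(
+     "your-username/pixtral-12b-2409", quantization_config=bnb_config, device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained("your-username/pixtral-12b-2409")
+ ```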
+
+ ## Example Usage
+
+ ```python
+ from PIL import Image
+ import requests
+
+ # Load an image (placeholder URL - substitute your own image)
+ url = "https://example.com/sample-image.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # Prepare a chat-style conversation with one image and one text prompt
+ conversation = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             {"type": "text", "text": "What is shown in this image?"},
+         ],
+     }
+ ]
+
+ # Build the prompt, process the inputs, and generate a response
+ prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
+ inputs = processor(images=[image], text=prompt, return_tensors="pt").to(model.device)
+ generate_ids = model.generate(**inputs, max_new_tokens=30)
+ response = processor.batch_decode(generate_ids, skip_special_tokens=True)
+ print(response[0])
+ ```
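+
+ Note that decoding the full `generate_ids` returns the prompt text together with the completion. If you only want the newly generated text, a common pattern (not part of the original card) is to slice off the prompt tokens first:
+
+ ```python
+ # Keep only the tokens generated after the prompt, then decode
+ new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
+ answer = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
+ print(answer)
+ ```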
+
+ ## Performance Benchmarks
+
+ ### Multimodal Benchmarks
+
+ | Benchmark | Pixtral 12B | Qwen2 7B VL | LLaVA-OV 7B | Phi-3 Vision |
+ |-----------|-------------|-------------|-------------|--------------|
+ | MMMU (CoT) | 52.5 | 47.6 | 45.1 | 40.3 |
+ | MathVista (CoT) | 58.0 | 54.4 | 36.1 | 36.4 |
+ | ChartQA (CoT) | 81.8 | 38.6 | 67.1 | 72.0 |
+
+ *(Full benchmark details are available in the original model card.)*
+
+ ## Acknowledgements
+
+ A huge thank you to the Mistral AI team for creating and releasing the original Pixtral model.
+
+ ## Citation
+
+ If you use this model, please cite the original Mistral AI research.
+
+ ## License
+
+ This model is distributed under the Apache 2.0 License.
+
+ ## Original Model Card
+
+ For more comprehensive details, please refer to the [original Mistral model card](https://huggingface.co/mistralai/Pixtral-12B-2409).