zboyles committed on
Commit 18e08d3 · verified · 1 Parent(s): 5224d01

Update README.md

Files changed (1)
  1. README.md +156 -9
README.md CHANGED
README.md CHANGED
@@ -3,7 +3,7 @@ base_model:
  - HuggingFaceTB/SmolVLM-256M-Instruct
  language:
  - en
- library_name: transformers
  license: apache-2.0
  pipeline_tag: image-text-to-text
  tags:
@@ -11,14 +11,161 @@ tags:
  ---
 
  # zboyles/SmolDocling-256M-preview-bf16
- This model was converted to MLX format from [`ds4sd/SmolDocling-256M-preview`](https://huggingface.co/ds4sd/SmolDocling-256M-preview) using mlx-vlm version **0.1.18**.
- Refer to the [original model card](https://huggingface.co/ds4sd/SmolDocling-256M-preview) for more details on the model.
- ## Use with mlx
 
- ```bash
- pip install -U mlx-vlm
- ```
 
- ```bash
- python -m mlx_vlm.generate --model zboyles/SmolDocling-256M-preview-bf16 --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
  ```
@@ -3,7 +3,7 @@ base_model:
  - HuggingFaceTB/SmolVLM-256M-Instruct
  language:
  - en
+ library_name: mlx
  license: apache-2.0
  pipeline_tag: image-text-to-text
  tags:
 
@@ -11,14 +11,161 @@ tags:
  ---
 
  # zboyles/SmolDocling-256M-preview-bf16
+ This model was converted to **MLX format** from [`ds4sd/SmolDocling-256M-preview`](https://huggingface.co/ds4sd/SmolDocling-256M-preview) using mlx-vlm version **0.1.18**.
+ * Refer to the [**original model card**](https://huggingface.co/ds4sd/SmolDocling-256M-preview) for more details on the model.
+ * Refer to the [**mlx-vlm repo**](https://github.com/Blaizzy/mlx-vlm) for more examples using `mlx-vlm`.
 
+ ## Use SmolDocling-256M-preview with docling and mlx
+
+ > **A working MLX + Docling example is provided below.**
+
+ <div style="display: flex; align-items: center;">
+ <img src="https://huggingface.co/ds4sd/SmolDocling-256M-preview/resolve/main/assets/SmolDocling_doctags1.png" alt="SmolDocling" style="width: 200px; height: auto; margin-right: 20px;">
+ <div>
+ <h3>SmolDocling-256M-preview</h3>
+ <p>SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for <strong>DoclingDocuments</strong>.</p>
+ </div>
+ </div>
+
+ This model was presented in the paper [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](https://huggingface.co/papers/2503.11576).
+
+ ### 🚀 Features:
+ - 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments**.
+ - 🔍 **OCR (Optical Character Recognition)** – Extracts text accurately from images.
+ - 📐 **Layout and Localization** – Preserves document structure and document element **bounding boxes**.
+ - 💻 **Code Recognition** – Detects and formats code blocks, including indentation.
+ - 🔢 **Formula Recognition** – Identifies and processes mathematical expressions.
+ - 📊 **Chart Recognition** – Extracts and interprets chart data.
+ - 📑 **Table Recognition** – Supports column and row headers for structured table extraction.
+ - 🖼️ **Figure Classification** – Differentiates figures and graphical elements.
+ - 📝 **Caption Correspondence** – Links captions to relevant images and figures.
+ - 📜 **List Grouping** – Organizes and structures list elements correctly.
+ - 📄 **Full-Page Conversion** – Processes entire pages for comprehensive document conversion, including all page elements (code, equations, tables, charts, etc.).
+ - 🔲 **OCR with Bounding Boxes** – Performs OCR on regions specified by a bounding box.
+ - 📂 **General Document Processing** – Trained on both scientific and non-scientific documents.
+ - 🔄 **Seamless Docling Integration** – Imports into **Docling** and exports to multiple formats.
+ - 💨 **Fast inference using vLLM** – Averages 0.35 seconds per page on an A100 GPU.
+
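The bounding boxes mentioned above appear in the generated DocTags as `<loc_…>` tokens attached to each element. As an illustration only (the tag vocabulary is defined by the model, and the sample fragment below is hypothetical), a minimal sketch of pulling those coordinates out of a DocTags string:

```python
import re

def extract_locs(doctags: str) -> list[int]:
    """Collect the integer coordinates from <loc_N> tokens, in order of appearance."""
    return [int(n) for n in re.findall(r"<loc_(\d+)>", doctags)]

# Hypothetical DocTags fragment, for illustration only
sample = "<text><loc_42><loc_58><loc_470><loc_90>Gazette de France</text>"
print(extract_locs(sample))  # -> [42, 58, 470, 90]
```

For real conversions, prefer loading the output into a `DoclingDocument` (as in the example below this section's heading) rather than parsing tags by hand.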
+ ### 🚧 *Coming soon!*
+ - 📊 **Better chart recognition 🛠️**
+ - 📚 **One-shot multi-page inference ⏱️**
+ - 🧪 **Chemical Recognition**
+ - 📙 **Datasets**
+
+ ## ⌨️ Get started (**MLX** code examples)
+
+ You can use **mlx** to perform inference, and [Docling](https://github.com/docling-project/docling) to convert the results to a variety of output formats (md, html, etc.):
+
+ <details>
+ <summary>📄 Single-page image inference using MLX via `mlx-vlm` 🤖</summary>
+
+ ```python
+ # Prerequisites:
+ # pip install -U mlx-vlm
+ # pip install docling_core
+
+ import sys
+
+ from pathlib import Path
+ from PIL import Image
+
+ from docling_core.types.doc import DoclingDocument
+ from docling_core.types.doc.document import DocTagsDocument
+ from mlx_vlm import load, apply_chat_template, stream_generate
+ from mlx_vlm.utils import load_image
+
+ # Variables
+ path_or_hf_repo = "zboyles/SmolDocling-256M-preview-bf16"
+ output_path = Path("output")
+ output_path.mkdir(exist_ok=True)
+
+ # Model Params
+ eos = "<end_of_utterance>"
+ verbose = True
+ kwargs = {
+     "max_tokens": 8000,
+     "temperature": 0.0,
+ }
+
+ # Load images
+ # Note: I manually downloaded the image
+ # image_src = "https://upload.wikimedia.org/wikipedia/commons/7/76/GazettedeFrance.jpg"
+ # image = load_image(image_src)
+ image_src = "images/GazettedeFrance.jpg"
+ image = Image.open(image_src).convert("RGB")
+
+ # Initialize processor and model
+ model, processor = load(
+     path_or_hf_repo=path_or_hf_repo,
+     trust_remote_code=True,
+ )
+ config = model.config
+
+ # Create input messages - Docling walkthrough structure
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             {"type": "text", "text": "Convert this page to docling."}
+         ]
+     },
+ ]
+ prompt = apply_chat_template(processor, config, messages, add_generation_prompt=True)
+
+ # # Alternatively, a supported prompt-creation method:
+ # messages = [{"role": "user", "content": "Convert this page to docling."}]
+ # prompt = apply_chat_template(processor, config, messages, add_generation_prompt=True)
+
+ text = ""
+ last_response = None
+
+ # Stream tokens, stopping manually at the end-of-utterance marker
+ for response in stream_generate(
+     model=model,
+     processor=processor,
+     prompt=prompt,
+     image=image,
+     **kwargs
+ ):
+     if verbose:
+         print(response.text, end="", flush=True)
+     text += response.text
+     last_response = response
+     if eos in text:
+         text = text.split(eos)[0].strip()
+         break
+ print()
+
+ if verbose:
+     print("\n" + "=" * 10)
+     if len(text) == 0:
+         print("No text generated for this prompt")
+         sys.exit(0)
+     print(
+         f"Prompt: {last_response.prompt_tokens} tokens, "
+         f"{last_response.prompt_tps:.3f} tokens-per-sec"
+     )
+     print(
+         f"Generation: {last_response.generation_tokens} tokens, "
+         f"{last_response.generation_tps:.3f} tokens-per-sec"
+     )
+     print(f"Peak memory: {last_response.peak_memory:.3f} GB")
+
+ # Convert to a DoclingDocument, then export as MD, HTML, etc.
+ docling_output_path = output_path / Path(image_src).with_suffix(".dt").name
+ docling_output_path.write_text(text)
+ doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([text], [image])
+ doc = DoclingDocument(name="Document")
+ doc.load_from_doctags(doctags_doc)
+ # Export as any supported format
+ # HTML
+ doc.save_as_html(docling_output_path.with_suffix(".html"))
+ # MD
+ doc.save_as_markdown(docling_output_path.with_suffix(".md"))
  ```
+ </details>
+
+ Thanks to [**@Blaizzy**](https://github.com/Blaizzy) for the [code examples](https://github.com/Blaizzy/mlx-vlm/tree/main/examples) that helped me quickly adapt the `docling` example.
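One detail of the streaming loop in the example worth isolating: generation is cut at the `<end_of_utterance>` token in user code rather than relying on the generator to stop. That trimming logic, sketched on its own in pure Python (no mlx dependency):

```python
EOS = "<end_of_utterance>"

def trim_at_eos(text: str, eos: str = EOS) -> str:
    """Return generated text up to (excluding) the EOS token, stripped of whitespace."""
    return text.split(eos)[0].strip() if eos in text else text.strip()

print(trim_at_eos("<doctag>...</doctag> <end_of_utterance> trailing tokens"))
# -> <doctag>...</doctag>
```

Breaking out of the stream as soon as the marker appears (as the full example does) also avoids generating tokens past the end of the document.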