kashif (HF staff) committed · verified
Commit a213c02 · 1 Parent(s): 27e944c

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED

@@ -14,7 +14,7 @@ tags:

 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM.png" width="800" height="auto" alt="Image description">

-# SmolVLM
+# SmolVLM-Instruct-DPO

 SmolVLM is a compact open multimodal model that accepts arbitrary sequences of image and text inputs to produce text outputs. Designed for efficiency, SmolVLM can answer questions about images, describe visual content, create stories grounded on multiple images, or function as a pure language model without visual inputs. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks.

@@ -65,7 +65,7 @@ SmolVLM is a compact open multimodal model that accepts arbitrary sequences of i

 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.

 ## How to Get Started with the Model

@@ -130,8 +130,6 @@ Use the code below to get started with the model.

 #### Summary

-
-
 ## Model Examination [optional]

 <!-- Relevant interpretability work for the model goes here -->
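The diff's context lines reference the card's "How to Get Started with the Model" section ("Use the code below to get started with the model"), but the code itself falls outside these hunks. For context, here is a minimal sketch of how a SmolVLM-style instruct checkpoint is typically loaded and run with transformers; the repo id, image path, and generation settings below are assumptions for illustration, not part of this commit:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Assumed repo id: this commit only renames the card heading to
# SmolVLM-Instruct-DPO, so the exact checkpoint name may differ.
model_id = "HuggingFaceTB/SmolVLM-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Any local image works here; the path is a placeholder.
image = Image.open("example.jpg")

# Build a chat-style prompt pairing one image with one text question,
# then render it with the processor's chat template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate and decode the model's text output.
generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

The same pattern extends to the multi-image use cases the card describes: pass several `{"type": "image"}` entries in the message and a matching list of images to the processor.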