vidore/colpali-v1.1

Visual Document Retrieval

manu committed on Aug 21, 2024

Commit 7c446a5 · verified · 1 Parent(s): 730dded

Update README.md

Files changed (1)

README.md +5 -2

@@ -1,6 +1,7 @@
 ---
 license: mit
 library_name: colpali
 language:
 - en
 tags:
@@ -13,6 +14,8 @@ It is a [PaliGemma-3B](https://huggingface.co/google/paligemma-3b-mix-448) exten
 It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
 This version has right padding to fix unwanted tokens in the query encoding.
 ## Model Description
@@ -58,8 +61,8 @@ def main() -> None:
     """Example script to run inference with ColPali"""
     # Load model
-    model_name = "vidore/colpali"
-    model = ColPali.from_pretrained("google/paligemma-3b-mix-448", torch_dtype=torch.bfloat16, device_map="cuda").eval()
     model.load_adapter(model_name)
     processor = AutoProcessor.from_pretrained(model_name)

 ---
 license: mit
 library_name: colpali
+base_model: vidore/colpaligemma-3b-mix-448-base
 language:
 - en
 tags:
 It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
 This version has right padding to fix unwanted tokens in the query encoding.
+It also stems from the fixed `vidore/colpaligemma-3b-mix-448-base` to guarantee deterministic projection layer initialization.
 ## Model Description
     """Example script to run inference with ColPali"""
     # Load model
+    model_name = "manu/colpali-v1.1"
+    model = ColPali.from_pretrained("vidore/colpaligemma-3b-mix-448-base", torch_dtype=torch.bfloat16, device_map="cuda").eval()
     model.load_adapter(model_name)
     processor = AutoProcessor.from_pretrained(model_name)
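
The added README line says the new base checkpoint exists "to guarantee deterministic projection layer initialization". The toy sketch below illustrates that idea only. It assumes (this is not stated in the diff) that the original base checkpoint carries no weights for the projection layer, so that layer would be drawn at random on each load. `init_projection` and the dimensions are hypothetical stand-ins, not part of the `colpali` library.

```python
import random


def init_projection(in_dim, out_dim, rng):
    """Randomly initialise a toy projection matrix (hypothetical stand-in)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


# Case 1 (assumed old behaviour): the base checkpoint has no projection
# weights, so every load draws fresh random values from an unseeded RNG.
load_a = init_projection(4, 2, random.Random())
load_b = init_projection(4, 2, random.Random())

# Case 2 (fixed base): the projection weights are created once, stored in
# the base checkpoint, and copied unchanged on every load.
stored = init_projection(4, 2, random.Random(0))
fixed_a = [row[:] for row in stored]
fixed_b = [row[:] for row in stored]

print("random init matches across loads:", load_a == load_b)
print("fixed base matches across loads: ", fixed_a == fixed_b)
```

Under that assumption, an adapter trained on top of one random draw would not line up with a different draw at inference time, which is why pinning the projection weights in a shared base checkpoint matters.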