GoodBaiBai88 committed on
Commit
6bcfee4
1 Parent(s): 988ef80

Update README.md

Files changed (1): README.md (+21 -3)
README.md CHANGED
@@ -12,6 +12,9 @@ M3D-CLIP is a 3D medical CLIP model, which aligns vision and language through co
 The vision encoder uses 3D ViT with 32*256*256 image size and 4*16*16 patch size.
 The text encoder utilizes a pre-trained BERT as initialization.
 
+![M3D_CLIP_table](M3D_CLIP_table.png#pic_center)
+![itr_result](itr_result.png#pic_center)
+
 # Quickstart
 
 ```python
@@ -30,15 +33,15 @@ The text encoder utilizes a pre-trained BERT as initialization.
 model = model.to(device=device)
 
 # Prepare your 3D medical image:
-# 1. The image shape needs to be processed as 1*32*256*256, consider resize and other methods.
-# 2. The image needs to be normalized to 0-1, consider Min-Max Normalization.
+# 1. The image shape needs to be processed to 1*32*256*256; consider resizing or similar methods.
+# 2. The image needs to be normalized to 0-1; consider Min-Max Normalization.
 # 3. The image format needs to be converted to .npy.
 # 4. Although we did not train on 2D images, in theory, a 2D image can be interpolated to the shape of 1*32*256*256 for input.
 
 image_path = ""
 input_txt = ""
 
-text_tensor = tokenizer(input_txt, return_tensors="pt")
+text_tensor = tokenizer(input_txt, max_length=512, truncation=True, padding="max_length", return_tensors="pt")
 input_id = text_tensor["input_ids"].to(device=device)
 attention_mask = text_tensor["attention_mask"].to(device=device)
 image = torch.from_numpy(np.load(image_path)).to(device=device)  # np.load returns an ndarray; convert to a tensor before moving to device
@@ -49,3 +52,18 @@ The text encoder utilizes a pre-trained BERT as initialization.
 
 ```
 
+# Citation
+
+If you find our work helpful, please consider citing:
+
+```BibTeX
+@misc{bai2024m3d,
+      title={M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models},
+      author={Fan Bai and Yuxin Du and Tiejun Huang and Max Q.-H. Meng and Bo Zhao},
+      year={2024},
+      eprint={2404.00578},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
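The preprocessing steps listed in the Quickstart comments (resize to 1*32*256*256, min-max normalize to 0-1, store as .npy) can be sketched as follows. This is an illustrative sketch, not the repository's own pipeline: the nearest-neighbor resize is a stand-in for whatever resampling you prefer (e.g. `scipy.ndimage.zoom`), and the output file name is hypothetical.

```python
import numpy as np

def minmax_normalize(volume: np.ndarray) -> np.ndarray:
    """Scale intensities to [0, 1] (step 2 above)."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin + 1e-8)

def resize_nearest(volume: np.ndarray, out_shape=(32, 256, 256)) -> np.ndarray:
    """Nearest-neighbor resize of a D*H*W volume (step 1 above)."""
    d, h, w = volume.shape
    od, oh, ow = out_shape
    zi = np.arange(od) * d // od   # source index for each output slice
    yi = np.arange(oh) * h // oh
    xi = np.arange(ow) * w // ow
    return volume[zi][:, yi][:, :, xi]

def preprocess(volume: np.ndarray) -> np.ndarray:
    vol = resize_nearest(volume.astype(np.float32))
    vol = minmax_normalize(vol)
    return vol[None]  # add channel dim -> 1*32*256*256

# Example: a random 40*300*300 scan
vol = preprocess(np.random.rand(40, 300, 300))
print(vol.shape)  # (1, 32, 256, 256)
# np.save("your_scan.npy", vol)  # step 3: store as .npy (hypothetical path)
```

The saved array can then be loaded with `np.load` exactly as in the Quickstart snippet.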
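As a quick sanity check on the encoder description above, the stated 32*256*256 image size and 4*16*16 patch size imply the 3D ViT sees (32/4)*(256/16)*(256/16) patch tokens per volume. This back-of-the-envelope calculation is ours, not from the model card:

```python
# Patch grid implied by a 32*256*256 input split into 4*16*16 patches
image_size = (32, 256, 256)  # depth, height, width
patch_size = (4, 16, 16)

tokens_per_axis = [i // p for i, p in zip(image_size, patch_size)]
num_patches = tokens_per_axis[0] * tokens_per_axis[1] * tokens_per_axis[2]
print(tokens_per_axis, num_patches)  # [8, 16, 16] 2048
```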