GoodBaiBai88 committed on
Commit
6bcfee4
1 Parent(s): 988ef80

Update README.md

Files changed (1): README.md (+21 -3)
README.md CHANGED
@@ -12,6 +12,9 @@ M3D-CLIP is a 3D medical CLIP model, which aligns vision and language through co
 The vision encoder uses 3D ViT with 32*256*256 image size and 4*16*16 patch size.
 The text encoder utilizes a pre-trained BERT as initialization.
 
+![M3D_CLIP_table](M3D_CLIP_table.png#pic_center)
+![itr_result](itr_result.png#pic_center)
+
 # Quickstart
 
 ```python
@@ -30,15 +33,15 @@ The text encoder utilizes a pre-trained BERT as initialization.
 model = model.to(device=device)
 
 # Prepare your 3D medical image:
-# 1. The image shape needs to be processed as 1*32*256*256, consider resize and other methods.
-# 2. The image needs to be normalized to 0-1, consider Min-Max Normalization.
+# 1. The image shape needs to be processed to 1*32*256*256; consider resizing or similar methods.
+# 2. The image needs to be normalized to 0-1; consider Min-Max Normalization.
 # 3. The image format needs to be converted to .npy.
 # 4. Although we did not train on 2D images, in theory, a 2D image can be interpolated to the shape of 1*32*256*256 for input.
 
 image_path = ""
 input_txt = ""
 
-text_tensor = tokenizer(input_txt, return_tensors="pt")
+text_tensor = tokenizer(input_txt, max_length=512, truncation=True, padding="max_length", return_tensors="pt")
 input_id = text_tensor["input_ids"].to(device=device)
 attention_mask = text_tensor["attention_mask"].to(device=device)
 image = torch.from_numpy(np.load(image_path)).to(device=device)  # np.load returns an ndarray; convert to a tensor before moving to device
@@ -49,3 +52,18 @@ The text encoder utilizes a pre-trained BERT as initialization.
 
 ```
 
+# Citation
+
+If you find our work helpful, please consider citing:
+
+```BibTeX
+@misc{bai2024m3d,
+      title={M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models},
+      author={Fan Bai and Yuxin Du and Tiejun Huang and Max Q.-H. Meng and Bo Zhao},
+      year={2024},
+      eprint={2404.00578},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
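The preprocessing steps listed in the Quickstart comments (resize to 1*32*256*256, min-max normalize to 0-1, store as .npy) can be sketched as follows. This is an illustrative sketch, not the repository's own pipeline: the nearest-neighbor resize is a stand-in for whatever resampling you prefer (e.g. `scipy.ndimage.zoom`), and the output file name is hypothetical.

```python
import numpy as np

def minmax_normalize(volume: np.ndarray) -> np.ndarray:
    """Scale intensities to [0, 1] (step 2 above)."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin + 1e-8)

def resize_nearest(volume: np.ndarray, out_shape=(32, 256, 256)) -> np.ndarray:
    """Nearest-neighbor resize of a D*H*W volume (step 1 above)."""
    d, h, w = volume.shape
    od, oh, ow = out_shape
    zi = np.arange(od) * d // od   # source index for each output slice
    yi = np.arange(oh) * h // oh
    xi = np.arange(ow) * w // ow
    return volume[zi][:, yi][:, :, xi]

def preprocess(volume: np.ndarray) -> np.ndarray:
    vol = resize_nearest(volume.astype(np.float32))
    vol = minmax_normalize(vol)
    return vol[None]  # add channel dim -> 1*32*256*256

# Example: a random 40*300*300 scan
vol = preprocess(np.random.rand(40, 300, 300))
print(vol.shape)  # (1, 32, 256, 256)
# np.save("your_scan.npy", vol)  # step 3: store as .npy (hypothetical path)
```

The saved array can then be loaded with `np.load` exactly as in the Quickstart snippet.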
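As a quick sanity check on the encoder description above, the stated 32*256*256 image size and 4*16*16 patch size imply the 3D ViT sees (32/4)*(256/16)*(256/16) patch tokens per volume. This back-of-the-envelope calculation is ours, not from the model card:

```python
# Patch grid implied by a 32*256*256 input split into 4*16*16 patches
image_size = (32, 256, 256)  # depth, height, width
patch_size = (4, 16, 16)

tokens_per_axis = [i // p for i, p in zip(image_size, patch_size)]
num_patches = tokens_per_axis[0] * tokens_per_axis[1] * tokens_per_axis[2]
print(tokens_per_axis, num_patches)  # [8, 16, 16] 2048
```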