---
license: apache-2.0
base_model:
  - DeepGlint-AI/MLCD-Embodied-7B
---

# MLCD-Seg


## RefCOCO Segmentation Evaluation

| Dataset  | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2 | LISA |
|----------|-------|-------------|---------|-------|--------------|------|
| RefCOCO  | val   | 83.6        | 82.4    | 79.5  | 79.2         | 74.9 |
| RefCOCO  | testA | 85.3        | 84.2    | 83.2  | 82.3         | 79.1 |
| RefCOCO  | testB | 81.5        | 80.2    | 76.9  | 77.0         | 72.3 |
| RefCOCO+ | val   | 79.4        | 76.5    | 72.6  | 68.9         | 65.1 |
| RefCOCO+ | testA | 82.9        | 80.0    | 78.7  | 75.8         | 70.8 |
| RefCOCO+ | testB | 75.6        | 71.9    | 64.6  | 61.8         | 58.1 |
| RefCOCOg | val   | 79.7        | 78.2    | 74.2  | 73.3         | 67.9 |
| RefCOCOg | test  | 80.5        | 78.3    | 74.9  | 74.8         | 70.6 |
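
The scores above follow the standard referring-segmentation protocol (these baselines report cIoU). For reference, below is a minimal sketch of the per-pair mask IoU that underlies such scores; the function name and aggregation are illustrative, not the benchmark's official code:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape (illustrative helper,
    not the official benchmark implementation)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0
```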

## Evaluation

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert("RGB")
seg_prompt = "The <image> provides an overview of the picture.\nCould you provide a segmentation mask for the right giraffe in this image?"
pred_mask = mlcd_seg.predict_forward(seg_img, seg_prompt, tokenizer, force_seg=False)
```
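
The return type of `predict_forward` is not documented in this card; assuming it returns a single-channel mask with values in [0, 1] (an assumption, check the repo's demo code for the exact shape and dtype), the prediction can be saved like this:

```python
import numpy as np

# Assumption: pred_mask is a single-channel torch tensor with values in [0, 1];
# verify against the repo's demo code before relying on this.
mask = (pred_mask.squeeze().float().cpu().numpy() * 255).astype(np.uint8)
Image.fromarray(mask).save("mask.png")
```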

## Tips for updating this repo in the future

Hugging Face caches the repo's `trust_remote_code` modules locally, so the stale copies must be cleared manually after the repo is updated:

```shell
cd ~/.cache/huggingface/modules/transformers_modules
rm mlcd_seg.py vision_projector.py vision_resampler.py vision_tower.py sam.py conversation_mlcd_seg.py
```
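
If you would rather clear the cache from Python, here is a minimal equivalent of the shell commands above (same path and file names as listed there):

```python
from pathlib import Path

# Remove the cached remote-code modules so the updated versions are re-fetched.
cache_dir = Path.home() / ".cache/huggingface/modules/transformers_modules"
for name in ["mlcd_seg.py", "vision_projector.py", "vision_resampler.py",
             "vision_tower.py", "sam.py", "conversation_mlcd_seg.py"]:
    (cache_dir / name).unlink(missing_ok=True)
```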

## Citations

```bibtex
@misc{mlcdseg_wukun,
  author = {Wu, Kun and Xie, Yin and Zhou, Xinyu and An, Xiang and Deng, Jiankang and Jie, Yu},
  title = {MLCD-Seg},
  year = {2025},
  url = {https://github.com/deepglint/unicom/tree/main/downstream},
}
```