---
license: apache-2.0
base_model:
  - DeepGlint-AI/MLCD-Embodied-7B
---

# MLCD-Seg


## RefCOCO Segmentation Evaluation

| Dataset  | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2 | LISA |
|----------|-------|-------------|---------|-------|--------------|------|
| RefCOCO  | val   | 83.6        | 82.4    | 79.5  | 79.2         | 74.9 |
| RefCOCO  | testA | 85.3        | 84.2    | 83.2  | 82.3         | 79.1 |
| RefCOCO  | testB | 81.5        | 80.2    | 76.9  | 77.0         | 72.3 |
| RefCOCO+ | val   | 79.4        | 76.5    | 72.6  | 68.9         | 65.1 |
| RefCOCO+ | testA | 82.9        | 80.0    | 78.7  | 75.8         | 70.8 |
| RefCOCO+ | testB | 75.6        | 71.9    | 64.6  | 61.8         | 58.1 |
| RefCOCOg | val   | 79.7        | 78.2    | 74.2  | 73.3         | 67.9 |
| RefCOCOg | test  | 80.5        | 78.3    | 74.9  | 74.8         | 70.6 |
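
The scores above follow the standard referring-segmentation protocol (these baselines report cIoU). For reference, below is a minimal sketch of the per-pair mask IoU that underlies such scores; the function name and aggregation are illustrative, not the benchmark's official code:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape (illustrative helper,
    not the official benchmark implementation)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0
```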

## Evaluation

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert("RGB")
seg_prompt = "The <image> provides an overview of the picture.\nCould you provide a segmentation mask for the right giraffe in this image?"
pred_mask = mlcd_seg.predict_forward(seg_img, seg_prompt, tokenizer, force_seg=False)
```
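
The return type of `predict_forward` is not documented in this card; assuming it returns a single-channel mask with values in [0, 1] (an assumption, check the repo's demo code for the exact shape and dtype), the prediction can be saved like this:

```python
import numpy as np

# Assumption: pred_mask is a single-channel torch tensor with values in [0, 1];
# verify against the repo's demo code before relying on this.
mask = (pred_mask.squeeze().float().cpu().numpy() * 255).astype(np.uint8)
Image.fromarray(mask).save("mask.png")
```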

## Tips for updating this repo in the future

Hugging Face caches the repo's `trust_remote_code` modules locally, so the stale copies must be cleared manually after the repo is updated:

```shell
cd ~/.cache/huggingface/modules/transformers_modules
rm mlcd_seg.py vision_projector.py vision_resampler.py vision_tower.py sam.py conversation_mlcd_seg.py
```
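
If you would rather clear the cache from Python, here is a minimal equivalent of the shell commands above (same path and file names as listed there):

```python
from pathlib import Path

# Remove the cached remote-code modules so the updated versions are re-fetched.
cache_dir = Path.home() / ".cache/huggingface/modules/transformers_modules"
for name in ["mlcd_seg.py", "vision_projector.py", "vision_resampler.py",
             "vision_tower.py", "sam.py", "conversation_mlcd_seg.py"]:
    (cache_dir / name).unlink(missing_ok=True)
```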

## Citations

```bibtex
@misc{mlcdseg_wukun,
  author = {Wu, Kun and Xie, Yin and Zhou, Xinyu and An, Xiang and Deng, Jiankang and Jie, Yu},
  title = {MLCD-Seg},
  year = {2025},
  url = {https://github.com/deepglint/unicom/tree/main/downstream},
}
```