

Build Your Own Multimodal Large Model from Scratch

For the English version of the README, please refer to README.md.

Model Architecture 🤖

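In the VLM, the vision component uses a CLIP or SIGLIP model, which already provides preliminary semantic alignment, followed by a two-layer MLP that maps the visual features into the language model's embedding space. The forward method of QWenModel is overridden so that the embeddings at the image token positions are replaced with these visual features.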
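The two steps described above (projecting vision features with a two-layer MLP, then substituting them at the image token positions) can be sketched without any framework. This is only an illustration under stated assumptions, not the repository's implementation: the names `IMAGE_TOKEN_ID`, `make_mlp`, `project`, and `splice_image_features` are hypothetical, the GELU activation and the tiny dimensions are assumed, and plain Python lists stand in for tensors. The actual project performs the substitution inside the overridden forward method of QWenModel.

```python
import math
import random

IMAGE_TOKEN_ID = -1  # hypothetical placeholder id marking an image position


def linear(x, weight, bias):
    """Apply y = W x + b to one vector (weight has shape out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(x):
    """Exact GELU applied element-wise (the activation choice is an assumption)."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Create randomly initialised parameters for a two-layer MLP."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": init(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "w2": init(out_dim, hidden_dim), "b2": [0.0] * out_dim,
    }


def project(patch_features, mlp):
    """Map each vision feature vector to the language model's embedding size."""
    return [
        linear(gelu(linear(p, mlp["w1"], mlp["b1"])), mlp["w2"], mlp["b2"])
        for p in patch_features
    ]


def splice_image_features(input_ids, text_embeds, image_embeds):
    """Replace embeddings at image-token positions with visual features, in order."""
    positions = [i for i, tok in enumerate(input_ids) if tok == IMAGE_TOKEN_ID]
    if len(positions) != len(image_embeds):
        raise ValueError("number of image tokens must match number of visual features")
    merged = list(text_embeds)  # shallow copy so the caller's list is untouched
    for pos, feat in zip(positions, image_embeds):
        merged[pos] = feat
    return merged


# Toy example: 3 vision patches of size 4 mapped into an embedding size of 6.
vision_dim, hidden_dim, text_dim = 4, 8, 6
mlp = make_mlp(vision_dim, hidden_dim, text_dim)
patch_features = [[0.1 * (i + j) for j in range(vision_dim)] for i in range(3)]
input_ids = [11, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 42]
text_embeds = [[float(tok)] * text_dim for tok in input_ids]  # stand-in embeddings

image_embeds = project(patch_features, mlp)
merged = splice_image_features(input_ids, text_embeds, image_embeds)
print(len(merged), len(merged[1]))
```

Keeping the sequence length unchanged, with one placeholder token per visual feature vector, means the language model's attention and position handling need no modification. Only the embedding lookup result is altered.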

GitHub Repository 🏠

The runnable code is available in the Basic-Visual-Language-Model repository.

References 📚

Thanks to the following projects for their great work 🙌:

Contact ✉

If you have any questions or ideas, feel free to contact me at any time 😊:

[email protected]

I will reply as soon as I see your email!