# Building Your Own Multimodal Large Model from Scratch

For the English version of the README, please refer to [README.md](README.md).

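## Model Architecture 🤖

In the VLM, the vision component uses a `CLIP` or `SIGLIP` model, which already provides preliminary semantic alignment, and a two-layer MLP maps its features. The `forward` method of `QWenModel` is overridden so that the corresponding `image` tokens are replaced with the visual features.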
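
The idea above can be illustrated with a minimal sketch. It is not the repository's implementation: NumPy stands in for the deep-learning framework to keep the example dependency-light, and the names `TwoLayerMLP`, `merge_image_features`, and `IMAGE_TOKEN_ID`, the GELU activation, and all dimensions are assumptions made for illustration. Random arrays stand in for the vision-encoder output and the language model's embedding lookup.

```python
import numpy as np

IMAGE_TOKEN_ID = 9  # hypothetical id of the `image` placeholder token


def gelu(x):
    # tanh approximation of GELU (the activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class TwoLayerMLP:
    """Maps vision features (vision_dim) to the LLM embedding size (llm_dim)."""

    def __init__(self, vision_dim, llm_dim, rng):
        self.w1 = rng.normal(0.0, 0.02, (vision_dim, llm_dim))
        self.b1 = np.zeros(llm_dim)
        self.w2 = rng.normal(0.0, 0.02, (llm_dim, llm_dim))
        self.b2 = np.zeros(llm_dim)

    def __call__(self, feats):
        return gelu(feats @ self.w1 + self.b1) @ self.w2 + self.b2


def merge_image_features(input_ids, token_embeds, image_feats):
    """Replace embeddings at image-token positions with projected visual features.

    input_ids:    (seq_len,) integer token ids
    token_embeds: (seq_len, llm_dim) text embeddings
    image_feats:  (num_patches, llm_dim) projected visual features
    """
    positions = np.flatnonzero(input_ids == IMAGE_TOKEN_ID)
    if len(positions) != len(image_feats):
        raise ValueError("number of image tokens must match number of visual features")
    merged = token_embeds.copy()
    merged[positions] = image_feats
    return merged


rng = np.random.default_rng(0)
vision_dim, llm_dim, num_patches = 8, 16, 4

projector = TwoLayerMLP(vision_dim, llm_dim, rng)
patch_feats = rng.normal(size=(num_patches, vision_dim))   # stand-in for CLIP/SigLIP output
input_ids = np.array([1, 2, 9, 9, 9, 9, 3])                 # four image placeholders
token_embeds = rng.normal(size=(len(input_ids), llm_dim))  # stand-in for the embedding lookup

image_feats = projector(patch_feats)
merged = merge_image_features(input_ids, token_embeds, image_feats)
```

In an overridden `forward`, a merge of this kind would run after the token embedding lookup and before the transformer layers, so the language model sees visual features wherever the prompt contains `image` placeholders.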

## GitHub Repository 🏠

The runnable code is available at [Basic-Visual-Language-Model](https://github.com/xinyanghuang7/Basic-Visual-Language-Model/tree/main).

## References 📚

Thanks to the great work of the following projects 🙌:

- https://github.com/WatchTower-Liu/VLM-learning/tree/main
- https://github.com/QwenLM/Qwen
- https://github.com/haotian-liu/LLaVA

## Contact ✉

If you have any questions or ideas, feel free to contact me at any time 😊:

[email protected]

I will reply as soon as I see your email!