---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- LanguageBind/Video-LLaVA
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---
# LSTP-Chat: Language-guided Spatial-Temporal Prompt Learning for Video Chat

Available Models:

- LSTP-FlanT5xl
- LSTP-Chat-7B (Vicuna-7b)

For more details, please refer to our [official repository](https://github.com/bigai-nlco/LSTP-Chat).