---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- multimodal
- vqa
- text
- audio
datasets:
- synthetic-dataset
metrics:
- accuracy
- bleu
- wer
model-index:
- name: Evolutionary Multi-Modal Model
  results:
  - task:
      type: vqa
      name: Visual Question Answering
    dataset:
      type: synthetic-dataset
      name: Synthetic Multimodal Dataset
      split: test
    metrics:
    - type: accuracy
      value: 85
pipeline_tag: text-generation
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy."
  example_title: "Sentiment analysis"
- text: "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had ..."
  example_title: "Coreference resolution"
- text: "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book ..."
  example_title: "Logic puzzles"
- text: "The two men running to become New York City's next mayor will face off in their first debate Wednesday night ..."
  example_title: "Reading comprehension"
---
# Model Card for Evolutionary Multi-Modal Model

### Model Sources

Code, audio, and natural-language text each need their own preprocessing before being passed to the model: it relies on separate tokenizers and vocabularies per modality to get the best results on special cases. A minimal loading sketch follows the source links below.

- **Repository:** [https://huggingface.co/zeroMN/SHMT](https://huggingface.co/zeroMN/SHMT)
- **Kaggle:** [https://www.kaggle.com/models/zeroeva/evolutionary-multi-modal](https://www.kaggle.com/models/zeroeva/evolutionary-multi-modal)
- **Demo:** [https://huggingface.co/spaces/zeroMN/zeroMN-SHMT](https://huggingface.co/spaces/zeroMN/zeroMN-SHMT) 
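
The sketch below illustrates the per-modality preprocessing described above. The text half uses the standard `transformers` API; the audio half is an assumption about the repo layout (that a separate audio processor is shipped under the same id), not something documented in this card.

```python
from transformers import AutoProcessor, AutoTokenizer

# Text, including code snippets, goes through the repo's text tokenizer.
text_tokenizer = AutoTokenizer.from_pretrained("zeroMN/SHMT")
text_inputs = text_tokenizer("def add(a, b): return a + b", return_tensors="pt")

# Assumption: the repo also ships a separate audio processor/feature
# extractor under the same id. Do not reuse the text tokenizer for audio.
audio_processor = AutoProcessor.from_pretrained("zeroMN/SHMT")
```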

## Model Details

### Model Description

This model, named `Evolutionary Multi-Modal Model`, is a multimodal transformer designed to handle a variety of tasks, including vision and audio processing. It is built on top of the `adapter-transformers` and `transformers` libraries and is intended to be a versatile base model for both direct use and fine-tuning.

- **Developed by:** Independent researcher
- **Funded by:** Self-funded
- **Shared by:** Independent researcher
- **Model type:** Multimodal
- **Language(s) (NLP):** English, Chinese
- **License:** Apache-2.0
- **Finetuned from model:** None

## Uses

### Direct Use

The model works for text generation out of the box, with no fine-tuning required. The runnable snippet under [How to Get Started with the Model](#how-to-get-started-with-the-model) below applies directly.

### Downstream Use

The model can be fine-tuned for specific tasks such as visual question answering (VQA), image captioning, and audio recognition. 
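
A minimal fine-tuning sketch with the `transformers` Seq2Seq trainer is shown below. It assumes the task can be cast as text-to-text (question in, answer out); the toy in-memory dataset, its column names, and the hyperparameters are illustrative assumptions, not values from this card.

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model = AutoModelForSeq2SeqLM.from_pretrained("zeroMN/SHMT")
tokenizer = AutoTokenizer.from_pretrained("zeroMN/SHMT")

# Illustrative toy dataset; replace with your own VQA/captioning pairs.
train = Dataset.from_dict({
    "question": ["What color is a clear daytime sky?"],
    "answer": ["blue"],
})

def preprocess(batch):
    # Cast the task as text-to-text: tokenize inputs and target labels.
    model_inputs = tokenizer(batch["question"], truncation=True, max_length=256)
    labels = tokenizer(text_target=batch["answer"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_train = train.map(preprocess, batched=True,
                            remove_columns=train.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="shmt-finetuned",       # illustrative output path
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```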

### Out-of-Scope Use

The Evolutionary Multi-Modal Model is not suited to tasks that demand domain expertise beyond its current capabilities. In particular, the number of speech frames used for audio input is not tuned out of the box; you will need to adjust it yourself for your data.
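
Purely as a hypothetical illustration of that last point: one place such a speech-frame budget typically lives is the audio feature extractor's configuration. The attribute name and value below are assumptions, not settings documented for this repo; check the actual extractor class for the real knob.

```python
from transformers import AutoFeatureExtractor

# Assumption: the repo ships an audio feature extractor config.
feature_extractor = AutoFeatureExtractor.from_pretrained("zeroMN/SHMT")

# Hypothetical knob: seconds of audio per window, which determines the
# number of speech frames. The real attribute name depends on the
# extractor class this repo actually uses.
feature_extractor.chunk_length_s = 20
```
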
## Bias, Risks, and Limitations

### Recommendations

Users (both direct and downstream) should be made aware of the following risks, biases, and limitations:

- **Bias:** The model may exhibit biases present in the training data, particularly if the data is not representative of all populations.
- **Risks:** The model should not be used in critical applications where high accuracy and reliability are required without thorough testing and validation.
- **Limitations:** The model may not perform well on tasks that require fine-grained recognition or highly specialized audio processing.

## How to Get Started with the Model

Use the code below to get started with the `Evolutionary Multi-Modal Model`.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Download the model weights and the matching tokenizer from the Hub.
model = AutoModelForSeq2SeqLM.from_pretrained("zeroMN/SHMT")
tokenizer = AutoTokenizer.from_pretrained("zeroMN/SHMT")

# Tokenize a prompt, generate a continuation, and decode it back to text.
input_text = "Tell me a joke."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```