Update README.md
README.md (changed)
@@ -36,6 +36,10 @@ We introduce **Emu3**, a new suite of state-of-the-art multimodal models trained
- **Emu3** simply generates a video causally by predicting the next token in a video sequence, unlike video diffusion models such as Sora. With a video in context, Emu3 can also naturally extend the video and predict what will happen next.

+### Model Information
+
+The **Emu3-Stage1** model contains the pre-trained weights from the first stage of Emu3's two-stage pre-training process. This first stage, **which does not use video data**, trains from scratch with a context length of 5120 on text and image data. The resulting model supports image captioning and can generate images at a resolution of 512x512. You can use our [training scripts](https://github.com/baaivision/Emu3/tree/main/scripts) for further instruction tuning on **image generation and perception tasks**.
+

#### Quickstart
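
Emu3-Stage1 checkpoints load like any Hugging Face causal language model. The snippet below is a minimal sketch, not the repository's official quickstart: it assumes the checkpoint id `BAAI/Emu3-Stage1` and that the checkpoint ships its own modeling code, hence `trust_remote_code=True`. Building an image-captioning prompt additionally requires the Emu3 vision tokenizer and processor from the repo; only the generation step on prepared token ids is shown here.

```python
# Minimal loading sketch (assumptions: checkpoint id "BAAI/Emu3-Stage1",
# custom modeling code shipped with the checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_HUB = "BAAI/Emu3-Stage1"  # assumed Hugging Face checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_HUB, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_HUB,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
    trust_remote_code=True,
).eval()

# Emu3 treats images as sequences of discrete vision tokens, so captioning
# and generation are both plain next-token prediction once the prompt ids
# are assembled. Here we only run generation on a text prompt.
input_ids = tokenizer("A photo of", return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```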
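The overview above frames video generation as causal next-token prediction: frames are encoded into discrete vision tokens, and extending a video just means generating more tokens after the context. The sketch below illustrates that loop under stated assumptions; `encode_frames` and `decode_tokens` are hypothetical stand-ins for the vision tokenizer's encode and decode steps, and `tokens_per_frame` depends on the tokenizer's spatial compression rate.

```python
# Illustrative sketch of causal video extension; encode_frames()/decode_tokens()
# are hypothetical helpers, not part of the Emu3 API.
import torch


def extend_video(model, context_tokens: torch.LongTensor,
                 tokens_per_frame: int, new_frames: int) -> torch.LongTensor:
    """Autoregressively predict the tokens of future frames.

    context_tokens: (1, T) tensor of discrete vision-token ids for the
    frames seen so far, as produced by a vision tokenizer.
    """
    out = model.generate(
        context_tokens,
        max_new_tokens=tokens_per_frame * new_frames,
        do_sample=True,
        top_p=0.9,
    )
    # generate() returns context + continuation; keep only the new ids,
    # which encode the predicted future frames.
    return out[:, context_tokens.shape[1]:]


# Usage with the hypothetical helpers:
# video_tokens = encode_frames(frames)                    # pixels -> token ids
# future = extend_video(model, video_tokens,
#                       tokens_per_frame=1024, new_frames=8)
# future_frames = decode_tokens(future)                   # token ids -> pixels
```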