teowu committed
Commit 9245dae (parent: f595320)

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -34,9 +34,9 @@ This checkpoint is one of base models of [Aria](https://huggingface.co/rhymes-ai
  ## Aria-Base-8K

  - **Base Model After Pre-training**: This model corresponds to the model checkpoint after the multimodal pre-training stage, with 1.4T tokens (1T language + 400B multimodal) trained in this stage. This stage lasts 43,000 iterations, with all sequences packed to 8192 with Megatron-LM, with global batch size 4096. During this training stage, the learning rate decays from `8.75e-5` to `3.5e-5`.
- - **Appropriate for Continue Pre-training**: This model is recommended for continue pre-training, *e.g.* on domain-specific pre-training data (OCR, agent, multi-lingual), while the targeted scenario does not involve long-context inputs. Please consider fine-tuning [Aria-Base-64K](https://huggingface.co/teowu/Aria-Base-64K) for long-context scenarios.
+ - **Appropriate for Continue Pre-training**: This model is recommended for continue pre-training, *e.g.* on domain-specific pre-training data (OCR, agent, multi-lingual), while the targeted scenario does not involve long-context inputs. Please consider fine-tuning [Aria-Base-64K](https://huggingface.co/rhymes-ai/Aria-Base-64K) for long-context scenarios.
  - **Strong Base Performance on Language and Multimodal Scenarios**: This model shows excellent base performance on knowledge-related evaluations on both pure language and multimodal scenarios (MMLU 70+, MMMU 50+, *etc*).
- - ***Limited Ability on Long-context Scenarios***: This model is only trained with 8K context length, and is not expected to show best performance with context length especially longer than 8K (e.g. a video with >100 frames). [Aria-Base-64K](https://huggingface.co/teowu/Aria-Base-64K) is more appropriate for longer sequence understanding.
+ - ***Limited Ability on Long-context Scenarios***: This model is only trained with 8K context length, and is not expected to show best performance with context length especially longer than 8K (e.g. a video with >100 frames). [Aria-Base-64K](https://huggingface.co/rhymes-ai/Aria-Base-64K) is more appropriate for longer sequence understanding.
  - ***Limited Chat Template Availability***: This model is trained with a very low percentage of data (around 3%) re-formatted with the chat template. Hence, it might not be optimal to be directly used for chatting.


@@ -68,7 +68,7 @@ import torch
  from PIL import Image
  from transformers import AutoModelForCausalLM, AutoProcessor

- model_id_or_path = "teowu/Aria-Base-8K"
+ model_id_or_path = "rhymes-ai/Aria-Base-8K"

  model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
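
As a quick sanity check on the pre-training figures quoted in the first hunk, the iteration count, global batch size, and packed sequence length multiply out to roughly the stated 1.4T-token budget. A back-of-the-envelope estimate (it ignores any padding or dropped batches):

```python
# Rough token budget implied by the hyperparameters in the README:
# 43,000 iterations x global batch size 4096 x 8192-token packed sequences.
iterations = 43_000
global_batch_size = 4096
sequence_length = 8192

total_tokens = iterations * global_batch_size * sequence_length
print(f"{total_tokens / 1e12:.2f}T tokens")  # ~1.44T, consistent with the stated 1.4T
```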
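For completeness, a minimal loading sketch assembled from the snippets visible in the second hunk, with the repository path updated by this commit. The `AutoProcessor.from_pretrained` call is an assumption based on the import in the README rather than something shown in this diff:

```python
import torch
from PIL import Image  # imported in the README for image inputs; usage not shown in this diff
from transformers import AutoModelForCausalLM, AutoProcessor

model_id_or_path = "rhymes-ai/Aria-Base-8K"

# Load the checkpoint exactly as shown in the updated README snippet.
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Assumption: the matching processor is loaded from the same repository.
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
```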