---
license: gpl-3.0
language:
- en
- zh
- ja
- de
datasets:
- JosephusCheung/GuanacoDataset
- meta-math/MetaMathQA
- jondurbin/airoboros-3.1
- WizardLM/WizardLM_evol_instruct_V2_196k
- RyokoAI/ShareGPT52K
- RyokoAI/Fandom23K
- milashkaarshif/MoeGirlPedia_wikitext_raw_archive
- wikipedia
- wiki_lingua
- garage-bAInd/Open-Platypus
- LDJnr/Puffin
- BAAI/COIG
- TigerResearch/tigerbot-zhihu-zh-10k
- liwu/MNBVC
- teknium/openhermes
- CausalLM/Refined-Anime-Text
- microsoft/orca-math-word-problems-200k
- m-a-p/CodeFeedback-Filtered-Instruction
---
|
The tokenizer is different from Cohere's, and the chat template is ChatML. Fully fine-tuned at 128K+ context on a synthetic dataset of roughly 30M entries (web-crawl inputs with GPT-4-32k / GPT-3.5-16k outputs), for 1 epoch.
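
Since the chat template is ChatML, prompts can be built with the standard `transformers` chat-template API. The following is only a minimal usage sketch: the repo id `CausalLM/35b-beta` is used as a stand-in and should be replaced with this model's actual id.

```python
# Minimal ChatML inference sketch with transformers (assumes accelerate is installed
# for device_map="auto"). The repo id below is a placeholder for this model's id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/35b-beta"  # placeholder; substitute this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a chat template does."},
]

# apply_chat_template renders the ChatML <|im_start|>...<|im_end|> format
# defined in the tokenizer config and appends the generation prompt.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```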
|
|
|
For another candidate version trained for 1 epoch, see https://huggingface.co/CausalLM/35b-beta, which somehow seems to overfit less.
|
|
|
No LoRAs, no quants, no tricks.
|
|
|
This one is not tuned for "very 128k" contexts; use https://huggingface.co/CausalLM/35b-beta-long for long-context work. It is, however, better at general tasks, knowledge, coding, and so on.
|
|
|
And feel free to merge them if you want!
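
If you do want to combine this checkpoint's general-task strengths with the long-context tuning of 35b-beta-long, one option is a plain linear weight merge. The sketch below is only an illustration, not a recommended recipe: it assumes both checkpoints share the same architecture and tokenizer, uses an arbitrary 0.5 blend factor, and needs enough CPU RAM to hold both models; dedicated tools such as mergekit offer more sophisticated strategies.

```python
# Rough linear-interpolation merge of the two sibling checkpoints linked above.
# Assumes identical architectures/parameter keys; alpha = 0.5 is an arbitrary example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "CausalLM/35b-beta"        # placeholder for this model's id
long_id = "CausalLM/35b-beta-long"   # long-context sibling
alpha = 0.5                          # fraction taken from the long-context weights

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
long_ctx = AutoModelForCausalLM.from_pretrained(long_id, torch_dtype=torch.bfloat16)

with torch.no_grad():
    merged = base.state_dict()
    long_state = long_ctx.state_dict()
    for name, tensor in merged.items():
        # Interpolate every tensor; keys and shapes must match across checkpoints.
        merged[name] = (1.0 - alpha) * tensor + alpha * long_state[name]

base.load_state_dict(merged)
base.save_pretrained("35b-merged")
AutoTokenizer.from_pretrained(base_id).save_pretrained("35b-merged")
```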