license: apache-2.0
## Introduction
## Eval

Development-set evaluation on CS-HellaSwag (an automatically translated HellaSwag benchmark):
| Model | CS-HellaSwag Accuracy |
|---|---|
| mistral7b | 0.4992 |
| csmpt-130k steps | 0.5004 |
| csmpt-100k steps | 0.4959 |
| csmpt-75k steps | 0.4895 |
| csmpt-50k steps | 0.4755 |
| csmpt-26.5k steps | 0.4524 |
However, we tracked validation accuracy on CS-HellaSwag over the course of training, and after 100k steps any further improvements were within noise. The improvement over mistral7b is not significant.
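HellaSwag-style benchmarks are typically scored by computing the model's log-likelihood of each candidate ending given the context and picking the highest-scoring ending. The sketch below illustrates that scoring scheme; it is a minimal illustration, not our exact evaluation harness, and `score_ending` is a hypothetical helper:

```python
import torch

def score_ending(model, tokenizer, context: str, ending: str) -> float:
    """Total log-likelihood the model assigns to `ending` given `context`.

    Assumes the tokenizer splits `context + ending` at a token boundary,
    which holds for typical whitespace-separated continuations.
    """
    ctx_ids = tokenizer(context, return_tensors='pt').input_ids.to(model.device)
    full_ids = tokenizer(context + ending, return_tensors='pt').input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given the preceding tokens (shifted by one).
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_ll = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Sum only over the ending's tokens.
    return token_ll[:, ctx_ids.shape[1] - 1:].sum().item()

# The highest-scoring ending counts as the model's prediction:
# pred = max(range(len(endings)), key=lambda i: score_ending(model, tokenizer, ctx, endings[i]))
```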
## Loss
tbd.
## Training Method
tbd.
## Usage

### How to Set Up the Environment
```sh
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the right flash-attn wheel; we use torch compiled with
# CUDA 12.1, no ABI, Python 3.9, Linux x86_64 architecture.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```
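As an optional sanity check (assuming a CUDA-capable GPU is visible), you can verify that the expected versions are installed:

```python
import torch
import transformers
import flash_attn

print(transformers.__version__)               # expected: 4.37.2
print(torch.__version__, torch.version.cuda)  # expected: 2.1.2 12.1
print(flash_attn.__version__)                 # expected: 2.5.3
print(torch.cuda.is_available())              # should print True
```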
### Running the Code
```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',  # "The most famous Czech writer "
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```
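If you prefer to bypass the pipeline wrapper, the equivalent generation can be done with `model.generate` directly; a minimal sketch reusing the `model` and `tokenizer` from above, with the same sampling parameters:

```python
inputs = tokenizer('Nejznámějším českým spisovatelem ', return_tensors='pt').to('cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        repetition_penalty=1.0,
        use_cache=True,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```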
## Training Data
We release most of our training data here [TBD MDocekal.].
## Our Release Plan
| Stage | Description | Date |
|---|---|---|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | BenCzechMark, a collection of Czech datasets for few-shot LLM evaluation. Get in touch if you want to contribute! | |
| 4 | Preprint publication | |
## Getting in Touch

For further questions, please email [email protected].
## Disclaimer
This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.
Acknowledgement
This work was supported by NAKI III program of Ministry of Culture Czech Republic, project semANT ---
"Sémantický průzkumník textového kulturního dědictví" grant no. DH23P03OVV060
and
by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90254
).