# csmpt7b
License: apache-2.0

### Eval

Dev-set evaluation on CS-HellaSwag:

| Model | Accuracy |
|---|---|
| mistral7b | 0.4992 |
| csmpt-130k steps | 0.5004 |
| csmpt-100k steps | 0.4959 |
| csmpt-75k steps | 0.4895 |
| csmpt-50k steps | 0.4755 |
| csmpt-26.5k steps | 0.4524 |

However, validation on CS-HellaSwag shows that improvements beyond 100k steps are noisy at best, and the gain over mistral7b is not significant.
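The accuracy above is a multiple-choice metric: the model scores every candidate ending of each context, and the highest-scoring ending is compared against the gold label. A minimal sketch of one common scoring rule (length-normalized log-likelihood); the helper and the numbers are hypothetical, not our evaluation code:

```python
# Hypothetical sketch of HellaSwag-style scoring: pick the ending with the
# highest length-normalized log-likelihood under the model. The log-probs
# below are dummy numbers standing in for per-token model scores.
def pick_ending(ending_token_logprobs):
    """ending_token_logprobs: one list of per-token log-probs per candidate ending."""
    def normalized(lp):
        return sum(lp) / len(lp)  # length normalization avoids favoring short endings
    scores = [normalized(lp) for lp in ending_token_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)

# Example: ending 1 wins on average log-prob despite being longer.
endings = [
    [-2.0, -3.0],        # mean -2.5
    [-1.0, -2.0, -1.5],  # mean -1.5
]
print(pick_ending(endings))  # → 1
```

Accuracy is then simply the fraction of examples where the picked ending matches the gold one.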

### How to set up the environment

```shell
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the right flash-attn build; we use torch compiled with
# CUDA 12.1, no ABI, Python 3.9, on Linux x86_64.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```

### How to use in transformers
```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        # Prompt: "The best-known Czech writer "
        pipe('Nejznámějším českým spisovatelem ',
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,  # 1.0 disables the penalty
             do_sample=True,
             use_cache=True))
```
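The `top_p=0.95` argument above enables nucleus sampling. A toy sketch of the truncation step it performs at each decoding position (the probabilities are made up, not model outputs):

```python
# Toy illustration of top-p (nucleus) truncation, as enabled by `top_p=0.95`
# above: keep the smallest set of highest-probability tokens whose cumulative
# probability reaches top_p, then sample only among those.
def nucleus(probs, top_p):
    """Return the indices of the tokens kept for sampling."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return sorted(kept)

# The two lowest-probability tokens fall outside the 0.9 nucleus here.
print(nucleus([0.5, 0.3, 0.15, 0.05], 0.9))  # → [0, 1, 2]
```

With `top_p=0.95` most of the distribution is kept, so the output stays diverse while the long low-probability tail is cut off.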

### Our Release Plan

| Stage | Description | Date |
|---|---|---|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | Benczechmark, a collection of Czech datasets for few-shot LLM evaluation | |

Get in touch if you'd like to know more and contribute!

### Getting in Touch

For further questions, email [email protected].

### Disclaimer

This is a probabilistic model; the authors are not responsible for its outputs. Use at your own risk.

### Acknowledgement

This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT --- "Sémantický průzkumník textového kulturního dědictví" (Semantic Explorer of Textual Cultural Heritage), grant no. DH23P03OVV060, and by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID:90254).