Convert to ggml format

#1
by iguana0335 - opened

Hello. I would like to convert this model to ggml format so I can test it on a machine with fewer resources. Is there a procedure or script to do it? Thank you.

BERTIN Project org
edited Apr 12, 2023

I tried to convert it to ggml but had some errors due to a missing added_tokens.json in the tokenizer, since the original model uses GPT-2's tokenizer instead of GPT-J-6B's. I just pushed a couple of versions, in float16 and float32, so the smaller float16 model should be able to run on CPU using https://github.com/ggerganov/ggml/tree/master/examples/gpt-j. Just remember that the prompt needs to follow the Spanish version of the Alpaca prompt. For example, you could create a prompt.txt file:

A continuación hay una instrucción que describe una tarea. Escribe una respuesta que complete adecuadamente lo que se pide.

### Instrucción:
$TASK

### Respuesta:

Download the model, compile and install ggml, and then invoke it:

$ TASK="Escribe un email poniendo una excusa para faltar a la reunión" envsubst < prompt.txt | ./bin/gpt-j -m /path/to/bertin-gpt-j-6B-alpaca/ggml-model-f16.bin

Hello. I'm testing ggml-model-f16.bin, and I have this prompt.txt file:

A continuación hay una instrucción que describe una tarea. Escribe una respuesta que complete adecuadamente lo que se pide.

### Instrucción:
"Cuentame un cuento."

### Respuesta:

But when I launch the model there are many "gpt_tokenize: unknown token" errors and the generated text is not good:

$ cat prompt.txt | ./gpt-j -m /bertin-gpt-j-6b-alpaca/ggml-bertin-gpt-j-6B-alpaca/ggml-model-f16.bin
main: seed = 1681503060
gptj_model_load: loading model from '/bertin-gpt-j-6b-alpaca/ggml-bertin-gpt-j-6B-alpaca/ggml-model-f16.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 1
gptj_model_load: ggml ctx size = 12438.86 MB
gptj_model_load: memory_size = 896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 11542.79 MB / num tensors = 285
gpt_tokenize: unknown token '
'
gpt_tokenize: unknown token 'A'
gpt_tokenize: unknown token 'a'
gpt_tokenize: unknown token 'c'
gpt_tokenize: unknown token 'i'
gpt_tokenize: unknown token '▒'
...
...
gpt_tokenize: unknown token ':'
'pt_tokenize: unknown token '
main: number of tokens in prompt = 32

continu hayna instrcci que describena tar Escibena respesta que complete aduaamen que se pid### InstrcciCuta cue."###RespestaCuá▒ es el cle fist x si 4▒ - 6 = 18▒▒▒▒### knee▒estaEl cle fist x es 6▒<|endoftext|>

main: mem per token = 15488240 bytes
main: load time = 18915.16 ms
main: sample time = 98.78 ms
main: predict time = 17135.78 ms / 267.75 ms per token
main: total time = 43800.53 ms

I'm probably doing something wrong or I need to configure something...
Thank you.

BERTIN Project org
edited Apr 20, 2023

Strange. I just ran the same prompt and got this:

$ TASK="Cuéntame un cuento" envsubst < es.txt | ./gpt-j -m /../bertin-gpt-j-6B-alpaca/ggml-model-f16.bin 
main: seed = 1681808791
gptj_model_load: loading model from '/../bertin-gpt-j-6B-alpaca/ggml-model-f16.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 1
gptj_model_load: ggml ctx size = 12438.86 MB
gptj_model_load: memory_size =   896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 11542.79 MB / num tensors = 285
main: number of tokens in prompt = 73


A continuación hay una instrucción que describe una tarea. Escribe una respuesta que complete adecuadamente lo que se pide.

### Instrucción:
Cuéntame un cuento

### Respuesta:
Había una vez un pequeño ratón de campo que vivía en un pequeño agujero en el tronco de un árbol alto. El ratón era muy astuto, y a menudo se le veía resolviendo problemas para los demás animales del bosque. Un día, el ratón escuchó hablar de un gran concurso en el que se ofrecía una gran recompensa al mejor cuentacuentos. El ratón decidió participar. 

El ratón comenzó a practicar su cuento todos los días, pero ninguno de sus intentos parecía bueno. Un día, el ratón escuchó un ruido fuera del agujero en el tronco. Miró por el agujero y

main: mem per token = 15488240 bytes
main:     load time = 24641.92 ms
main:   sample time =    71.52 ms
main:  predict time = 90674.88 ms / 333.36 ms per token
main:    total time = 117546.53 ms

Maybe try downloading the model again? The ggml commit I built the gpt-j binary from is 75824e76bd41818ff8902d244feec1ec6c1d2c86.

Hello.
The model I downloaded has the same hash and size:

$ certutil -hashfile ggml-model-f16.bin sha256
SHA256 hash of ggml-model-f16.bin:
8672f372dae6bb7660f070355adb44dbb35747746525192b89a24466aabebbb4

$ ls -l ggml-model-f16.bin
-rw-r--r-- 1 root root 12104027673 Apr 12 23:17 ggml-model-f16.bin

(screenshot: sha256_size_from_huggingface.jpg — the SHA256 and file size shown on Hugging Face)

Thank you.

BERTIN Project org
edited Apr 20, 2023

Just tried it again on a completely new GCP VM running Ubuntu 20.04.1, and it all went fine, so I don't know how to reproduce the problem. Do you get the same errors with the f32 model? Do you have an up-to-date version of ggml?

These are the steps I followed:

apt install cmake
git clone https://github.com/ggerganov/ggml
cd ggml/
mkdir build && cd build
cmake ..
make -j16 gpt-2 gpt-j
cd ..
# create prompt file es.txt
wget https://huggingface.co/bertin-project/bertin-gpt-j-6B-alpaca/resolve/main/ggml-model-f16.bin
TASK="Escribe un email poniendo una excusa para faltar a la reunión" envsubst < es.txt | ./build/bin/gpt-j -m ggml-model-f16.bin

Hello.
Yes, I'm having a problem with gpt-j.exe on my computer.
I'm using the https://github.com/LostRuins/koboldcpp utility with your model, and it's working fine!
Here is an example:

(screenshot: WithKoboldcpp.jpg — example output generated with koboldcpp)

I'm still trying to solve the generation problem with gpt-j.exe (it reports no errors, but it's not working); I'll try on another PC.

Thank you very much!! :)

BERTIN Project org

Awesome! I'm closing the issue then :)

versae changed discussion status to closed

Is 4-bit quantization possible?

BERTIN Project org

@hwpoison89 Yes, I believe it's possible now using ggml, though there were some issues in the past depending on how the quantization was performed.
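For reference, ggml includes a quantize example for gpt-j; something like this should do it (the binary name and the type argument are as I remember them, so verify against the current repo):

$ make -j gpt-j-quantize   # in the ggml build directory
$ ./bin/gpt-j-quantize ggml-model-f16.bin ggml-model-q4_0.bin 2   # 2 = q4_0, 3 = q4_1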
