Suggested Architecture for Small Mistral Model

#66
by mnitin73 - opened

I want to pretrain a model from scratch on a specific dataset. However, I only have access to an A100 80GB GPU. Can someone suggest a model architecture that can be trained on this GPU? I have tried the echarlaix/tiny-random-mistral and illuin/tiny-random-MistralForCausalLM models. They work great, but they are too tiny and simple for my requirement. I would like a slightly larger architecture that gives better performance on my dataset.
