Suggested Architecture for Small Mistral Model

#66

by mnitin73 - opened Oct 22, 2023


I want to pretrain a model from scratch on a specific dataset. However, I only have access to an A100 80GB GPU. Can someone suggest a model architecture that can be trained on this GPU? I have tried the echarlaix/tiny-random-mistral and illuin/tiny-random-MistralForCausalLM models. They work well, but they are too tiny and simple for my requirements. I would like a somewhat larger architecture that gives better performance on my dataset.
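To compare candidate sizes before committing to one, a rough parameter count may help. The sketch below is plain Python and only does arithmetic. The argument names mirror the fields of `MistralConfig` in transformers. The layer layout (bias-free attention and MLP projections, two RMSNorm weights per layer, untied output head) is my understanding of the Mistral architecture. The example dimensions are hypothetical, not a recommendation from the model authors, and the 16 bytes per parameter figure is a common rule of thumb for mixed-precision Adam that ignores activations.

```python
def mistral_param_count(vocab_size, hidden_size, intermediate_size,
                        num_hidden_layers, num_attention_heads,
                        num_key_value_heads, tie_word_embeddings=False):
    """Estimate the parameter count of a Mistral-style decoder."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim  # grouped-query attention

    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj and down_proj each hold hidden x intermediate weights.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_hidden_layers * per_layer + embeddings + lm_head + final_norm


def training_state_gib(n_params, bytes_per_param=16):
    """Rough memory for weights, gradients and Adam state (assumption:
    fp16 weights and grads plus fp32 master weights and two Adam moments).
    Activation memory is NOT included."""
    return n_params * bytes_per_param / 2**30


# Hypothetical mid-sized configuration, chosen only for illustration.
small = dict(vocab_size=32000, hidden_size=1024, intermediate_size=3584,
             num_hidden_layers=16, num_attention_heads=16,
             num_key_value_heads=4)

n_small = mistral_param_count(**small)
print(f"params: {n_small / 1e6:.1f}M, "
      f"training state: {training_state_gib(n_small):.2f} GiB")
```

With the published Mistral-7B dimensions (hidden size 4096, intermediate size 14336, 32 layers, 32 heads, 8 key/value heads, vocabulary 32000) the same function gives about 7.24B parameters, which suggests the formula is in the right place. Activation memory depends on batch size and sequence length, so treat the memory figure as a lower bound.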
