
Question on hardware requirement

#2
by adamchen564 - opened

Thank you for the amazing work!

(Q1) May I know how to find out the hardware requirements for fine-tuning different models on Hugging Face? I suppose one way is to look at "Files and versions" for the total size of the weights; this finance-chat model is about 30 GB. Is this the right way to gauge the hardware requirement?

(Q2) When I run "model = AutoModelForCausalLM.from_pretrained("AdaptLLM/finance-chat")", it shows "Loading shards, 00:00 < 30:00", meaning the download is estimated to take 30 minutes, but after a while the download stops and breaks. I suspect my local device cannot handle the download; if so, is there a smaller version of Finance-Chat that I can use? I suspect this because I was able to download another model from Hugging Face in only 3 minutes.

Owner

Hi, thanks for your interest in our model.

  1. Yes, looking at the total file size under "Files and versions" is a reasonable way to gauge the hardware requirement.
  2. Regarding the download interruptions: apart from hardware limitations, network instability can also cause the download to break. Ensuring a stable network connection might help.
  3. There are smaller, quantized versions of Finance-Chat released by other organizations; one example is https://huggingface.co/TheBloke/finance-chat-AWQ.
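One caveat on point 1: the on-disk size reflects only the weights, while full fine-tuning also needs memory for gradients and optimizer states. The rough arithmetic below is an illustrative sketch, not an official sizing guide; the 7B parameter count, fp16 weights/gradients, and fp32 AdamW states are all assumptions, and activations and framework overhead are ignored:

```python
def finetune_memory_gb(n_params: float,
                       weight_bytes: int = 2,   # fp16 weights
                       grad_bytes: int = 2,     # fp16 gradients
                       optim_bytes: int = 8):   # AdamW: two fp32 states per parameter
    """Return (weights_gb, total_gb) for full fine-tuning, ignoring activations."""
    gb = 1024 ** 3
    weights = n_params * weight_bytes / gb
    total = n_params * (weight_bytes + grad_bytes + optim_bytes) / gb
    return round(weights, 1), round(total, 1)

# Assumed 7B-parameter model: ~13 GB of weights, far more once training states are added.
weights_gb, total_gb = finetune_memory_gb(7e9)
print(weights_gb, total_gb)
```

This is why techniques like LoRA or quantized fine-tuning are popular: they avoid keeping full-precision optimizer states for every parameter.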

Hi, we have just uploaded the "safetensors" version of our model, which is much faster to download. You can use the same command, "model = AutoModelForCausalLM.from_pretrained("AdaptLLM/finance-chat")", to download the model in this faster format.
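If you want to check locally whether the safetensors shards finished downloading, a minimal stdlib sketch like the following can list the shard files and total their size; the "./finance-chat" path is hypothetical, so adjust it to wherever the model actually landed on your machine:

```python
from pathlib import Path

def shard_summary(model_dir):
    """List *.safetensors shards in a local model directory and total their size in GiB."""
    shards = sorted(Path(model_dir).glob("*.safetensors"))
    total_bytes = sum(p.stat().st_size for p in shards)
    return [p.name for p in shards], total_bytes / 1024 ** 3

# Hypothetical local path; compare the total against the ~30 GB listed on the model page.
names, size_gb = shard_summary("./finance-chat")
print(names, f"{size_gb:.1f} GiB")
```

If the total is well below the size shown under "Files and versions", the download was likely interrupted and re-running `from_pretrained` should resume it from the cache.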
