Inference speed
#5 opened by omarabb315
I tried to run the model on a T4 GPU but couldn't get the output in less than 0.5 seconds, even though the output is short (about 20 to 30 tokens).
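As a rough sanity check on those numbers (the 25-token figure below is just the midpoint of the reported 20 to 30 range, an assumption for illustration), 0.5 s end-to-end implies roughly 20 ms per generated token, i.e. about 50 tokens/s, and that total also absorbs one-time costs like the prefill of the prompt:

```python
# Back-of-the-envelope per-token latency implied by the observed timing.
# Both inputs are assumptions taken from the post, not measurements.
total_time_s = 0.5      # observed end-to-end latency
output_tokens = 25      # midpoint of the reported 20-30 token range

per_token_ms = total_time_s / output_tokens * 1000
tokens_per_s = output_tokens / total_time_s

print(f"{per_token_ms:.1f} ms/token, {tokens_per_s:.0f} tokens/s")
# 20.0 ms/token, 50 tokens/s
```

Note this averages the fixed prompt-processing (prefill) cost over the generated tokens, so the true decode speed is somewhat higher than the 50 tokens/s figure suggests; for short outputs, that fixed overhead dominates the wall-clock time.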