Inference speed
#5 opened by omarabb315
I tried to run the model on a T4 GPU but couldn't get the output in less than 0.5 seconds, even though the output is short (about 20 to 30 tokens).
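As a rough sanity check on those numbers (the 25-token figure below is just the midpoint of the reported 20 to 30 range, an assumption for illustration), 0.5 s end-to-end implies roughly 20 ms per generated token, i.e. about 50 tokens/s, and that total also absorbs one-time costs like the prefill of the prompt:

```python
# Back-of-the-envelope per-token latency implied by the observed timing.
# Both inputs are assumptions taken from the post, not measurements.
total_time_s = 0.5      # observed end-to-end latency
output_tokens = 25      # midpoint of the reported 20-30 token range

per_token_ms = total_time_s / output_tokens * 1000
tokens_per_s = output_tokens / total_time_s

print(f"{per_token_ms:.1f} ms/token, {tokens_per_s:.0f} tokens/s")
# 20.0 ms/token, 50 tokens/s
```

Note this averages the fixed prompt-processing (prefill) cost over the generated tokens, so the true decode speed is somewhat higher than the 50 tokens/s figure suggests; for short outputs, that fixed overhead dominates the wall-clock time.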