What is the inference time? Any ideas how to make it faster?
#52 opened by leoapolonio
I have it deployed on a g5.48xlarge instance (which uses A10 GPUs under the hood), and I'm seeing more than 60 seconds to generate 500 tokens.
Any suggested paths to make it faster?
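For reference, a minimal sketch of how one might measure generation throughput with the transformers library; the checkpoint name, dtype, and prompt are illustrative assumptions, not taken from the original post:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"  # assumed checkpoint; swap in the one you deployed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision cuts memory traffic vs. fp32
    device_map="auto",           # shard across the available GPUs (needs accelerate)
    trust_remote_code=True,
)

inputs = tokenizer("Write a short story about a robot.", return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=500, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tokens/s")
```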
We would recommend using Text Generation Inference for optimal performance. If you are deploying on AWS SageMaker, also have a look at this blog post.
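As an illustration, here is a hedged sketch of querying a Text Generation Inference server over its REST `/generate` endpoint; the host, port, and launch flags shown in the comments are assumptions for illustration, not a verified deployment recipe:

```python
# Assumes a TGI server is already running locally, e.g. launched with
# something like (model id and shard count are illustrative):
#   docker run --gpus all --shm-size 1g -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id tiiuae/falcon-40b --num-shard 4
import requests

response = requests.post(
    "http://localhost:8080/generate",  # assumed host and port
    json={
        "inputs": "Write a short story about a robot.",
        "parameters": {"max_new_tokens": 500},
    },
    timeout=120,
)
print(response.json()["generated_text"])
```

TGI batches concurrent requests continuously and uses fused kernels, which is why it typically outperforms a plain `model.generate()` loop for serving.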
FalconLLM changed discussion status to closed