What is the inference time? Any ideas how to make it faster?
#52 opened by leoapolonio
I have it deployed on a g5.48xlarge instance (which uses A10 GPUs under the hood), and I'm seeing more than 60 seconds to generate 500 tokens.
Any suggested paths to make it faster?
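For reference, a minimal sketch of how one might measure generation throughput with the transformers library; the checkpoint name, dtype, and prompt are illustrative assumptions, not taken from the original post:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"  # assumed checkpoint; swap in the one you deployed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision cuts memory traffic vs. fp32
    device_map="auto",           # shard across the available GPUs (needs accelerate)
    trust_remote_code=True,
)

inputs = tokenizer("Write a short story about a robot.", return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=500, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tokens/s")
```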
We would recommend using Text Generation Inference for optimal performance. If you are deploying on AWS SageMaker, also have a look at this blog post.
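As an illustration, here is a hedged sketch of querying a Text Generation Inference server over its REST `/generate` endpoint; the host, port, and launch flags shown in the comments are assumptions for illustration, not a verified deployment recipe:

```python
# Assumes a TGI server is already running locally, e.g. launched with
# something like (model id and shard count are illustrative):
#   docker run --gpus all --shm-size 1g -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id tiiuae/falcon-40b --num-shard 4
import requests

response = requests.post(
    "http://localhost:8080/generate",  # assumed host and port
    json={
        "inputs": "Write a short story about a robot.",
        "parameters": {"max_new_tokens": 500},
    },
    timeout=120,
)
print(response.json()["generated_text"])
```

TGI batches concurrent requests continuously and uses fused kernels, which is why it typically outperforms a plain `model.generate()` loop for serving.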
FalconLLM changed discussion status to closed