
The model is not optimized for inference

#7
by Imran1 - opened

I load the model for inference on 2x H100 GPUs, but it is very slow even with Flash Attention.
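For reference, a minimal sketch of this kind of two-GPU Transformers setup (the `model_id` and generation arguments below are placeholders and assumptions, not the poster's actual code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the actual checkpoint from this model page.
model_id = "nvidia/<model-name>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # bf16 halves memory vs fp32
    device_map="auto",                        # shard layers across both H100s (needs `accelerate`)
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `device_map="auto"` places layers across the GPUs sequentially, so during generation only one GPU is active at a time; a tensor-parallel serving stack such as vLLM or text-generation-inference is usually much faster for multi-GPU inference.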

Can you share your code?
