This is a Q4_K_S GGUF quantization (for testing) of misdelivery/Llama-3.1-Swallow-70B-Nemotron-Instruct-v0.1. Quantizing with an imatrix (importance matrix) would likely improve quality.

Format: GGUF
Model size: 70.6B params
Architecture: llama
Quantization: 4-bit
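
As a rough illustration of what the figures above imply for storage, the following sketch estimates the size of the weights from the parameter count and a bits-per-weight rate. The function name `estimate_weight_bytes` is hypothetical, and the 4.0 bits-per-weight value is an assumption taken from the nominal "4-bit" label; Q4_K_S stores block scales alongside the 4-bit values, so the real file is expected to be somewhat larger than this lower bound.

```python
def estimate_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate storage needed for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


N_PARAMS = 70.6e9   # parameter count listed on this model card
NOMINAL_BITS = 4.0  # assumption: nominal "4-bit" rate, ignoring scale overhead

size_gb = estimate_weight_bytes(N_PARAMS, NOMINAL_BITS) / 1e9
print(f"~{size_gb:.1f} GB at {NOMINAL_BITS} bits per weight (lower bound)")
```

Passing a higher `bits_per_weight` gives an estimate closer to the actual file size; the KV cache and runtime buffers are not included.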
