Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
    And a model file stories15M_MOE-F16.gguf
    And a model alias stories15M_MOE
    And a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
    And 42 as server seed
    And 1024 as batch size
    And 1024 as ubatch size
    And 2048 KV cache size
    And 64 max tokens to predict
    And 0.0 temperature
    Then the server is starting
    Then the server is healthy
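Outside the test harness, the Background above corresponds roughly to the following `llama-server` invocation. This is a sketch: the flag spellings are assumptions based on the llama.cpp CLI and should be checked against `llama-server --help`; temperature is normally supplied per request rather than at startup.

```shell
# Hypothetical launch command mirroring the Background steps above.
llama-server \
  --host localhost --port 8080 \
  --model-url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf \
  --model stories15M_MOE-F16.gguf \
  --alias stories15M_MOE \
  --lora moe_shakespeare15M.gguf \
  --seed 42 \
  --batch-size 1024 --ubatch-size 1024 \
  --ctx-size 2048 \
  --n-predict 64
```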

  Scenario: Completion LoRA disabled
    Given switch off lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching little|girl|three|years|old
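The "switch off lora adapter 0" step maps to a request against the server's LoRA endpoint, where a scale of 0.0 disables an adapter and 1.0 enables it. A minimal sketch of the request body, assuming the `/lora-adapters` endpoint and its `id`/`scale` payload shape from the llama.cpp server API:

```python
import json

def lora_toggle_payload(adapter_id: int, enabled: bool) -> list[dict]:
    """Build the JSON body POSTed to /lora-adapters:
    scale 0.0 disables the adapter, 1.0 enables it."""
    return [{"id": adapter_id, "scale": 1.0 if enabled else 0.0}]

# "switch off lora adapter 0" -> '[{"id": 0, "scale": 0.0}]'
body = json.dumps(lora_toggle_payload(0, enabled=False))
```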

  Scenario: Completion LoRA enabled
    Given switch on lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching eye|love|glass|sun
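The "predicted matching ..." assertion in both scenarios is a regex-alternation check: the completion passes if any listed word appears in the generated text (the base model tends toward children's-story vocabulary, the Shakespeare LoRA toward the sonnet's). A sketch of that check, with the sample string being an invented illustration rather than real model output:

```python
import re

def matches_expected(content: str, pattern: str) -> bool:
    """Mirror the 'tokens are predicted matching a|b|c' step:
    pass if any alternative occurs anywhere in the completion."""
    return re.search(pattern, content) is not None

# Hypothetical LoRA-enabled completion drifting toward sonnet vocabulary.
ok = matches_expected("mine eye doth feast upon the glass", "eye|love|glass|sun")
```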