# run with: ./tests.sh --no-skipped --tags wrong_usage
@wrong_usage
Feature: Wrong usage of llama.cpp server
  #3969 The user must always set the --n-predict option
  # to cap the number of tokens any completion request can generate,
  # or pass n_predict/max_tokens in the request.
  Scenario: Infinite loop
    Given a server listening on localhost:8080
    And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
    And 42 as server seed
    And 2048 KV cache size
    # Uncomment below to fix the issue
    #And 64 server max tokens to predict
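    # For reference, a sketch of the equivalent server-side cap at launch
    # time, assuming the standard llama.cpp server binary and the
    # --n-predict flag named above (binary name and flags may differ
    # between builds; check --help on yours):
    #   ./llama-server -m stories260K.gguf -c 2048 --n-predict 64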
    Then the server is starting
    Then the server is healthy
    Given a prompt:
    """
    Go to: infinite loop
    """
    # Uncomment below to fix the issue
    #And 128 max tokens to predict
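    # A minimal sketch of the request-side fix, assuming the server's
    # /completion endpoint and its "n_predict" JSON field (names per the
    # llama.cpp server README; adjust if your build's API differs). This is
    # hypothetical client code, not part of this test suite:
    #
    #   import requests
    #   resp = requests.post(
    #       "http://localhost:8080/completion",
    #       json={"prompt": "Go to: infinite loop", "n_predict": 128},
    #   )
    #   print(resp.json()["content"])  # generation stops after 128 tokens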
    Given concurrent completion requests
    Then the server is idle
    Then all prompts are predicted