Angelawork committed · 20924ee · Parent(s): 798ab47

README with notes
README.md CHANGED
@@ -26,7 +26,7 @@ API_URL = "https://caddc612329739b198.gradio.live/"
client = Client(API_URL)
result = client.predict(
    model_typ="gemma",
-   prompt="
+   prompt="She has a heart of gold",
    max_length=256,
    api_token="",
    api_name="/predict"
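For context, the snippet being edited is the README's client example. A self-contained version is sketched below; it assumes the gradio_client package (which the Client/predict calls and the gradio.live URL suggest), and the final print is added for illustration. The parameter names are taken verbatim from the diff.

```python
# Minimal sketch of the README's usage example. Assumes the gradio_client
# package; parameter names below are taken from the README diff.
from gradio_client import Client

API_URL = "https://caddc612329739b198.gradio.live/"

client = Client(API_URL)
result = client.predict(
    model_typ="gemma",                 # backend model to query
    prompt="She has a heart of gold",  # input text (the line this commit fills in)
    max_length=256,                    # generation length cap
    api_token="",                      # HF token; see note 2 below
    api_name="/predict",               # endpoint exposed by the Space
)
print(result)
```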
@@ -95,9 +95,9 @@ By producing multiple responses for a given input, this approach used various po

## Notes and Cautions

-1. Rate Limits: The free Inference API may be rate
+1. Rate Limits: The free Inference API may be rate limited for heavy use cases. We try to balance the load evenly across all our available resources, favoring steady flows of requests. If your account suddenly sends 10k requests, you are likely to receive 503 errors saying that models are loading. To prevent that, ramp your queries up smoothly from 0 to 10k over the course of a few minutes.

2. **Model Performance**: Inference times can vary.
-
+The inference time for the Falcon-Instruct models is notably long, ranging from roughly 300 up to 800 seconds. Using the API is recommended (~0.6 s, under 1 s in most cases). Please remember to provide the corresponding HF token that grants access to the API with each query; otherwise only a warning message will be shown.

3. Model Limitations: Some models may produce different results or have limitations based on their configuration and the input provided. Note that a default prefix is prepended to every incoming query based on prompt testing for a suitable choice. You can configure these default values in globals.py.
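The ramp-up advice in note 1 can be made concrete with a small pacing helper. This is a hedged sketch: ramped_queries and the 180-second window are illustrative assumptions, not part of the repo.

```python
import time

def ramped_queries(queries, ramp_seconds=180):
    """Yield queries spaced evenly over ramp_seconds, so a large batch
    arrives as a steady flow rather than a burst that triggers 503s."""
    delay = ramp_seconds / max(len(queries), 1)
    for query in queries:
        yield query
        time.sleep(delay)

# Usage (send_query stands in for the client.predict call shown above):
# for q in ramped_queries(all_prompts):
#     send_query(q)
```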
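Note 2's warning about missing tokens suggests failing fast when no token is available, rather than spending a round trip to receive only a warning. A hypothetical guard follows; the HF_TOKEN environment variable name is an assumption for this sketch.

```python
import os
from gradio_client import Client  # assumed client library, as above

hf_token = os.environ.get("HF_TOKEN", "")
if not hf_token:
    # Per note 2, a query without a valid token only returns a warning,
    # so it is cheaper to stop here than to call the endpoint.
    raise SystemExit("HF_TOKEN is not set; the Space would only return a warning.")

client = Client("https://caddc612329739b198.gradio.live/")
result = client.predict(
    model_typ="gemma",
    prompt="She has a heart of gold",
    max_length=256,
    api_token=hf_token,
    api_name="/predict",
)
```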