Angelawork committed on
Commit 20924ee · 1 Parent(s): 798ab47

README with notes

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -26,7 +26,7 @@ API_URL = "https://caddc612329739b198.gradio.live/"
 client = Client(API_URL)
 result = client.predict(
     model_typ="gemma",
-    prompt="Hello!!",
+    prompt="She has a heart of gold",
     max_length=256,
     api_token="",
     api_name="/predict"
@@ -95,9 +95,9 @@ By producing multiple responses for a given input, this approach used various po
 
 ## Notes and Cautions
 
-1. Rate Limits: The free Inference API may be rate-limited for heavy use cases. To avoid 503 errors, start with a low volume of requests and increase gradually.
+1. Rate Limits: The free Inference API may be rate-limited for heavy use cases. We try to balance the load evenly across all our available resources, favoring steady flows of requests. If your account suddenly sends 10k requests, you are likely to receive 503 errors saying models are loading. To prevent that, ramp your queries up smoothly from 0 to 10k over the course of a few minutes (a hypothetical client-side retry sketch follows this diff).
 
 2. **Model Performance**: Inference times can vary.
-Note that the inference time for the Falcon model can be very long. It is recommended to use its API instead. Please remember to provide the corresponding HF Token that grants access to the API for each query, otherwise only a warning message will be shown.
+The inference time for the Falcon-Instruct models is notably long: it took roughly 300 to 800 seconds, so using the API instead is recommended (~0.6 sec, under 1 sec in most cases). Please remember to provide the corresponding HF Token that grants access to the API with each query; otherwise only a warning message will be shown.
 
 3. Model Limitations: Some models may produce different results or have limitations based on their configuration and the input provided. Note that a default prefix, chosen through prompt testing, is prepended to every incoming query; you can configure these defaults in globals.py (illustrated after this diff).
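
On note 1's ramp-up advice: `gradio_client` typically surfaces server-side failures, including 503s while a model is loading, as Python exceptions, so the pacing can be approximated client-side. The sketch below is a hypothetical helper, not code from this repository; `predict_with_backoff` and its parameters are invented names:

```python
import time

def predict_with_backoff(client, max_retries=5, base_delay=2.0, **kwargs):
    """Hypothetical retry helper for note 1: back off exponentially
    instead of re-sending a failed request immediately."""
    for attempt in range(max_retries):
        try:
            return client.predict(**kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # wait 2 s, 4 s, 8 s, ...

# Example: same arguments as the predict() call shown in the first hunk.
# result = predict_with_backoff(
#     client, model_typ="gemma", prompt="She has a heart of gold",
#     max_length=256, api_token="", api_name="/predict",
# )
```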
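Note 3's default prefix lives in globals.py. The repository's real values are not shown in this diff, so the sketch below only illustrates the mechanism; the module shape, the `DEFAULT_PREFIX` value, and the helper name are all assumptions:

```python
# globals.py (hypothetical shape; the real defaults in the repo differ)
DEFAULT_PREFIX = "<your tested prefix here> "  # prepended to every incoming query

def apply_default_prefix(prompt: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Prepend the configured default prefix to a raw user prompt."""
    return prefix + prompt

# apply_default_prefix("She has a heart of gold")
# -> "<your tested prefix here> She has a heart of gold"
```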