my_gradio / guides /10_other-tutorials /running-background-tasks.md
xray918's picture
Upload folder using huggingface_hub
0ad74ed verified
# Running Background Tasks
Related spaces: https://huggingface.co/spaces/freddyaboulton/gradio-google-forms
Tags: TASKS, SCHEDULED, TABULAR, DATA
## Introduction
This guide explains how you can run background tasks from your gradio app.
Background tasks are operations that you'd like to perform outside the request-response
lifecycle of your app either once or on a periodic schedule.
Examples of background tasks include periodically synchronizing data to an external database or
sending a report of model predictions via email.
## Overview
We will be creating a simple "Google-forms-style" application to gather feedback from users of the gradio library.
We will use a local sqlite database to store our data, but we will periodically synchronize the state of the database
with a [HuggingFace Dataset](https://huggingface.co/datasets) so that our user reviews are always backed up.
The synchronization will happen in a background task running every 60 seconds.
At the end of the demo, you'll have a fully working application like this one:
<gradio-app space="freddyaboulton/gradio-google-forms"> </gradio-app>
## Step 1 - Write your database logic 💾
Our application will store the name of the reviewer, their rating of gradio on a scale of 1 to 5, as well as
any comments they want to share about the library. Let's write some code that creates a database table to
store this data. We'll also write some functions to insert a review into that table and fetch the latest 10 reviews.
We're going to use the `sqlite3` library to connect to our sqlite database but gradio will work with any library.
The code will look like this:
```python
DB_FILE = "./reviews.db"
db = sqlite3.connect(DB_FILE)
# Create table if it doesn't already exist
try:
db.execute("SELECT * FROM reviews").fetchall()
db.close()
except sqlite3.OperationalError:
db.execute(
'''
CREATE TABLE reviews (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
name TEXT, review INTEGER, comments TEXT)
''')
db.commit()
db.close()
def get_latest_reviews(db: sqlite3.Connection):
reviews = db.execute("SELECT * FROM reviews ORDER BY id DESC limit 10").fetchall()
total_reviews = db.execute("Select COUNT(id) from reviews").fetchone()[0]
reviews = pd.DataFrame(reviews, columns=["id", "date_created", "name", "review", "comments"])
return reviews, total_reviews
def add_review(name: str, review: int, comments: str):
db = sqlite3.connect(DB_FILE)
cursor = db.cursor()
cursor.execute("INSERT INTO reviews(name, review, comments) VALUES(?,?,?)", [name, review, comments])
db.commit()
reviews, total_reviews = get_latest_reviews(db)
db.close()
return reviews, total_reviews
```
Let's also write a function to load the latest reviews when the gradio application loads:
```python
def load_data():
db = sqlite3.connect(DB_FILE)
reviews, total_reviews = get_latest_reviews(db)
db.close()
return reviews, total_reviews
```
## Step 2 - Create a gradio app ⚡
Now that we have our database logic defined, we can use gradio create a dynamic web page to ask our users for feedback!
```python
with gr.Blocks() as demo:
with gr.Row():
with gr.Column():
name = gr.Textbox(label="Name", placeholder="What is your name?")
review = gr.Radio(label="How satisfied are you with using gradio?", choices=[1, 2, 3, 4, 5])
comments = gr.Textbox(label="Comments", lines=10, placeholder="Do you have any feedback on gradio?")
submit = gr.Button(value="Submit Feedback")
with gr.Column():
data = gr.Dataframe(label="Most recently created 10 rows")
count = gr.Number(label="Total number of reviews")
submit.click(add_review, [name, review, comments], [data, count])
demo.load(load_data, None, [data, count])
```
## Step 3 - Synchronize with HuggingFace Datasets 🤗
We could call `demo.launch()` after step 2 and have a fully functioning application. However,
our data would be stored locally on our machine. If the sqlite file were accidentally deleted, we'd lose all of our reviews!
Let's back up our data to a dataset on the HuggingFace hub.
Create a dataset [here](https://huggingface.co/datasets) before proceeding.
Now at the **top** of our script, we'll use the [huggingface hub client library](https://huggingface.co/docs/huggingface_hub/index)
to connect to our dataset and pull the latest backup.
```python
TOKEN = os.environ.get('HUB_TOKEN')
repo = huggingface_hub.Repository(
local_dir="data",
repo_type="dataset",
clone_from="<name-of-your-dataset>",
use_auth_token=TOKEN
)
repo.git_pull()
shutil.copyfile("./data/reviews.db", DB_FILE)
```
Note that you'll have to get an access token from the "Settings" tab of your HuggingFace for the above code to work.
In the script, the token is securely accessed via an environment variable.
![access_token](https://github.com/gradio-app/gradio/blob/main/guides/assets/access_token.png?raw=true)
Now we will create a background task to synch our local database to the dataset hub every 60 seconds.
We will use the [AdvancedPythonScheduler](https://apscheduler.readthedocs.io/en/3.x/) to handle the scheduling.
However, this is not the only task scheduling library available. Feel free to use whatever you are comfortable with.
The function to back up our data will look like this:
```python
from apscheduler.schedulers.background import BackgroundScheduler
def backup_db():
shutil.copyfile(DB_FILE, "./data/reviews.db")
db = sqlite3.connect(DB_FILE)
reviews = db.execute("SELECT * FROM reviews").fetchall()
pd.DataFrame(reviews).to_csv("./data/reviews.csv", index=False)
print("updating db")
repo.push_to_hub(blocking=False, commit_message=f"Updating data at {datetime.datetime.now()}")
scheduler = BackgroundScheduler()
scheduler.add_job(func=backup_db, trigger="interval", seconds=60)
scheduler.start()
```
## Step 4 (Bonus) - Deployment to HuggingFace Spaces
You can use the HuggingFace [Spaces](https://huggingface.co/spaces) platform to deploy this application for free ✨
If you haven't used Spaces before, follow the previous guide [here](/using_hugging_face_integrations).
You will have to use the `HUB_TOKEN` environment variable as a secret in the Guides.
## Conclusion
Congratulations! You know how to run background tasks from your gradio app on a schedule ⏲️.
Checkout the application running on Spaces [here](https://huggingface.co/spaces/freddyaboulton/gradio-google-forms).
The complete code is [here](https://huggingface.co/spaces/freddyaboulton/gradio-google-forms/blob/main/app.py)