---
title: Dadc
emoji: 🏢
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
license: bigscience-bloom-rail-1.0
---
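
A basic example of dynamic adversarial data collection (DADC) with a Gradio app: crowdworkers on Amazon Mechanical Turk (MTurk) interact with a model through the app, and each submission is saved to a Hugging Face dataset.
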
*Instructions for adapting this example to your own project:*

**Setting up the Space**

1. Clone this repo and deploy it to your own Hugging Face Space.
2. Add one of your Hugging Face tokens to your Space's secrets under the name `HF_TOKEN`. Next, create an empty Hugging Face dataset on the Hub and add its URL to your Space's secrets under the name `DATASET_REPO_URL`. The dataset can be private or public. When you run this Space on MTurk (see the steps below), the app uses your token to automatically store each new HIT (Human Intelligence Task) in your dataset.

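As a rough illustration of how an app can use these two secrets: Spaces expose secrets to the running app as environment variables, so they can be read with `os.environ`. The sketch below is hypothetical (the function names, the JSON Lines file, and the record fields are assumptions, not this repo's code) and stops at writing a record locally; pushing the file to the dataset repo with the token (for example with the `huggingface_hub` library) is left out.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Tuple


def load_space_secrets() -> Tuple[str, str]:
    """Read the two secrets configured in the Space settings.

    Spaces expose secrets to the app as environment variables.
    """
    token = os.environ.get("HF_TOKEN")
    repo_url = os.environ.get("DATASET_REPO_URL")
    if not token or not repo_url:
        raise RuntimeError("Set HF_TOKEN and DATASET_REPO_URL in the Space secrets")
    return token, repo_url


def append_hit(data_file: Path, hit: dict) -> None:
    """Append one HIT as a JSON line (hypothetical storage layout).

    In a real app this file would sit inside a local clone of the dataset
    repo and be pushed to the Hub with the token after each write.
    """
    record = dict(hit, timestamp=datetime.now(timezone.utc).isoformat())
    with data_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Appending one JSON object per line keeps each write independent, so a new HIT never requires rewriting earlier records.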

**Running Data Collection**

1. In your local clone of this repo, make a copy of `config.py.example` named `config.py`, and put the keys from your AWS account in it. These keys must belong to an AWS account with the `AmazonMechanicalTurkFullAccess` permission. You also need to create an MTurk Requester account linked to that AWS account.
2. Run `python collect.py` locally. With the `--live_mode` flag, it launches HITs on MTurk, using the app deployed on your Space as the data collection UI and backend. NOTE: this means you will need to pay real workers. Without `--live_mode`, it runs the HITs on the MTurk sandbox, which behaves like regular MTurk but is meant only for testing. You can create a worker account on the sandbox and try your HIT there yourself.

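The sandbox/live switch and the use of the AWS keys can be pictured with the minimal sketch below. It is not the repo's actual `collect.py`: it assumes the script talks to MTurk through `boto3`, and the helper names `parse_args`, `mturk_settings`, and `make_client` are hypothetical. The URLs are AWS's public MTurk requester endpoints and worker sites; the MTurk API is served only from `us-east-1`.

```python
import argparse

# The MTurk requester API is only available in us-east-1.
MTURK_REGION = "us-east-1"

# Keyed by the value of --live_mode.
SETTINGS = {
    # Sandbox: free test HITs, visible only on the sandbox worker site.
    False: {
        "endpoint_url": "https://mturk-requester-sandbox.us-east-1.amazonaws.com",
        "worker_site": "https://workersandbox.mturk.com",
    },
    # Live: HITs go to real workers, who must be paid.
    True: {
        "endpoint_url": "https://mturk-requester.us-east-1.amazonaws.com",
        "worker_site": "https://worker.mturk.com",
    },
}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; sandbox mode is the default."""
    parser = argparse.ArgumentParser(description="Launch data collection HITs on MTurk.")
    parser.add_argument(
        "--live_mode",
        action="store_true",
        help="Launch on live MTurk (real, paid workers) instead of the sandbox.",
    )
    return parser.parse_args(argv)


def mturk_settings(live_mode: bool) -> dict:
    """Return a copy of the endpoint and worker site for the chosen mode."""
    return dict(SETTINGS[live_mode])


def make_client(live_mode: bool, aws_access_key_id: str, aws_secret_access_key: str):
    """Create a boto3 MTurk client from the AWS keys kept in config.py."""
    import boto3  # third-party; imported here so the helpers above need only stdlib

    return boto3.client(
        "mturk",
        region_name=MTURK_REGION,
        endpoint_url=mturk_settings(live_mode)["endpoint_url"],
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    )
```

For example, `make_client(parse_args().live_mode, config.AWS_ACCESS_KEY_ID, config.AWS_SECRET_ACCESS_KEY)`, where those attribute names are placeholders for whatever `config.py.example` actually defines. Calling `get_account_balance()` on the returned client is a cheap way to confirm the keys and permissions work before launching any HITs. Defaulting to the sandbox means a forgotten flag costs nothing.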

**Profit**

You should now see HITs arriving in your Hugging Face dataset automatically!