---
title: Dadc
emoji: 🏢
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
license: bigscience-bloom-rail-1.0
---

A basic example of dynamic adversarial data collection with a Gradio app.

**Instructions for using this for your own project:**

*Setting up the Space*

1. Clone this repo and deploy it on your own Hugging Face Space.
2. Add the following secrets to your Space:
   - `HF_TOKEN`: One of your Hugging Face tokens (it needs write access so the
     app can push to your dataset).
   - `DATASET_REPO_URL`: The URL of an empty dataset that you created on the
     Hub. It can be a private or public dataset.
   - `FORCE_PUSH`: "yes"

When you run this Space on MTurk and when people visit your Space on
huggingface.co, the app will use your token to automatically store new HITs
in your dataset. Setting `FORCE_PUSH` to "yes" ensures that your repo will
force push changes to the dataset during data collection. Otherwise,
accidental manual changes to your dataset could result in your Space getting
merge conflicts as it automatically tries to push the dataset to the Hub. For
local development, add these three keys to a `.env` file, and consider setting
`FORCE_PUSH` to "no".

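The `.env` file for local development might look something like this (the token
and dataset URL are placeholders for your own values):

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
DATASET_REPO_URL=https://huggingface.co/datasets/your-username/your-dataset
FORCE_PUSH=no
```
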
*Running Data Collection*

1. In your local clone of the repo, create a copy of `config.py.example` called
   `config.py`. Now, put keys from your AWS account in `config.py`. These keys
   should be for an AWS account that has the AmazonMechanicalTurkFullAccess
   permission. You also need to create an MTurk requester account associated
   with your AWS account. A sketch of how these pieces might fit together is
   shown after this list.
2. Run `python collect.py` locally.

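The repo's `config.py.example` defines the exact key names to use; the sketch
below is only an illustration (with assumed names `MTURK_KEY` and
`MTURK_SECRET`) of how `collect.py` might use those AWS credentials with boto3
to post HITs that point workers at your Space. The title, reward, and URLs are
placeholders, not the script's real values.

```python
# Hypothetical sketch only; the real collect.py will differ. Assumes config.py
# defines MTURK_KEY and MTURK_SECRET (use the names from config.py.example).
import boto3
from config import MTURK_KEY, MTURK_SECRET

# Sandbox endpoint for testing; remove endpoint_url to post real, paid HITs.
mturk = boto3.client(
    "mturk",
    aws_access_key_id=MTURK_KEY,
    aws_secret_access_key=MTURK_SECRET,
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# MTurk "external" HITs embed a URL (here, your Space) in an iframe for workers.
space_url = "https://your-space-url-goes-here"  # placeholder for your Space URL
question_xml = f"""<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>{space_url}</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Dynamic adversarial data collection",  # placeholder title
    Description="Try to fool the model.",         # placeholder description
    Reward="0.15",                                # USD, passed as a string
    MaxAssignments=1,
    LifetimeInSeconds=3600,
    AssignmentDurationInSeconds=600,
    Question=question_xml,
)
print("Created HIT:", hit["HIT"]["HITId"])
```
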
*Profit*

Now, you should be watching HITs come into your Hugging Face dataset
automatically!

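To sanity-check that collection is working, you can load the dataset yourself.
This assumes the app writes the data in a format the `datasets` library can
auto-detect (e.g., JSON lines); the repo id is a placeholder:

```python
# Quick check that new examples are landing in the dataset repo.
from datasets import load_dataset

ds = load_dataset(
    "your-username/your-dataset",  # same repo as DATASET_REPO_URL
    use_auth_token=True,           # needed if the dataset is private
)
print(ds)
# Raw data files load under a default "train" split.
print(ds["train"].num_rows, "examples collected so far")
```
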
*Tips and Tricks*

- Use caution while doing local development of your Space and simultaneously
  running it on MTurk. Consider setting `FORCE_PUSH` to "no" in your local
  `.env` file.
- Hugging Face Spaces have limited computational resources and memory. If you
  run too many HITs and/or assignments at once, you could run into issues. You
  could also run into issues if you are trying to create a very large dataset.
  Check your Space's logs for any errors.