Update app.py
app.py
CHANGED
@@ -1,77 +1,77 @@
import streamlit as st
# %%
-
-
-
-

-
-#

-
-
-#
-# Each dataset is unique, and depending on the task, some datasets may require additional steps to prepare them for training. But you can always use 🤗 Datasets tools to load and process a dataset. The fastest and easiest way to get started is by loading an existing dataset from the [Hugging Face Hub](https://huggingface.co/datasets). There are thousands of datasets to choose from, spanning many tasks. Choose the type of dataset you want to work with, and let's get started!
-#
-# <div class="mt-4">
-# <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
-# <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#audio"
-# ><div class="w-full text-center bg-gradient-to-r from-violet-300 via-sky-400 to-green-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Audio</div>
-# <p class="text-gray-700">Resample an audio dataset and get it ready for a model to classify what type of banking issue a speaker is calling about.</p>
-# </a>
-# <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#vision"
-# ><div class="w-full text-center bg-gradient-to-r from-pink-400 via-purple-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Vision</div>
-# <p class="text-gray-700">Apply data augmentation to an image dataset and get it ready for a model to diagnose disease in bean plants.</p>
-# </a>
-# <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#nlp"
-# ><div class="w-full text-center bg-gradient-to-r from-orange-300 via-red-400 to-violet-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">NLP</div>
-# <p class="text-gray-700">Tokenize a dataset and get it ready for a model to determine whether a pair of sentences have the same meaning.</p>
-# </a>
-# </div>
-# </div>
-#
-# <Tip>
-#
-# Check out [Chapter 5](https://huggingface.co/course/chapter5/1?fw=pt) of the Hugging Face course to learn more about other important topics such as loading remote or local datasets, tools for cleaning up a dataset, and creating your own dataset.
-#
-# </Tip>
-#
-# Start by installing 🤗 Datasets:
-#
-# ```bash
-# pip install datasets
-# ```
-#
-# 🤗 Datasets also supports audio and image data formats:
-#
-# * To work with audio datasets, install the [Audio](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Audio) feature:
-#
-# ```bash
-# pip install datasets[audio]
-# ```
-#
-# * To work with image datasets, install the [Image](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Image) feature:
-#
-# ```bash
-# pip install datasets[vision]
-# ```
-#
-# Besides 🤗 Datasets, make sure your preferred machine learning framework is installed:
-#
-# ```bash
-# pip install torch
-# ```
-# ```bash
-# pip install tensorflow
-# ```

-
-# ## Audio

-
-
-#
-

# %%
from datasets import load_dataset, Audio

import streamlit as st
# %%
+Datasets installation
+! pip install datasets transformers
+To install from source instead of the last release, comment the command above and uncomment the following one.
+! pip install git+https://github.com/huggingface/datasets.git

+%% [markdown]
+# Quickstart

+%% [markdown]
+This quickstart is intended for developers who are ready to dive into the code and see an example of how to integrate 🤗 Datasets into their model training workflow. If you're a beginner, we recommend starting with our [tutorials](https://huggingface.co/docs/datasets/main/en/./tutorial), where you'll get a more thorough introduction.

+Each dataset is unique, and depending on the task, some datasets may require additional steps to prepare them for training. But you can always use 🤗 Datasets tools to load and process a dataset. The fastest and easiest way to get started is by loading an existing dataset from the [Hugging Face Hub](https://huggingface.co/datasets). There are thousands of datasets to choose from, spanning many tasks. Choose the type of dataset you want to work with, and let's get started!

+<div class="mt-4">
+<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
+<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#audio"
+><div class="w-full text-center bg-gradient-to-r from-violet-300 via-sky-400 to-green-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Audio</div>
+<p class="text-gray-700">Resample an audio dataset and get it ready for a model to classify what type of banking issue a speaker is calling about.</p>
+</a>
+<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#vision"
+><div class="w-full text-center bg-gradient-to-r from-pink-400 via-purple-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Vision</div>
+<p class="text-gray-700">Apply data augmentation to an image dataset and get it ready for a model to diagnose disease in bean plants.</p>
+</a>
+<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#nlp"
+><div class="w-full text-center bg-gradient-to-r from-orange-300 via-red-400 to-violet-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">NLP</div>
+<p class="text-gray-700">Tokenize a dataset and get it ready for a model to determine whether a pair of sentences have the same meaning.</p>
+</a>
+</div>
+</div>
+
+<Tip>
+
+Check out [Chapter 5](https://huggingface.co/course/chapter5/1?fw=pt) of the Hugging Face course to learn more about other important topics such as loading remote or local datasets, tools for cleaning up a dataset, and creating your own dataset.
+
+</Tip>
+
+Start by installing 🤗 Datasets:
+
+```bash
+pip install datasets
+```
+
+🤗 Datasets also supports audio and image data formats:
+
+* To work with audio datasets, install the [Audio](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Audio) feature:
+
+```bash
+pip install datasets[audio]
+```
+
+* To work with image datasets, install the [Image](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Image) feature:
+
+```bash
+pip install datasets[vision]
+```
+
+Besides 🤗 Datasets, make sure your preferred machine learning framework is installed:
+
+```bash
+pip install torch
+```
+```bash
+pip install tensorflow
+```
+
+%% [markdown]
+## Audio
+
+%% [markdown]
+Audio datasets are loaded just like text datasets. However, an audio dataset is preprocessed a bit differently. Instead of a tokenizer, you'll need a [feature extractor](https://huggingface.co/docs/transformers/main_classes/feature_extractor#feature-extractor). An audio input may also require resampling its sampling rate to match the sampling rate of the pretrained model you're using. In this quickstart, you'll prepare the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset for a model to train on and classify the banking issue a customer is having.
+
+**1**. Load the MInDS-14 dataset by providing the [load_dataset()](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) function with the dataset name, dataset configuration (not all datasets will have a configuration), and a dataset split:

# %%
from datasets import load_dataset, Audio
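The installation cells in the updated `app.py` ask for 🤗 Datasets plus a preferred machine learning framework. The following is a minimal, stdlib-only sketch for checking which of these packages are importable in the current environment before running the app. It assumes that the import names are `datasets`, `transformers`, `torch`, and `tensorflow`. The helper name `check_installed` is hypothetical and not part of any library.

```python
import importlib.util


def check_installed(packages):
    """Return {package_name: True/False} depending on whether it can be imported.

    importlib.util.find_spec locates a top-level module without importing it,
    so this check is cheap and has no side effects.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Hypothetical list: the packages named in the installation steps.
    wanted = ["datasets", "transformers", "torch", "tensorflow"]
    status = check_installed(wanted)
    for name, ok in status.items():
        print(f"{name:12s} {'installed' if ok else 'missing (try: pip install ' + name + ')'}")
    # Only one framework is needed; warn if neither PyTorch nor TensorFlow is present.
    if not (status["torch"] or status["tensorflow"]):
        print("No ML framework found: install torch or tensorflow.")
```

Using `find_spec` instead of a `try: import ...` keeps the check from pulling in heavy frameworks just to confirm that they exist.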
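The Audio cell notes that an audio input may need resampling so that its sampling rate matches the pretrained model's. In 🤗 Datasets this is usually requested by casting the audio column with the `Audio` feature (for example `dataset.cast_column("audio", Audio(sampling_rate=16_000))`), and the library resamples when the audio is decoded. To show what resampling means without downloading anything, here is a naive linear-interpolation resampler in plain Python. It is an illustrative assumption and not how the library resamples internally. It also skips the anti-aliasing filter that a real resampler applies before downsampling. The function name `resample_linear` is hypothetical.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample a mono signal from orig_sr to target_sr by linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling can alias.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    n_in = len(samples)
    if n_in == 0 or orig_sr == target_sr:
        return list(samples)
    # Keep the duration constant: n_out / target_sr == n_in / orig_sr.
    n_out = max(1, round(n_in * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step          # fractional position in the input signal
        left = int(pos)
        if left >= n_in - 1:    # past the last input sample: hold the final value
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[left + 1])
    return out


if __name__ == "__main__":
    # A 4-sample ramp at a hypothetical 8 kHz, upsampled to 16 kHz.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 8000, 16000))
    # -> [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Doubling the rate doubles the number of samples while the duration stays the same. This is why a model trained at one sampling rate should not be fed raw audio recorded at another.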