noumanjavaid committed on
Commit ff8031c · verified · 1 Parent(s): 94e6ccb

Update app.py

  1. app.py +68 -68
app.py CHANGED
@@ -1,77 +1,77 @@
  import streamlit as st
  # %%
- # Datasets installation
- #! pip install datasets transformers
- # To install from source instead of the last release, comment the command above and uncomment the following one.
- # ! pip install git+https://github.com/huggingface/datasets.git
 
- # %% [markdown]
- # # Quickstart
 
- # %% [markdown]
- # This quickstart is intended for developers who are ready to dive into the code and see an example of how to integrate 🤗 Datasets into their model training workflow. If you're a beginner, we recommend starting with our [tutorials](https://huggingface.co/docs/datasets/main/en/./tutorial), where you'll get a more thorough introduction.
- #
- # Each dataset is unique, and depending on the task, some datasets may require additional steps to prepare them for training. But you can always use 🤗 Datasets tools to load and process a dataset. The fastest and easiest way to get started is by loading an existing dataset from the [Hugging Face Hub](https://huggingface.co/datasets). There are thousands of datasets to choose from, spanning many tasks. Choose the type of dataset you want to work with, and let's get started!
- #
- # <div class="mt-4">
- # <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
- # <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#audio"
- # ><div class="w-full text-center bg-gradient-to-r from-violet-300 via-sky-400 to-green-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Audio</div>
- # <p class="text-gray-700">Resample an audio dataset and get it ready for a model to classify what type of banking issue a speaker is calling about.</p>
- # </a>
- # <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#vision"
- # ><div class="w-full text-center bg-gradient-to-r from-pink-400 via-purple-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Vision</div>
- # <p class="text-gray-700">Apply data augmentation to an image dataset and get it ready for a model to diagnose disease in bean plants.</p>
- # </a>
- # <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#nlp"
- # ><div class="w-full text-center bg-gradient-to-r from-orange-300 via-red-400 to-violet-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">NLP</div>
- # <p class="text-gray-700">Tokenize a dataset and get it ready for a model to determine whether a pair of sentences have the same meaning.</p>
- # </a>
- # </div>
- # </div>
- #
- # <Tip>
- #
- # Check out [Chapter 5](https://huggingface.co/course/chapter5/1?fw=pt) of the Hugging Face course to learn more about other important topics such as loading remote or local datasets, tools for cleaning up a dataset, and creating your own dataset.
- #
- # </Tip>
- #
- # Start by installing 🤗 Datasets:
- #
- # ```bash
- # pip install datasets
- # ```
- #
- # 🤗 Datasets also supports audio and image data formats:
- #
- # * To work with audio datasets, install the [Audio](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Audio) feature:
- #
- # ```bash
- # pip install datasets[audio]
- # ```
- #
- # * To work with image datasets, install the [Image](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Image) feature:
- #
- # ```bash
- # pip install datasets[vision]
- # ```
- #
- # Besides 🤗 Datasets, make sure your preferred machine learning framework is installed:
- #
- # ```bash
- # pip install torch
- # ```
- # ```bash
- # pip install tensorflow
- # ```
 
- # %% [markdown]
- # ## Audio
 
- # %% [markdown]
- # Audio datasets are loaded just like text datasets. However, an audio dataset is preprocessed a bit differently. Instead of a tokenizer, you'll need a [feature extractor](https://huggingface.co/docs/transformers/main_classes/feature_extractor#feature-extractor). An audio input may also require resampling its sampling rate to match the sampling rate of the pretrained model you're using. In this quickstart, you'll prepare the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset for a model to train on and classify the banking issue a customer is having.
- #
- # **1**. Load the MInDS-14 dataset by providing the [load_dataset()](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) function with the dataset name, dataset configuration (not all datasets will have a configuration), and a dataset split:
 
  # %%
  from datasets import load_dataset, Audio
 
  import streamlit as st
  # %%
+ Datasets installation
+ ! pip install datasets transformers
+ To install from source instead of the last release, comment the command above and uncomment the following one.
+ ! pip install git+https://github.com/huggingface/datasets.git
 
+ %% [markdown]
+ # Quickstart
 
+ %% [markdown]
+ This quickstart is intended for developers who are ready to dive into the code and see an example of how to integrate 🤗 Datasets into their model training workflow. If you're a beginner, we recommend starting with our [tutorials](https://huggingface.co/docs/datasets/main/en/./tutorial), where you'll get a more thorough introduction.
 
+ Each dataset is unique, and depending on the task, some datasets may require additional steps to prepare them for training. But you can always use 🤗 Datasets tools to load and process a dataset. The fastest and easiest way to get started is by loading an existing dataset from the [Hugging Face Hub](https://huggingface.co/datasets). There are thousands of datasets to choose from, spanning many tasks. Choose the type of dataset you want to work with, and let's get started!
 
+ <div class="mt-4">
+ <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">
+ <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#audio"
+ ><div class="w-full text-center bg-gradient-to-r from-violet-300 via-sky-400 to-green-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Audio</div>
+ <p class="text-gray-700">Resample an audio dataset and get it ready for a model to classify what type of banking issue a speaker is calling about.</p>
+ </a>
+ <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#vision"
+ ><div class="w-full text-center bg-gradient-to-r from-pink-400 via-purple-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Vision</div>
+ <p class="text-gray-700">Apply data augmentation to an image dataset and get it ready for a model to diagnose disease in bean plants.</p>
+ </a>
+ <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="#nlp"
+ ><div class="w-full text-center bg-gradient-to-r from-orange-300 via-red-400 to-violet-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">NLP</div>
+ <p class="text-gray-700">Tokenize a dataset and get it ready for a model to determine whether a pair of sentences have the same meaning.</p>
+ </a>
+ </div>
+ </div>
+
+ <Tip>
+
+ Check out [Chapter 5](https://huggingface.co/course/chapter5/1?fw=pt) of the Hugging Face course to learn more about other important topics such as loading remote or local datasets, tools for cleaning up a dataset, and creating your own dataset.
+
+ </Tip>
+
+ Start by installing 🤗 Datasets:
+
+ ```bash
+ pip install datasets
+ ```
+
+ 🤗 Datasets also supports audio and image data formats:
+
+ * To work with audio datasets, install the [Audio](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Audio) feature:
+
+ ```bash
+ pip install datasets[audio]
+ ```
+
+ * To work with image datasets, install the [Image](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Image) feature:
+
+ ```bash
+ pip install datasets[vision]
+ ```
+
+ Besides 🤗 Datasets, make sure your preferred machine learning framework is installed:
+
+ ```bash
+ pip install torch
+ ```
+ ```bash
+ pip install tensorflow
+ ```
+
+ %% [markdown]
+ ## Audio
+
+ %% [markdown]
+ Audio datasets are loaded just like text datasets. However, an audio dataset is preprocessed a bit differently. Instead of a tokenizer, you'll need a [feature extractor](https://huggingface.co/docs/transformers/main_classes/feature_extractor#feature-extractor). An audio input may also require resampling its sampling rate to match the sampling rate of the pretrained model you're using. In this quickstart, you'll prepare the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset for a model to train on and classify the banking issue a customer is having.
+
+ **1**. Load the MInDS-14 dataset by providing the [load_dataset()](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) function with the dataset name, dataset configuration (not all datasets will have a configuration), and a dataset split:
 
  # %%
  from datasets import load_dataset, Audio
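
The quickstart text in this hunk asks for `datasets` and a preferred framework (`torch` or `tensorflow`) to be installed, and `app.py` itself imports `streamlit`. A minimal sketch of a pre-flight check follows, using only the standard library. The helper name `find_missing` and the two package lists are hypothetical and not part of the commit, and the top-level import names are assumed to match the pip package names shown in the text.

```python
from importlib.util import find_spec


def find_missing(packages):
    """Return the names in `packages` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in packages if find_spec(name) is None]


# Assumed import names for the packages named in the quickstart text.
REQUIRED = ["streamlit", "datasets"]
FRAMEWORKS = ["torch", "tensorflow"]  # the text asks for one preferred framework

if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    # Only complain about frameworks if none of them is available.
    if len(find_missing(FRAMEWORKS)) == len(FRAMEWORKS):
        print("No framework found; install torch or tensorflow.")
```

Running a check like this before `from datasets import load_dataset, Audio` would turn a bare `ImportError` into a readable message. It does not verify the optional `datasets[audio]` or `datasets[vision]` extras.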