Updated README
README.md
CHANGED
@@ -1,102 +1,13 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-```sh
-# Clone repo
-$ git clone https://github.com/fvialibre/edia.git && cd edia
-# Create and activate virtualenv
-$ python3 -m venv venv && source venv/bin/activate
-# Install requirements
-$ python3 -m pip install -r requirements.txt
-```
-## Setup data
-
-To start using this tool, you need to create the required structure for it to retrieve the data. We provide a script that does this automatically, along with an explanation of how to do it manually for further customization.
-
-### Automatic setup
-The cloned repository includes a `setup.sh` script that you can run on Linux:
-
-```sh
-$ ./setup.sh
-```
-
-This will create a `data/` folder inside the repository and download from *Google Drive* two 100k-word embedding files (one English, one Spanish) and two vocabulary files (`Min` and `Full`, see [Manual setup](#Manual-setup)).
-
-### Manual setup
-To set up the structure manually, create a `data/` folder inside the `edia` repository you just cloned:
-
-```sh
-$ mkdir data
-```
-
-Then download the files you need into this newly created folder:
-
-* [Min vocabulary:](https://drive.google.com/file/d/1uI6HsBw1XWVvTEIs9goSpUVfeVJe-zEP/view?usp=sharing) Contains only 56 words; for testing purposes only.
-* [Full vocabulary:](https://drive.google.com/file/d/1T_pLFkUucP-NtPRCsO7RkOuhMqGi41pe/view?usp=sharing) Contains 1.2M words.
-* [Spanish word embeddings:](https://drive.google.com/file/d/1YwjyiDN0w54P55-y3SKogk7Zcd-WQ-eQ/view?usp=sharing) 100K Spanish word embeddings of 300 dimensions (from [Jorge Pérez's website](http://dcc.uchile.cl/~jperez))
-* [English word embeddings:](https://drive.google.com/file/d/1EN0pp1RKyRwi072QhVWJaDO8KlcFZo46/view?usp=sharing) 100K English word embeddings of 300 dimensions (from [Eyal Gruss's GitHub](https://github.com/eyaler/word2vec-slim))
-
-> **Note**: You will need one of the two vocabulary files (`Min` or `Full`) unless you want to build the required structure yourself. The embeddings file, on the other hand, can be one of your own; we provide these two as working options.
-
-## Usage
-```sh
-# If you are not already in the venv
-$ source venv/bin/activate
-$ python3 app.py
-```
-
-## Tool Configuration
-
-The file `tool.cfg` contains configuration parameters for the tool:
-
-| **Name** | **Options** | **Description** |
-|---|---|---|
-| language | `es`, `en` | Changes the interface language. |
-| embeddings_path | `data/100k_es_embedding.vec`, `data/100k_en_embedding.vec` | Path to the word embeddings to use. You can use your own embedding file as long as it is in `.vec` format. For files with a `.bin` extension, only gensim's C binary format is valid. The listed options correspond to the pretrained Spanish and English embeddings. |
-| nn_method | `sklearn`, `ann` | Method used to fetch nearest neighbors. `sklearn` uses the exact [sklearn nearest neighbors](https://scikit-learn.org/stable/modules/neighbors.html) calculation, so your embedding must fit in your computer's memory; it is slower for large embeddings. [Ann](https://pypi.org/project/annoy/1.0.3/) is an approximate nearest-neighbor search suitable for large embeddings that don't fit in memory. |
-| max_neighbors | (int) `20` | Number of neighbors used to fit the sklearn nearest neighbors method. |
-| context_dataset | `vialibre/splittedspanish3bwc` | Path to the split 3bwc dataset optimised for word context search. |
-| vocabulary_subset | `mini`, `full` | Vocabulary required by the context search tool. |
-| available_wordcloud | `True`, `False` | Show the word cloud in the "Data" interface. |
-| language_model | `bert-base-uncased`, `dccuchile/bert-base-spanish-wwm-uncased` | `bert-base-uncased` is an English language model; `bert-base-spanish-wwm-uncased` is a Spanish model. You can inspect any bert-base language model uploaded to the [Hugging Face Hub](https://huggingface.co/models). |
-| available_logs | `True`, `False` | Activate logging of user input. Saved logs are stored in the `logs/` folder. |
-
-## Resources
-### Video tutorials and user manuals
-* Word explorer: [[video]]() [manual: [es](https://shorturl.at/cgwxJ) | en]
-* Word bias explorer: [[video]]() [manual: [es](https://shorturl.at/htuEI) | en]
-* Phrase bias explorer: [[video]]() [manual: [es](https://shorturl.at/fkBL3) | en]
-* Data explorer: [[video]]() [manual: [es](https://shorturl.at/CIVY6) | en]
-* Crows-Pairs: [[video]]() [manual: [es](https://shorturl.at/gJLTU) | en]
-### Interactive notebooks
-* How to use (*road map*): [[es](notebook/EDIA_Road_Map.ipynb) | en]
-* Classes and methods docs: [[es](notebook/EDIA_Docs.ipynb) | en]
-
-## Citation Information
-```bibtex
-@misc{https://doi.org/10.48550/arxiv.2207.06591,
-doi = {10.48550/ARXIV.2207.06591},
-url = {https://arxiv.org/abs/2207.06591},
-author = {Alemany, Laura Alonso and Benotti, Luciana and González, Lucía and Maina, Hernán and Busaniche, Beatriz and Halvorsen, Alexia and Bordone, Matías and Sánchez, Jorge},
-keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI),
-FOS: Computer and information sciences, FOS: Computer and information sciences},
-title = {A tool to overcome technical barriers for bias assessment in human language technologies},
-publisher = {arXiv},
-year = {2022},
-copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
-}
-```
-
-## License Information
-This project is licensed under the [MIT license](LICENSE).
-
+---
+title: Edia Full En
+emoji: 👁
+colorFrom: purple
+colorTo: gray
+sdk: gradio
+sdk_version: 3.16.2
+app_file: app.py
+pinned: false
+license: mit
+---
+
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
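
The front matter added in this commit is a flat set of `key: value` pairs between two `---` lines. As a minimal sketch of how such a block can be read, the snippet below uses only the Python standard library. The helper name `parse_front_matter` is hypothetical (it is not part of this repository or of any Hugging Face library), and it assumes flat keys with no nesting, lists, or quoting; every value is kept as a plain string.

```python
def parse_front_matter(text):
    """Return (metadata, body) for text that starts with a '---' block.

    Hypothetical helper for illustration only: handles flat `key: value`
    pairs and keeps all values as strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as body.
    return {}, text


# The README content introduced by this commit.
README = """---
title: Edia Full En
emoji: 👁
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 3.16.2
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
"""

meta, body = parse_front_matter(README)
print(meta["sdk"], meta["sdk_version"], meta["app_file"])
```

For anything beyond flat keys, a real YAML parser would be the safer choice; the sketch only illustrates which fields the new README declares.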