---
title: Speech_Language_Processing_Jurafsky_Martin
emoji: 📚
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 2.9.0
python_version: 3.10.4
app_file: app.py
pinned: true
---
# NLP FlashCards
## DEMO
View the demo on Hugging Face Spaces:
[DEMO](https://huggingface.co/spaces/RugNlpFlashcards/Speech_Language_Processing_Jurafsky_Martin)
## Dependencies
Make sure you have the following tools installed:

- [Python](https://www.python.org/downloads/) 3.10.x (`^3.10,<3.11`);
- [Poetry](https://python-poetry.org/) for Python package management;
- [Docker](https://www.docker.com/get-started/) for running Elasticsearch;
- [Git LFS](https://git-lfs.github.com/) for downloading binary files that are too large to store in Git.

Then, run the following commands to install the dependencies and start Elasticsearch:
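```sh
# Install the Python dependencies into a Poetry-managed environment
poetry install
# Download the Elasticsearch 8.1.1 image
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.1.1
# Create a Docker network for the container
docker network create elastic
# Start Elasticsearch as container "es01" (REST API on 9200, transport on 9300)
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.1.1
```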
After the last command, a password for the `elastic` user should show up in the
terminal output (you might have to scroll up a bit). Copy this password, then
create a copy of the `.env.example` file named `.env` and replace the
`<password>` placeholder with the copied password. The `.env` file can also be
used to change the configuration of the system; leave the other settings as
they are if you are running a replication study.
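
If you prefer to script this step, the Python sketch below copies the template and substitutes the password. It assumes only what is stated above, namely that `.env.example` contains the literal `<password>` placeholder; the function name `create_env_file` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Placeholder text as described in the setup instructions.
PLACEHOLDER = "<password>"


def create_env_file(password: str,
                    template: Path = Path(".env.example"),
                    target: Path = Path(".env")) -> Path:
    """Hypothetical helper (not part of this repository).

    Copies `template` to `target`, replacing the password placeholder and
    leaving every other setting untouched.
    """
    text = template.read_text(encoding="utf-8")
    if PLACEHOLDER not in text:
        # Fail loudly instead of silently writing a .env without a password.
        raise ValueError(f"{template} does not contain {PLACEHOLDER!r}")
    target.write_text(text.replace(PLACEHOLDER, password), encoding="utf-8")
    return target
```

Run it from the repository root, for example `create_env_file("password-from-the-terminal")`. Raising on a missing placeholder guards against a template that has already been edited.
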

Next, run the following command **from the root of the repository** to copy the Elasticsearch CA certificate (`http_ca.crt`) out of the container:
```sh
docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .
```
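
To check that the certificate and password work before running any queries, you can call the Elasticsearch root endpoint over HTTPS. This is a minimal standard-library sketch, not part of the repository: it assumes the `es01` container is listening on `localhost:9200`, that you run it from the repository root where `http_ca.crt` was copied, and it takes the password as an argument rather than assuming any variable name inside `.env`. The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic authentication header value (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def check_elasticsearch(password: str,
                        url: str = "https://localhost:9200",
                        ca_file: str = "http_ca.crt") -> str:
    """Return the server version number (hypothetical helper).

    Raises urllib.error.HTTPError (e.g. 401 for a wrong password) or
    urllib.error.URLError (e.g. certificate or connection problems).
    """
    # Trust the self-signed CA that was copied out of the es01 container.
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header("elastic", password)}
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        info = json.load(response)
    return info["version"]["number"]
```

With the container from the steps above running, `print(check_elasticsearch("your-password"))` should print the server version (8.1.1 for the image pulled earlier).
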
**NOTE 1:** If Docker is not available or feasible, you can use a trial hosted version of Elasticsearch instead: https://www.elastic.co/cloud/

**NOTE 2:** Installing the dependencies without Poetry is possible, but we do not recommend it. To do so, run `pip install -r requirements.txt`.
## Running
To make sure we're using the dependencies managed by Poetry, run `poetry shell`
before executing any of the following commands. Alternatively, replace any call
like `python file.py` with `poetry run python file.py` (but we suggest the shell
option, since it is much more convenient).
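
If you are unsure whether the Poetry environment is active, the Python sketch below is one way to check. It relies on standard virtual-environment behaviour (`sys.prefix` differs from `sys.base_prefix`) and on Poetry creating a virtual environment by default. It cannot tell Poetry's environment apart from any other one, so compare against the output of `poetry env info --path` if in doubt. The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True when running inside a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
```
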
### Using the QA system
⚠️ **Important** ⚠️ _If you want to run an Elasticsearch query, make sure the
Docker container is running! You can check this by running `docker container ls`.
If your container shows up (it is named `es01` if you followed these
instructions), it is running. If not, start it with `docker start es01` or from
Docker Desktop._
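
The check-and-start step can also be scripted. The Python sketch below wraps the same two Docker commands with `subprocess`; it assumes the container name `es01` from these instructions, and the function name `ensure_container_running` is hypothetical. The `run` parameter only exists so the logic can be exercised without Docker.

```python
import subprocess
from typing import Callable


def ensure_container_running(
    name: str = "es01",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Hypothetical helper: start the container if needed.

    Returns True if the container had to be started, False if it was
    already running.
    """
    # `docker container ls` only lists running containers.
    listing = run(
        ["docker", "container", "ls", "--filter", f"name={name}", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    # The name filter also matches partial names, so compare whole names.
    if name in listing.stdout.splitlines():
        return False
    run(["docker", "start", name], check=True)
    return True
```
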

To query the QA system, pass your question as a quoted argument:
```sh
python query.py "Why can dot product be used as a similarity metric?"
```
By default, only the best answer is returned, together with its location in
the book. To get more answers (say, the top 5), supply the `--top=5` option.
The default retriever uses [FAISS](https://faiss.ai/), but you can also use
[Elasticsearch](https://www.elastic.co/elastic-stack/) with the
`--retriever=es` option. You can also pick a language model with the `--lm`
option, which accepts either `dpr` (Dense Passage Retrieval) or `longformer`.
The language model is used both to generate the embeddings for FAISS and to
generate the answer.
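
With a dense retriever such as FAISS, passages are ranked by how close their embeddings are to the embedding of the question; with DPR-style embeddings that score is typically the inner (dot) product. The pure-Python sketch below shows what a top-k inner-product search computes, with `k` playing the role of `--top`. It is an illustration with toy vectors, not this repository's retriever (FAISS performs the same kind of ranking far faster over real embeddings), and the function names are hypothetical.

```python
import heapq
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def top_k(query: Sequence[float],
          passages: Sequence[Sequence[float]],
          k: int = 1) -> list[tuple[int, float]]:
    """Hypothetical helper: (passage index, score) for the k best inner products."""
    scores = ((i, dot(query, p)) for i, p in enumerate(passages))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Toy 3-dimensional "embeddings"; real DPR embeddings have 768 dimensions.
question = [1.0, 0.0, 1.0]
passages = [[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
print(top_k(question, passages, k=2))  # [(0, 3.0), (2, 2.0)]
```
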
### CLI overview
To get an overview of all available options, run `python query.py --help`. The
options are also printed below.
```
usage: query.py [-h] [--top int] [--retriever {faiss,es}] [--lm {dpr,longformer}] str

positional arguments:
  str                   The question to feed to the QA system

options:
  -h, --help            show this help message and exit
  --top int, -t int     The number of answers to retrieve
  --retriever {faiss,es}, -r {faiss,es}
                        The retrieval method to use
  --lm {dpr,longformer}, -l {dpr,longformer}
                        The language model to use for the FAISS retriever
```
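
If you write a wrapper script around the QA system, the interface above can be mirrored with `argparse`. This sketch is reconstructed from the help text only and is not the repository's `query.py`: the defaults for `--top` (1) and `--retriever` (`faiss`) follow the description in the previous section, while the `--lm` default of `dpr` is an assumption. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented query.py options."""
    parser = argparse.ArgumentParser(prog="query.py")
    parser.add_argument("query", metavar="str",
                        help="The question to feed to the QA system")
    # Default of 1 matches "by default, only the best answer is returned".
    parser.add_argument("--top", "-t", type=int, default=1, metavar="int",
                        help="The number of answers to retrieve")
    parser.add_argument("--retriever", "-r", choices=["faiss", "es"], default="faiss",
                        help="The retrieval method to use")
    # ASSUMPTION: the README does not state the --lm default.
    parser.add_argument("--lm", "-l", choices=["dpr", "longformer"], default="dpr",
                        help="The language model to use for the FAISS retriever")
    return parser


args = build_parser().parse_args(
    ["Why can dot product be used as a similarity metric?", "--top=5", "-r", "es"]
)
print(args.top, args.retriever, args.lm)  # 5 es dpr
```
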
### Replicating the experiment
To run the full experiment, run the following command:

```sh
# from the root of the repository, with the Poetry environment activated
python main.py
```
This command runs all questions through the system and stores the output in the `results/` directory.

After running the experiment, the results can be analyzed and displayed by running `plot.py` and the `results/*_analysis.ipynb` notebooks.
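
The analysis step can also be run non-interactively. The Python sketch below assumes that `plot.py` takes no arguments and that Jupyter (with `nbconvert`) is installed in the environment; it executes each notebook headlessly and saves the outputs back into the same file, which overwrites any existing outputs. Running `plot.py` first is an arbitrary choice, and the function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def analysis_commands(results_dir: Path = Path("results")) -> list[list[str]]:
    """Hypothetical helper: commands for plot.py and each analysis notebook."""
    # ASSUMPTION: plot.py needs no command-line arguments.
    commands = [[sys.executable, "plot.py"]]
    for notebook in sorted(results_dir.glob("*_analysis.ipynb")):
        # Execute the notebook headlessly and store the outputs in place.
        commands.append(["jupyter", "nbconvert", "--to", "notebook",
                         "--execute", "--inplace", str(notebook)])
    return commands


def run_analysis(results_dir: Path = Path("results")) -> None:
    """Run every analysis command, stopping at the first failure."""
    for command in analysis_commands(results_dir):
        subprocess.run(command, check=True)
```
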