# GupShup

GupShup: Summarizing Open-Domain Code-Switched Conversations, EMNLP 2021

Paper: [https://aclanthology.org/2021.emnlp-main.499.pdf](https://aclanthology.org/2021.emnlp-main.499.pdf)

### Dataset
Please request the GupShup data using [this Google form](https://docs.google.com/forms/d/1zvUk7WcldVF3RCoHdWzQPzPprtSJClrnHoIOYbzaJEI/edit?ts=61381ec0).

The dataset is available for `Hinglish Dialogues to English Summarization` (h2e) and `English Dialogues to English Summarization` (e2e). For each task, the dialogues/conversations use the `.source` file extension (e.g. `train.source`), while the summaries use the `.target` extension (e.g. `train.target`). Pass the `.source` file to the `input_path` argument and the `.target` file to the `reference_path` argument of the scripts.

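Assuming each line of a `.source` file holds one conversation and the corresponding line of the `.target` file holds its summary (a common convention for seq2seq data files; this README does not state it explicitly), a quick sanity check that the two files are aligned might look like:

```python
def check_alignment(source_lines, target_lines):
    """Return True if every conversation has exactly one non-empty summary.

    source_lines / target_lines are lists of stripped lines read from a
    .source / .target file pair (in-memory stand-ins here).
    """
    if len(source_lines) != len(target_lines):
        return False
    # One conversation per source line, one summary per target line,
    # and none of them empty.
    return all(s and t for s, t in zip(source_lines, target_lines))


# Toy data for illustration (not from the actual GupShup corpus):
src = ["A: kya haal hai? B: sab badhiya", "A: movie dekhi? B: haan, mast thi"]
tgt = ["B tells A everything is fine.", "B liked the movie."]
print(check_alignment(src, tgt))  # True
```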

## Models
All model weights are available on the Huggingface model hub. You can either download the weights locally and pass that path to the `model_name` argument of the scripts, or pass the alias listed below directly, in which case the scripts download the weights automatically.

Model names follow the pattern `gupshup_TASK_MODEL`, where `TASK` is `h2e` or `e2e` and `MODEL` is one of `mbart`, `pegasus`, etc., as listed below.

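The naming pattern can be expressed as a small helper; the task and model identifiers below are taken from the tables in this README:

```python
TASKS = {"h2e", "e2e"}
MODELS = {"mbart", "pegasus", "t5_mtl", "t5", "bart", "gpt"}


def hub_alias(task: str, model: str) -> str:
    """Build the Huggingface hub alias following the gupshup_TASK_MODEL pattern."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    return f"midas/gupshup_{task}_{model}"


print(hub_alias("h2e", "mbart"))  # midas/gupshup_h2e_mbart
```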
**1. Hinglish Dialogues to English Summary (h2e)**

| Model   | Huggingface Alias                                                             |
|---------|-------------------------------------------------------------------------------|
| mBART   | [midas/gupshup_h2e_mbart](https://huggingface.co/midas/gupshup_h2e_mbart)     |
| PEGASUS | [midas/gupshup_h2e_pegasus](https://huggingface.co/midas/gupshup_h2e_pegasus) |
| T5 MTL  | [midas/gupshup_h2e_t5_mtl](https://huggingface.co/midas/gupshup_h2e_t5_mtl)   |
| T5      | [midas/gupshup_h2e_t5](https://huggingface.co/midas/gupshup_h2e_t5)           |
| BART    | [midas/gupshup_h2e_bart](https://huggingface.co/midas/gupshup_h2e_bart)       |
| GPT-2   | [midas/gupshup_h2e_gpt](https://huggingface.co/midas/gupshup_h2e_gpt)         |

**2. English Dialogues to English Summary (e2e)**

| Model   | Huggingface Alias                                                             |
|---------|-------------------------------------------------------------------------------|
| mBART   | [midas/gupshup_e2e_mbart](https://huggingface.co/midas/gupshup_e2e_mbart)     |
| PEGASUS | [midas/gupshup_e2e_pegasus](https://huggingface.co/midas/gupshup_e2e_pegasus) |
| T5 MTL  | [midas/gupshup_e2e_t5_mtl](https://huggingface.co/midas/gupshup_e2e_t5_mtl)   |
| T5      | [midas/gupshup_e2e_t5](https://huggingface.co/midas/gupshup_e2e_t5)           |
| BART    | [midas/gupshup_e2e_bart](https://huggingface.co/midas/gupshup_e2e_bart)       |
| GPT-2   | [midas/gupshup_e2e_gpt](https://huggingface.co/midas/gupshup_e2e_gpt)         |

## Inference

### Using the command line
1. Clone this repo and create a Python virtual environment (https://docs.python.org/3/library/venv.html). Install the required packages:
```
git clone https://github.com/midas-research/gupshup.git
cd gupshup
pip install -r requirements.txt
```

2. The `run_eval` script takes the following arguments:
* **model_name**: Path or alias of one of our models available on Huggingface, as listed above.
* **input_path**: Path to the source file containing the conversations to be summarized.
* **save_path**: Path of the file where the generated summaries will be saved.
* **reference_path**: Path to the target file containing the reference summaries, used to calculate metrics.
* **score_path**: Path of the file where the scores will be saved.
* **bs**: Batch size.
* **device**: CUDA device(s) to use.

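The `bs` argument controls how many conversations are summarized per forward pass. A minimal sketch of that batching logic (illustrative only; the actual `run_eval.py` implementation may differ):

```python
def chunks(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# 20 conversations with bs=8 yields batches of 8, 8, and 4.
conversations = [f"conversation {n}" for n in range(20)]
batches = list(chunks(conversations, 8))
print([len(b) for b in batches])  # [8, 8, 4]
```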
Please make sure you have downloaded the GupShup dataset using the Google form above and provide the correct paths to these files in the `input_path` and `reference_path` arguments. Alternatively, simply put `test.source` and `test.target` in the `data/h2e/` (Hinglish to English) or `data/e2e/` (English to English) folder. For example, to generate English summaries from Hinglish dialogues using the mBART model, run the following command:

```
python run_eval.py \
    --model_name midas/gupshup_h2e_mbart \
    --input_path data/h2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/h2e/test.target \
    --score_path scores.txt \
    --bs 8
```
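The script writes its metric scores to `score_path`. Purely as an illustration of how a generated summary can be compared against a reference (this is not necessarily the metric `run_eval.py` computes or the paper reports), a simple unigram-overlap F1 looks like:

```python
from collections import Counter


def unigram_f1(generated: str, reference: str) -> float:
    """F1 over unigram counts between a generated and a reference summary."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((gen & ref).values())  # clipped matching-token count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(unigram_f1("B liked the movie", "B liked the movie a lot"))  # 0.8
```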

As another example, to generate English summaries from English dialogues using the PEGASUS model:
```
python run_eval.py \
    --model_name midas/gupshup_e2e_pegasus \
    --input_path data/e2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/e2e/test.target \
    --score_path scores.txt \
    --bs 8
```

### In Google Colaboratory
Please create a copy of this [notebook on Google Colab](https://colab.research.google.com/drive/16PI8Fqivzr8ScgQrs05y_kL6Qzqi7BBe#scrollTo=jNjGTzPb5eV_) or upload `gupshup_notebook.ipynb` to Google Colab and follow the instructions in it.

### Streamlit UI

1. Clone this repo and create a Python virtual environment (https://docs.python.org/3/library/venv.html). Install the required packages:
```
git clone https://github.com/midas-research/gupshup.git
cd gupshup
pip install -r requirements.txt
```

2. Use the Streamlit UI to run inference with the model and task of your choice. To start the Streamlit server:
```
streamlit run app.py
```
![Image of Streamlit App](https://github.com/midas-research/gupshup/blob/main/images/emnlp-conversation.png)

Please create an issue if you face any difficulties replicating the results.

### References

Please cite [[1]](https://arxiv.org/abs/1910.04073) if you found the resources in this repository useful.

[1] Mehnaz, Laiba, Debanjan Mahata, Rakesh Gosangi, Uma Sushmitha Gunturi, Riya Jain, Gauri Gupta, Amardeep Kumar, Isabelle G. Lee, Anish Acharya, and Rajiv Shah. [*GupShup: Summarizing Open-Domain Code-Switched Conversations*](https://aclanthology.org/2021.emnlp-main.499.pdf). In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6177–6192.

```
@inproceedings{mehnaz2021gupshup,
  title={GupShup: Summarizing Open-Domain Code-Switched Conversations},
  author={Mehnaz, Laiba and Mahata, Debanjan and Gosangi, Rakesh and Gunturi, Uma Sushmitha and Jain, Riya and Gupta, Gauri and Kumar, Amardeep and Lee, Isabelle G and Acharya, Anish and Shah, Rajiv},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={6177--6192},
  year={2021}
}
```