|
|
|
# Ask ANRG Project Description |
|
|
|
Our demo is available [here](https://huggingface.co/spaces/FloraJ/Ask-ANRG).
|
|
|
A concise and structured guide to setting up and understanding the ANRG project. |
|
|
|
--- |
|
|
|
## Setup
|
|
|
1. **Clone the Repository**: |
|
``` |
|
git clone git@github.com:ANRGUSC/ask-anrg.git
|
``` |
|
|
|
2. **Navigate to the Directory**: |
|
``` |
|
cd ask-anrg/ |
|
``` |
|
|
|
3. **Create a Conda Environment**: |
|
``` |
|
conda create --name ask_anrg |
|
``` |
|
|
|
4. **Activate the Conda Environment**: |
|
``` |
|
conda activate ask_anrg |
|
``` |
|
|
|
5. **Install Required Dependencies**: |
|
``` |
|
pip3 install -r requirements.txt |
|
``` |
|
|
|
6. **Download the database from [here](https://drive.google.com/file/d/1-TV70IFIzjO4uPzNRzef3FLhssAfK2g3/view?usp=sharing) for demo purposes, unzip it, and place it directly under the root directory, or place your own documents under [original_documents](database/original_documents)**:
|
``` |
|
ask-anrg/
|-- database/
|   |-- original_documents/
|-- openai_function_utils/
|   |-- openai_function_impl.py
|   |-- openai_function_interface.py
|-- configs.py
|-- requirements.txt
|-- utils.py
|-- main.py
|-- Readme.md
|-- project_description.md
|-- result_report.txt
|-- .gitignore
|
``` |
|
7. **Set Up the Database**:
|
If you placed your own documents in the [original_documents](database/original_documents) directory, run the following command to prepare embeddings for them.
|
``` |
|
python3 utils.py |
|
``` |
|
This creates `database/embeddings/` to store the embeddings of the original documents, and a CSV file `database/document_name_to_embedding.csv` that maps each document name to its embedding vector.
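
For reference, a minimal sketch of what such an embedding step could look like is shown below. It assumes the OpenAI `text-embedding-ada-002` model, the `openai>=1.0` Python client, and the file layout described above; the project's actual logic lives in `utils.py` and may differ.

```python
# embed_documents_sketch.py -- illustrative only; the real pipeline is in utils.py
import csv
import os

from openai import OpenAI  # assumes openai>=1.0; the project may pin an older client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DOCS_DIR = "database/original_documents"
EMB_DIR = "database/embeddings"
CSV_PATH = "database/document_name_to_embedding.csv"

os.makedirs(EMB_DIR, exist_ok=True)

rows = []
for name in sorted(os.listdir(DOCS_DIR)):
    with open(os.path.join(DOCS_DIR, name), encoding="utf-8") as f:
        text = f.read()

    # One embedding per document (the model name is an assumption).
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    embedding = response.data[0].embedding

    # Store the vector alongside the document name.
    with open(os.path.join(EMB_DIR, name + ".emb"), "w", encoding="utf-8") as f:
        f.write(",".join(str(x) for x in embedding))
    rows.append([name, embedding])

with open(CSV_PATH, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["document_name", "embedding"])
    writer.writerows(rows)
```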
|
|
|
## How to Run
|
``` |
|
python main.py |
|
``` |
|
After the prompt "Hi! What question do you have for ANRG? Press 0 to exit" appears, type your question and press Enter.
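
Under the hood, the interaction is a simple ask-and-answer loop along the lines of the sketch below; `answer_question` is a hypothetical stand-in for the project's ChatGPT pipeline, and `main.py` may be organized differently.

```python
# Illustrative REPL loop; main.py's actual flow may differ.
def answer_question(question: str) -> str:
    # Hypothetical helper standing in for the embedding search + function-calling pipeline.
    raise NotImplementedError("stand-in for the real pipeline")

def main() -> None:
    print("Hi! What question do you have for ANRG? Press 0 to exit")
    while True:
        question = input("> ").strip()
        if question == "0":
            break
        print(answer_question(question))

if __name__ == "__main__":
    main()
```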
|
|
|
## Structure
|
* database: Contains scraped and processed data related to the lab.

    * embeddings: Processed embeddings for the publications.

    * original_documents: Original texts scraped from the lab website.

    * document_name_to_embedding.csv: Embeddings for all publications.

* openai_function_utils: Utility functions related to OpenAI.

    * openai_function_impl.py: Implementations of the OpenAI functions.

    * openai_function_interface.py: Interfaces (descriptions) for the OpenAI functions.

* configs.py: Configuration settings, e.g., the OpenAI API key.

* requirements.txt: Required Python libraries for the project.

* utils.py: Utility functions for embedding, searching, and retrieving answers from ChatGPT (see the retrieval sketch after this list).

* main.py: Main entry point of the project.
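
As a rough illustration of the search step, retrieval could look like the sketch below: load `database/document_name_to_embedding.csv`, rank documents by cosine similarity to the embedded question, and return the top matches. The model name, CSV layout, and helper names are assumptions; the real implementation is in `utils.py`.

```python
# Illustrative semantic-search step; the project's actual logic lives in utils.py.
import ast
import csv

import numpy as np
from openai import OpenAI  # assumes the openai>=1.0 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    # Model name is an assumption; it should match whatever produced the stored embeddings.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def top_k_documents(question: str, k: int = 3) -> list[str]:
    """Rank documents in document_name_to_embedding.csv by cosine similarity to the question."""
    q = embed(question)
    scored = []
    with open("database/document_name_to_embedding.csv", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumes a header row
        for name, emb in reader:
            v = np.array(ast.literal_eval(emb))  # assumes the vector is stored as a list literal
            score = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
            scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)[:k]]
```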
|
|
|
## Implemented Functions for OpenAI
|
ChatGPT can call the following functions while handling user questions:
|
|
|
- `get_lab_member_info`: Retrieve details (name, photo URL, links, description) of a lab member by name. |
|
- `get_lab_member_detailed_info`: Retrieve detailed information (link, photo, description) about a lab member.
|
- `get_publication_by_year`: List all publication information for a given year. |
|
- `get_pub_info`: Access details (title, venue, authors, year, link) of a publication by its title. |
|
- `get_pub_by_name`: Get information on all publications written by a specific lab member. |
|
|
|
More details on these functions can be found under `openai_function_utils/`; a rough sketch of how they are exposed to the model follows below.
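
For orientation, function calling with the Chat Completions API roughly follows the pattern sketched here. The JSON schema for `get_pub_by_name`, the model name, and the example question are assumptions for illustration, not copies of the project's actual definitions in `openai_function_interface.py` and `openai_function_impl.py` (which may also use the older `functions=` parameter).

```python
# Illustrative function-calling flow; the project's real definitions live in
# openai_function_utils/openai_function_interface.py and openai_function_impl.py.
import json

from openai import OpenAI  # assumes openai>=1.0

client = OpenAI()

# Assumed schema for one of the listed functions.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_pub_by_name",
        "description": "Get all publications written by a specific lab member.",
        "parameters": {
            "type": "object",
            "properties": {"member_name": {"type": "string"}},
            "required": ["member_name"],
        },
    },
}]

def get_pub_by_name(member_name: str) -> list[dict]:
    # Stand-in for the real implementation in openai_function_impl.py.
    return []

messages = [{"role": "user", "content": "Give me a publication written by Alice."}]
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages, tools=TOOLS)
call = response.choices[0].message.tool_calls[0]  # assumes the model chose to call a tool
args = json.loads(call.function.arguments)
result = get_pub_by_name(**args)  # run the local implementation, then feed the result back to the model
```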
|
|
|
## Evaluation: Turing Test |
|
We follow the steps below to evaluate our chatbot: |
|
1. Based on the information scraped from the lab's website, we come up with questions that the chatbot's users may ask, including both general questions (applicable to any lab) and lab-specific ones. Here are some examples:
|
- Who works here? |
|
- List all publications of this lab. |
|
- What are some recent publications by this lab in the area of [x]? |
|
- What conferences does this lab usually publish to? |
|
- What kind of undergraduate projects does this lab work on? |
|
- Give me the link to [x]'s homepage. |
|
- Give me a publication written by [x]. |
|
- How long has [x] been doing research in [y] area? |
|
- Who in the lab is currently working on [x]? |
|
- Where does former member [x] work now? |
|
2. Given four team members A, B, C, and D, A and B manually write down answers to the evaluation questions from each category.
|
3. Then, C asks the chatbot the same questions and collects its answers.
|
4. Without knowing which answer came from a human and which from the chatbot, D compares the two answers for every question and chooses the one they find preferable.
|
5. The chatbot's winning rate (i.e., how often the chatbot's answer is preferred over the human answerer's) is then calculated; a minimal sketch of that calculation follows this list.
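
As a rough illustration, assuming the blind preferences end up in a CSV with a `preferred` column whose values are either `chatbot` or `human` (the actual columns in the evaluation file may differ):

```python
# Hypothetical winning-rate calculation; the column name "preferred" is an assumption.
import csv

with open("ask_anrg_eval_question.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

wins = sum(1 for row in rows if row["preferred"] == "chatbot")
print(f"Overall winning rate: {wins / len(rows):.2%}")
```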
|
|
|
| Overall Winning Rate | |
|
|:-----------------------------: | |
|
| N/A | |
|
|
|
Refer to [ask_anrg_eval_question.csv](ask_anrg_eval_question.csv) for the questions used for evaluation and the evaluation results.