
# Hugging Face Question Answering Bot

This repository focuses on the development of a Hugging Face question answering bot that assists users in creating their own ML solutions and troubleshooting technical issues related to Hugging Face libraries. Our solution combines an efficient context retrieval mechanism powered by FAISS with Stanford's Alpaca 7B language model to provide accurate and contextually relevant guidance derived from the Hugging Face documentation. The bot is designed to operate entirely locally on a consumer device, ensuring both accessibility and privacy.
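
As an illustration of the retrieval mechanism described above, here is a minimal sketch (not the bot's actual code; the embedding model, documents, and query are placeholders), assuming `faiss-cpu` and `sentence-transformers` are installed:

```python
# Illustrative only: builds a tiny FAISS index over documentation snippets
# and retrieves the closest one for a user question.
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder for EMBEDDING_MODEL_ID
docs = [
    "Use AutoTokenizer.from_pretrained(...) to load a tokenizer.",
    "The Trainer class wraps the PyTorch training loop.",
]

embeddings = embedder.encode(docs)             # float32 array, shape (n_docs, dim)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

query = embedder.encode(["How do I load a tokenizer?"])
_, ids = index.search(query, 1)                # index of the nearest snippet
context = docs[ids[0][0]]                      # passed to the language model with the question
```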

## Purpose

The Hugging Face Question Answering Bot is designed to help users quickly find solutions to common problems and questions related to Hugging Face libraries. Whether you're just getting started with ML or you're an experienced developer looking for advanced guidance, the bot can help you get the information you need to succeed.

## Example

## Table of Contents

- [Purpose](#purpose)
- [Example](#example)
- [Setting up the bot](#setting-up-the-bot)
- [Running in Docker](#running-in-docker)
- [Running in Python](#running-in-python)
- [Development Instructions](#development-instructions)
- [Dataset List](#dataset-list)

## Setting up the bot

First, you need to provide the necessary environment variables and API keys in the `.env` file (an example `.env` follows the list below).

- `HUGGINGFACEHUB_API_TOKEN` - API key for the Hugging Face Hub
- `DISCORD_TOKEN` - API key for the bot application
- `QUESTION_ANSWERING_MODEL_ID` - ID of the model to query on the Hugging Face Hub (for inference through the API)
- `EMBEDDING_MODEL_ID` - ID of the embedding model used to create and query the index over the documents
- `INDEX_NAME` - directory where the index files are stored after creation
- `USE_DOCS_IN_CONTEXT` - allow context extraction from documents
- `ADD_SOURCES_TO_RESPONSE` - show references to the documents that were used as context for a given query
- `USE_MESSEGES_IN_CONTEXT` - allow using chat history for a conversational experience
- `NUM_LAST_MESSAGES` - number of previous messages to include in the chat history
- `USE_NAMES_IN_CONTEXT` - include usernames in the context
- `ENABLE_COMMANDS` - allow commands, e.g. channel cleanup
- `DEBUG` - provide additional logging
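
A minimal example `.env` (every value below is a placeholder, not a recommendation):

```env
HUGGINGFACEHUB_API_TOKEN=hf_xxxxxxxxxxxxxxxx
DISCORD_TOKEN=your-discord-bot-token
QUESTION_ANSWERING_MODEL_ID=your-model-id-or-weights-file
EMBEDDING_MODEL_ID=hkunlp/instructor-large
INDEX_NAME=index
USE_DOCS_IN_CONTEXT=True
ADD_SOURCES_TO_RESPONSE=True
USE_MESSEGES_IN_CONTEXT=True
NUM_LAST_MESSAGES=2
USE_NAMES_IN_CONTEXT=False
ENABLE_COMMANDS=True
DEBUG=False
```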

If you decide to run everything locally, our current MVP recommends using the Instructor Large embedding model and Alpaca 7B with 4-bit quantization. For this to work properly, place the model weights in the `/bot/question_answering/` directory and set the `QUESTION_ANSWERING_MODEL_ID` variable to the name of the file you just put in that folder. You should then be able to run your own local instance of the bot.
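
As a concrete illustration (the file name here is hypothetical): if your 4-bit checkpoint is saved as `alpaca-7b-q4.bin`, place it at `bot/question_answering/alpaca-7b-q4.bin` and set `QUESTION_ANSWERING_MODEL_ID=alpaca-7b-q4.bin` in `.env`.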

## Running in Docker

```bash
docker build -t <container-name> .
docker run <container-name>
# or simply:
./run_docker.sh
```
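
If your configuration lives in `.env`, Docker's standard `--env-file` flag can forward it to the container (assuming the image reads these variables from its environment): `docker run --env-file .env <container-name>`.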

## Running in Python

```bash
pip install -r requirements.txt
python3 -m bot
```
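
The bot reads its configuration from the `.env` file described above. As a rough sketch of how those values reach the process (assuming `python-dotenv`; the actual loading code may differ):

```python
# Sketch only: load .env into the environment and read two of the variables.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env in the working directory
model_id = os.environ["QUESTION_ANSWERING_MODEL_ID"]
debug = os.getenv("DEBUG", "False") == "True"
```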

## Development Instructions

We use Python 3.10.

To install all necessary Python packages, run the following command:

```bash
pip install -r requirements.txt
```

We use pipreqsnb to generate the `requirements.txt` file. To install pipreqsnb, run the following command:

```bash
pip install pipreqsnb
```

To generate the `requirements.txt` file, run the following command:

```bash
pipreqsnb --force .
```

To run unit tests, you can use the following command:

```bash
pytest -o "testpaths=tests" --noconftest
```

## Dataset List

Below is a list of the datasets used during development: