
# Hugging Face Question Answering Bot

This repository focuses on the development of a Hugging Face question answering bot that assists users in creating their own ML solutions and troubleshooting technical issues related to Hugging Face libraries. Our solution combines an efficient context retrieval mechanism powered by FAISS with Stanford's Alpaca 7B language model to provide accurate and contextually relevant guidance derived from the Hugging Face documentation. The bot is designed to operate entirely locally on a consumer device, ensuring both accessibility and privacy.
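
As an illustration of the retrieval mechanism described above, here is a minimal sketch (not the bot's actual code; the embedding model, documents, and query are placeholders), assuming `faiss-cpu` and `sentence-transformers` are installed:

```python
# Illustrative only: builds a tiny FAISS index over documentation snippets
# and retrieves the closest one for a user question.
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder for EMBEDDING_MODEL_ID
docs = [
    "Use AutoTokenizer.from_pretrained(...) to load a tokenizer.",
    "The Trainer class wraps the PyTorch training loop.",
]

embeddings = embedder.encode(docs)             # float32 array, shape (n_docs, dim)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

query = embedder.encode(["How do I load a tokenizer?"])
_, ids = index.search(query, 1)                # index of the nearest snippet
context = docs[ids[0][0]]                      # passed to the language model with the question
```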

## Purpose

The Hugging Face Question Answering Bot is designed to help users quickly find solutions to common problems and questions related to Hugging Face libraries. Whether you're just getting started with ML or you're an experienced developer looking for advanced guidance, the bot can help you get the information you need to succeed.

## Example

## Table of Contents

- [Purpose](#purpose)
- [Example](#example)
- [Setting up the bot](#setting-up-the-bot)
- [Running in Docker](#running-in-docker)
- [Running in Python](#running-in-python)
- [Development Instructions](#development-instructions)
- [Dataset List](#dataset-list)

## Setting up the bot

First, you need to provide the necessary environment variables and API keys in the `.env` file (an example `.env` follows the list below).

- `HUGGINGFACEHUB_API_TOKEN` - API key for the Hugging Face Hub
- `DISCORD_TOKEN` - API key for the bot application
- `QUESTION_ANSWERING_MODEL_ID` - ID of the model to query on the Hugging Face Hub (for inference through the API)
- `EMBEDDING_MODEL_ID` - ID of the embedding model used to create and query the index over the documents
- `INDEX_NAME` - directory where the index files are stored after creation
- `USE_DOCS_IN_CONTEXT` - allow context extraction from documents
- `ADD_SOURCES_TO_RESPONSE` - show references to the documents that were used as context for a given query
- `USE_MESSEGES_IN_CONTEXT` - allow using chat history for a conversational experience
- `NUM_LAST_MESSAGES` - number of previous messages to include in the chat history
- `USE_NAMES_IN_CONTEXT` - include usernames in the context
- `ENABLE_COMMANDS` - allow commands, e.g. channel cleanup
- `DEBUG` - provide additional logging
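
A minimal example `.env` (every value below is a placeholder, not a recommendation):

```env
HUGGINGFACEHUB_API_TOKEN=hf_xxxxxxxxxxxxxxxx
DISCORD_TOKEN=your-discord-bot-token
QUESTION_ANSWERING_MODEL_ID=your-model-id-or-weights-file
EMBEDDING_MODEL_ID=hkunlp/instructor-large
INDEX_NAME=index
USE_DOCS_IN_CONTEXT=True
ADD_SOURCES_TO_RESPONSE=True
USE_MESSEGES_IN_CONTEXT=True
NUM_LAST_MESSAGES=2
USE_NAMES_IN_CONTEXT=False
ENABLE_COMMANDS=True
DEBUG=False
```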

If you decide to run everything locally, our current MVP recommends using the Instructor Large embedding model and Alpaca 7B with 4-bit quantization. For this to work properly, place the model weights in the `/bot/question_answering/` directory and set the `QUESTION_ANSWERING_MODEL_ID` variable to the name of the file you just put in that folder. You should then be able to run your own local instance of the bot.
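
As a concrete illustration (the file name here is hypothetical): if your 4-bit checkpoint is saved as `alpaca-7b-q4.bin`, place it at `bot/question_answering/alpaca-7b-q4.bin` and set `QUESTION_ANSWERING_MODEL_ID=alpaca-7b-q4.bin` in `.env`.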

## Running in Docker

```bash
docker build -t <container-name> .
docker run <container-name>
# or simply:
./run_docker.sh
```
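
If your configuration lives in `.env`, Docker's standard `--env-file` flag can forward it to the container (assuming the image reads these variables from its environment): `docker run --env-file .env <container-name>`.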

## Running in Python

```bash
pip install -r requirements.txt
python3 -m bot
```
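
The bot reads its configuration from the `.env` file described above. As a rough sketch of how those values reach the process (assuming `python-dotenv`; the actual loading code may differ):

```python
# Sketch only: load .env into the environment and read two of the variables.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env in the working directory
model_id = os.environ["QUESTION_ANSWERING_MODEL_ID"]
debug = os.getenv("DEBUG", "False") == "True"
```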

## Development Instructions

We use Python 3.10.

To install all necessary Python packages, run the following command:

```bash
pip install -r requirements.txt
```

We use pipreqsnb to generate the `requirements.txt` file. To install pipreqsnb, run the following command:

```bash
pip install pipreqsnb
```

To generate the `requirements.txt` file, run the following command:

```bash
pipreqsnb --force .
```

To run unit tests, you can use the following command:

```bash
pytest -o "testpaths=tests" --noconftest
```

## Dataset List

Below is a list of the datasets used during development: