dpr-ctx_encoder-bert-base-multilingual
Description
Multilingual DPR Model base on bert-base-multilingual-cased. DPR model DPR repo
Data
question pairs for train
: 644,217question pairs for dev
: 73,710
*DRCD and MLQA are converted using script from haystack squad_to_dpr.py
Training Script
I use the script from haystack
Usage
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('voidful/dpr-question_encoder-bert-base-multilingual')
model = DPRQuestionEncoder.from_pretrained('voidful/dpr-question_encoder-bert-base-multilingual')
input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output
Follow the tutorial from haystack
:
Better Retrievers via "Dense Passage Retrieval"
from haystack.retriever.dense import DensePassageRetriever
retriever = DensePassageRetriever(document_store=document_store,
query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
max_seq_len_query=64,
max_seq_len_passage=256,
batch_size=16,
use_gpu=True,
embed_title=True,
use_fast_tokenizers=True)
- Downloads last month
- 209
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.