Do HTML documents need tags removed?
#10, opened by awokeknowing
My documents are descriptions in HTML, with paragraph tags, ul/li lists, some strong tags, and so on. Do I need to strip all of that before embedding, or does the markup help the model understand the meaning of the text?
Hi, thanks a lot for your interest in the INSTRUCTOR model!
For the HTML descriptions, I would suggest removing the tags before embedding, for better semantic understanding.
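As a rough illustration of the tag removal suggested above, here is a minimal sketch using only Python's standard-library `html.parser`. The helper name `strip_html` and the set of tags treated as block boundaries are assumptions made for this example, not part of the INSTRUCTOR library; a dedicated HTML library may handle messy markup better.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and drops all tags (hypothetical helper)."""

    # Assumed set of tags that should separate words when removed.
    BLOCK_TAGS = {"p", "li", "ul", "ol", "br", "div"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


description = (
    "<p>A <strong>sturdy</strong> desk.</p>"
    "<ul><li>Oak top</li><li>Steel legs</li></ul>"
)
clean_text = strip_html(description)
print(clean_text)
```

The cleaned string, rather than the raw HTML, is then what would be passed to the model for embedding. Inline tags such as strong are dropped without adding a space, so words that were only partly emphasized stay intact.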
Feel free to add any further questions or comments!