misa-ai/Llama-3.2-1B-Instruct-Embedding-Base
This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks like clustering or semantic search.
We trained the model on a merged training dataset spanning multiple domains, totaling about 900k Vietnamese triplets.
We use meta-llama/Llama-3.2-1B-Instruct as the pre-trained backbone.
The model is intended for document retrieval.
Details:
- Max supported context size: 4096 tokens
- Pooling: last token (use `padding_side="left"`; see the usage sketch below)
- Language: Vietnamese
- Prompts:
  - Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó." (English: "Given an information-seeking query, retrieve the relevant documents that answer that query.")
  - Document: "" (no prompt)
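
Below is a minimal usage sketch with the `transformers` library, reflecting the details above: left padding, last-token pooling, and the 4096-token limit. It assumes the checkpoint loads with `AutoModel` and that the query prompt is simply prepended to the query text; verify both against the authors' training setup.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "misa-ai/Llama-3.2-1B-Instruct-Embedding-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModel.from_pretrained(model_id)
model.eval()

QUERY_PROMPT = ("Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất "
                "các tài liệu có liên quan trả lời cho truy vấn đó. ")

def embed(texts, prompt=""):
    """Embed a list of texts; queries get the task prompt, documents use "" (none)."""
    # Assumption: the prompt is plainly prepended to each text.
    batch = tokenizer(
        [prompt + t for t in texts],
        padding=True,
        truncation=True,
        max_length=4096,  # max supported context size
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, 2048)
    # With left padding, every sequence ends at the final position,
    # so last-token pooling is just the last hidden state.
    emb = hidden[:, -1]
    return F.normalize(emb, p=2, dim=-1)

queries = embed(["Thủ đô của Việt Nam là gì?"], prompt=QUERY_PROMPT)
docs = embed(["Hà Nội là thủ đô của nước Việt Nam.",
              "Phở là một món ăn truyền thống."])
print(queries @ docs.T)  # similarity scores, shape (1, 2)
```

Because the embeddings are L2-normalized, the dot product equals cosine similarity, the usual scoring function for dense retrieval.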
Please cite our manuscript if this dataset is used in your work.
- Organization: MISA JSC
- Author: Sy-The Ho