
misa-ai/Llama-3.2-1B-Instruct-Embedding-Base

This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

We trained the model on a merged training dataset spanning multiple domains, comprising about 900k Vietnamese triplets.

We use Llama-3.2-1B-Instruct as the pre-trained backbone.

This model is intended for document retrieval.

Details:

  • Maximum supported context size: 4096 tokens
  • Pooling: last token (use padding_side = "left")
  • Language: Vietnamese
  • Prompts:
    • Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó." (English: "Given an information-seeking query, retrieve relevant documents that answer that query.")
    • Document: "" (no prompt)
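The details above imply a specific pooling scheme: with padding_side = "left", every sequence ends at the same position, so the sentence embedding is simply the final layer's hidden state at the last token, with no attention-mask gather needed. A minimal sketch of that pooling step, using a NumPy stand-in for the model's hidden states (the 2048 dimension comes from this card; the array contents are illustrative):

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Last-token pooling under left padding.

    hidden_states: (batch, seq_len, dim) array of final-layer states.
    With padding_side = "left", position -1 holds the last *real*
    token of every sequence, so the embedding is just that slice.
    """
    return hidden_states[:, -1, :]

# Toy stand-in for model output: batch of 2, seq_len 4, dim 2048.
# In real use this would come from the model's last hidden state.
hidden = np.random.rand(2, 4, 2048).astype(np.float32)
embeddings = last_token_pool(hidden)
print(embeddings.shape)  # (2, 2048)
```

When encoding real text, the query prompt listed above would be prepended to each query before tokenization, while documents are encoded with no prompt, per the card's prompt settings.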

Please cite our manuscript if this model is used in your work.
