---
license: llama3.2
language:
- vi
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: sentence-similarity
library_name: transformers
---

# misa-ai/Llama-3.2-1B-Instruct-Embedding-Base

This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

We trained the model on a merged training dataset spanning multiple domains, containing about 900k Vietnamese triplets. We use [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) as the pre-trained backbone. This model is intended for document retrieval.

Details:

- Maximum supported context size: 4096 tokens
- Pooling: last token (use `padding_side = "left"`)
- Language: Vietnamese
- Prompts:
  - Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó."
  - Document: "" (no prompt)

### Please cite our manuscript if this model is used in your work

- Organization: MISA JSC
- Author: [Sy-The Ho](https://huggingface.co/thehosy)
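## Usage

A minimal sketch of how the details above (last-token pooling with `padding_side = "left"`, the query prompt, and the 4096-token context limit) might be wired together with `transformers`. The helper names (`last_token_pool`, `embed`) and the example sentences are illustrative, not part of the released model; loading assumes the checkpoint works with `AutoModel`.

```python
import torch


def last_token_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """With left padding, position -1 holds the final real token of every
    sequence, so last-token pooling is simply the last time step."""
    return last_hidden_state[:, -1]


def embed(texts, tokenizer, model, prompt: str = "") -> torch.Tensor:
    """Encode texts (optionally prefixed with the query prompt) into
    L2-normalized embeddings using last-token pooling."""
    inputs = tokenizer(
        [prompt + t for t in texts],
        padding=True,
        truncation=True,
        max_length=4096,  # maximum supported context size
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(**inputs)
    emb = last_token_pool(out.last_hidden_state)
    return torch.nn.functional.normalize(emb, p=2, dim=1)


if __name__ == "__main__":
    # Loading kept out of the importable part; assumes AutoModel compatibility.
    from transformers import AutoModel, AutoTokenizer

    model_id = "misa-ai/Llama-3.2-1B-Instruct-Embedding-Base"
    tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
    model = AutoModel.from_pretrained(model_id).eval()

    query_prompt = (
        "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất "
        "các tài liệu có liên quan trả lời cho truy vấn đó."
    )
    queries = embed(["ví dụ truy vấn"], tokenizer, model, prompt=query_prompt)
    docs = embed(["ví dụ tài liệu"], tokenizer, model)  # documents use no prompt
    scores = queries @ docs.T  # cosine similarity (embeddings are normalized)
    print(scores)
```

Because embeddings are L2-normalized, the dot product between a query and a document equals their cosine similarity, which is the usual ranking score for semantic search.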