# voyage-code-3

This repo contains the tokenizer of the voyage-code-3 embedding model. voyage-code-3 is optimized for code retrieval, outperforming OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81%, respectively, on a suite of 238 code retrieval datasets. By supporting smaller dimensions through Matryoshka learning and quantized formats such as int8 and binary, voyage-code-3 can also dramatically reduce storage and search costs with minimal impact on retrieval quality. Please refer to our blog post for more details about this model.
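To illustrate how Matryoshka dimensions and binary quantization cut storage, here is a minimal client-side sketch. The helper name and the random placeholder vector are illustrative assumptions, not model output or SDK functionality:

```python
import numpy as np

def truncate_and_binarize(embedding, dim):
    # Keep the first `dim` Matryoshka dimensions and re-normalize; leading
    # dimensions of a Matryoshka embedding remain useful on their own.
    v = np.asarray(embedding, dtype=np.float32)[:dim]
    v = v / np.linalg.norm(v)
    # Binarize: positive components -> 1, non-positive -> 0, then pack
    # 8 bits per byte for a 32x reduction versus float32 per dimension.
    bits = (v > 0).astype(np.uint8)
    return np.packbits(bits)

# Placeholder 2048-d vector standing in for a float embedding.
full = np.random.default_rng(0).standard_normal(2048)
packed = truncate_and_binarize(full, 256)
print(packed.shape)  # 256 binary dimensions stored in 32 bytes
```

In practice you would request quantized or lower-dimensional outputs directly from the API via `output_dtype` and `output_dimension` rather than converting client-side.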
You can use voyage-code-3 via the Voyage API, AWS SageMaker, or on-prem deployment.
## Voyage API
The following code snippet shows the usage of the Voyage API. Install the Voyage AI Python SDK via:

```shell
pip install -U voyageai
```
Voyage AI utilizes API keys to monitor usage and manage permissions. To obtain your key, please sign in with your Voyage AI account and click the "Create new API key" button in the dashboard. We recommend setting the API key as an environment variable. For example, on macOS or Linux, type the following command in the terminal, replacing `<your secret key>` with your actual API key:
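The command below assumes the `VOYAGE_API_KEY` variable that the Python client reads by default:

```shell
export VOYAGE_API_KEY="<your secret key>"
```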
```python
import voyageai

vo = voyageai.Client()
# This will automatically use the environment variable VOYAGE_API_KEY.
# Alternatively, you can use vo = voyageai.Client(api_key="<your secret key>")

result = vo.embed(
    texts=["hello world"],
    model="voyage-code-3",
    input_type="document",
    output_dimension=2048,
    output_dtype="float",
)
```
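Once you have embeddings, ranking documents against a query is a cosine-similarity search. A minimal sketch with random placeholder vectors (not real model output; the `top_k` helper is an assumption for illustration):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    # For unit-normalized vectors, cosine similarity is just a dot product.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k highest-scoring documents, best first.
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(42)
docs = rng.standard_normal((5, 2048))          # placeholder document embeddings
query = docs[3] + 0.01 * rng.standard_normal(2048)  # near-duplicate of doc 3
print(top_k(query, docs))  # doc 3 should rank first
```

When using the API for retrieval, embed queries with `input_type="query"` and documents with `input_type="document"` so both sides are encoded consistently.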
## AWS SageMaker
The embedding model can be deployed privately in your own VPC using our AWS SageMaker marketplace offering.
## On-prem Deployment

Want to run voyage-code-3 on your own hardware? Feel free to reach out to [email protected] to learn more.
## Evaluation results

All scores are self-reported on the MTEB AppsRetrieval (default) test set:

| Metric      | Value  |
|-------------|--------|
| main_score  | 93.621 |
| map_at_1    | 87.968 |
| map_at_10   | 92.047 |
| map_at_100  | 92.115 |
| map_at_1000 | 92.115 |
| map_at_20   | 92.091 |
| map_at_3    | 91.532 |
| map_at_5    | 91.874 |
| mrr_at_1    | 87.995 |
| mrr_at_10   | 92.062 |