# voyage-code-3

This repo contains the tokenizer of the voyage-code-3 embedding model. voyage-code-3 is optimized for code retrieval, outperforming OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81%, respectively, on a suite of 238 code retrieval datasets. By supporting smaller dimensions through Matryoshka learning and quantized formats such as int8 and binary, voyage-code-3 can also dramatically reduce storage and search costs with minimal impact on retrieval quality. Please refer to our blog post for more details about this model.
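To illustrate how Matryoshka dimensions and binary quantization cut storage, here is a minimal client-side sketch. The helper name and the random placeholder vector are illustrative assumptions, not model output or SDK functionality:

```python
import numpy as np

def truncate_and_binarize(embedding, dim):
    # Keep the first `dim` Matryoshka dimensions and re-normalize; leading
    # dimensions of a Matryoshka embedding remain useful on their own.
    v = np.asarray(embedding, dtype=np.float32)[:dim]
    v = v / np.linalg.norm(v)
    # Binarize: positive components -> 1, non-positive -> 0, then pack
    # 8 bits per byte for a 32x reduction versus float32 per dimension.
    bits = (v > 0).astype(np.uint8)
    return np.packbits(bits)

# Placeholder 2048-d vector standing in for a float embedding.
full = np.random.default_rng(0).standard_normal(2048)
packed = truncate_and_binarize(full, 256)
print(packed.shape)  # 256 binary dimensions stored in 32 bytes
```

In practice you would request quantized or lower-dimensional outputs directly from the API via `output_dtype` and `output_dimension` rather than converting client-side.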
You can use voyage-code-3 via the Voyage API, AWS SageMaker, or on-prem deployment.
## Voyage API
The following code snippet shows the usage of the Voyage API. Install the Voyage AI Python SDK via:

```shell
pip install -U voyageai
```
Voyage AI utilizes API keys to monitor usage and manage permissions. To obtain your key, please sign in with your Voyage AI account and click the "Create new API key" button in the dashboard. We recommend setting the API key as an environment variable. For example, on macOS or Linux, type the following command in the terminal, replacing `<your secret key>` with your actual API key:
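The command below assumes the `VOYAGE_API_KEY` variable that the Python client reads by default:

```shell
export VOYAGE_API_KEY="<your secret key>"
```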
```python
import voyageai

vo = voyageai.Client()
# This will automatically use the environment variable VOYAGE_API_KEY.
# Alternatively, you can use vo = voyageai.Client(api_key="<your secret key>")

result = vo.embed(
    texts=["hello world"],
    model="voyage-code-3",
    input_type="document",
    output_dimension=2048,
    output_dtype="float",
)
```
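Once you have embeddings, ranking documents against a query is a cosine-similarity search. A minimal sketch with random placeholder vectors (not real model output; the `top_k` helper is an assumption for illustration):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    # For unit-normalized vectors, cosine similarity is just a dot product.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k highest-scoring documents, best first.
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(42)
docs = rng.standard_normal((5, 2048))          # placeholder document embeddings
query = docs[3] + 0.01 * rng.standard_normal(2048)  # near-duplicate of doc 3
print(top_k(query, docs))  # doc 3 should rank first
```

When using the API for retrieval, embed queries with `input_type="query"` and documents with `input_type="document"` so both sides are encoded consistently.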
## AWS SageMaker
The embedding model can be deployed privately in your own VPC using our AWS SageMaker marketplace offering.
## On-prem Deployment

Want to run voyage-code-3 on your own hardware? Feel free to reach out to [email protected] to learn more.
## Evaluation results

All scores are self-reported on the MTEB AppsRetrieval (default) test set:

| Metric      | Value  |
|-------------|--------|
| main_score  | 93.621 |
| map_at_1    | 87.968 |
| map_at_10   | 92.047 |
| map_at_100  | 92.115 |
| map_at_1000 | 92.115 |
| map_at_20   | 92.091 |
| map_at_3    | 91.532 |
| map_at_5    | 91.874 |
| mrr_at_1    | 87.995 |
| mrr_at_10   | 92.062 |