mxbai-embed-large-v1
Discover mxbai-embed-large-v1, our state-of-the-art English embedding model. Learn about its powerful performance, versatility across various NLP tasks, and how to effectively use it for semantic search, information retrieval, and other applications.
Model Description
mxbai-embed-large-v1 is our powerful English embedding model that provides state-of-the-art performance among efficiently sized models. It outperforms closed-source models such as OpenAI's text-embedding-ada-002.
The model was trained on a vast dataset of over 700 million pairs using contrastive training and fine-tuned on more than 30 million high-quality triplets using the AnglE loss function. This extensive training enables the model to adapt to a wide range of topics and domains, making it suitable for various real-world applications and Retrieval-Augmented Generation (RAG) use cases.
mxbai-embed-large-v1 is well suited to binary embeddings. Binarizing its embeddings cuts storage by 32x and makes retrieval 40x faster, while retaining over 96% of the model's performance.
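To see where the 32x storage figure comes from, here is a minimal sketch of binary quantization. It assumes the common scheme of keeping one sign bit per dimension (positive becomes 1, everything else 0) and comparing vectors by Hamming distance; the helper names `binarize` and `hamming_distance` are hypothetical and this is not necessarily the exact procedure used in production.

```python
from typing import Sequence


def binarize(embedding: Sequence[float]) -> bytes:
    """Pack one sign bit per dimension into bytes (assumed scheme: value > 0 -> 1)."""
    bits = 0
    for value in embedding:
        bits = (bits << 1) | (1 if value > 0 else 0)
    n_bytes = (len(embedding) + 7) // 8
    # Pad on the right so the first dimension is the most significant bit.
    bits <<= n_bytes * 8 - len(embedding)
    return bits.to_bytes(n_bytes, "big")


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits; a smaller distance means more similar vectors."""
    if len(a) != len(b):
        raise ValueError("binary embeddings must have the same length")
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


# Storage arithmetic for a 1024-dimensional embedding:
# float32 uses 4 bytes per dimension, the binary form uses 1 bit per dimension.
FLOAT32_BYTES = 1024 * 4
BINARY_BYTES = len(binarize([1.0] * 1024))
STORAGE_RATIO = FLOAT32_BYTES // BINARY_BYTES
```

With 1024 dimensions, the float32 form takes 4096 bytes and the packed form 128 bytes, which is the 32x reduction. Hamming distance reduces to an XOR followed by a bit count, which is the kind of operation that makes retrieval over binary embeddings fast.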
mxbai-embed-large-v1 achieves top performance on the Massive Text Embedding Benchmark (MTEB), which measures embedding models across seven tasks: classification, clustering, pair classification, re-ranking, retrieval, semantic textual similarity, and summarization. The model's strong performance across these diverse tasks demonstrates its versatility and robustness.
Compare with other models
Model | Context Window | Dimensions | Input Price (/1M tokens) |
---|---|---|---|
mxbai Embed Large v1 | 512 | 1024 | $0.00 |
deepset mxbai embed german large v1 | 512 | 1024 | $0.00 |
mxbai embed 2d large v1 | 512 | 1024 | $0.00 |
mxbai embed xsmall v1 | 4.1K | 384 | $0.00 |
mxbai colbert large v1 | 512 | 1024 | $0.00 |
Calculate Sentence Similarities
The following code illustrates how to compute similarities between sentences using the cosine similarity score function.
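A minimal sketch of this computation is shown here. The cosine similarity itself is plain Python; the `embed_sentences` helper assumes the `sentence-transformers` package and the Hugging Face model ID `mixedbread-ai/mxbai-embed-large-v1`, neither of which is stated on this page, so treat both as assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarities(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Similarity matrix: entry [i][j] compares sentence i with sentence j."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


def embed_sentences(sentences: List[str]) -> List[List[float]]:
    """Assumed usage of sentence-transformers; downloads the model on first call."""
    from sentence_transformers import SentenceTransformer  # assumed dependency

    model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")  # assumed model ID
    return model.encode(sentences).tolist()
```

Under these assumptions, `pairwise_similarities(embed_sentences(["A man is eating food.", "A man is eating pasta."]))` would return a 2x2 matrix with 1.0 on the diagonal and the sentence-to-sentence score off the diagonal.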
Semantic Search
The following code snippet demonstrates how to retrieve information related to a specific query from a given corpus. Note that the prompt "Represent this sentence for searching relevant passages:" is used for the query.
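A minimal sketch of this retrieval flow follows. The prompt text is taken from the note above; the single space that separates it from the query, the `semantic_search` helper, and the `encode` parameter are assumptions made for illustration. `encode` stands for any callable that maps a list of strings to a list of vectors, for example the `encode` method of a loaded embedding model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Prompt from the documentation; the trailing space is an assumed separator.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "


def format_query(query: str) -> str:
    """Prefix the query with the prompt; corpus documents are encoded as-is."""
    return QUERY_PROMPT + query


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def semantic_search(
    query: str,
    corpus: Sequence[str],
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the top_k (score, document) pairs, highest similarity first."""
    query_vec = encode([format_query(query)])[0]
    corpus_vecs = encode(list(corpus))
    scored = [(_cosine(query_vec, vec), doc) for vec, doc in zip(corpus_vecs, corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Passing the encoder in as an argument keeps the ranking logic independent of how the embeddings are produced, so the same function works with a local model, a hosted API wrapper, or a stub in tests.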
Last updated: May 6, 2025