mxbai-embed-large-v1
Model Description
mxbai-embed-large-v1 is our powerful English embedding model that delivers state-of-the-art performance among models of its size. It outperforms closed-source models such as OpenAI's text-embedding-ada-002.
The model was trained on a vast dataset of over 700 million pairs using contrastive training and fine-tuned on more than 30 million high-quality triplets using the AnglE loss function. This extensive training enables the model to adapt to a wide range of topics and domains, making it suitable for various real-world applications and Retrieval-Augmented Generation (RAG) use cases.
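As a sketch of how such an embedding model is typically used for retrieval, the snippet below ranks documents against a query by cosine similarity. Random vectors stand in for real model outputs so the example is self-contained; loading the actual model (shown in the comment) is a hypothetical usage that assumes the sentence-transformers package and the Hugging Face model ID.

```python
import numpy as np

# Stand-in for model outputs. In practice you would encode text instead,
# e.g. (hypothetical usage, requires the sentence-transformers package):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
#   doc_embeddings = model.encode(docs)
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(4, 1024)).astype(np.float32)  # 1024 dims, per the table below

# Simulate a query that is close to document 2
query = doc_embeddings[2] + 0.01 * rng.normal(size=1024).astype(np.float32)

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query; the highest score wins
scores = [cosine_sim(query, d) for d in doc_embeddings]
best = int(np.argmax(scores))  # → 2, the document the query was derived from
```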
mxbai-embed-large-v1 is well-suited for binary embeddings, which reduce storage by 32x and speed up retrieval by roughly 40x while retaining over 96% of the model's performance.
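A minimal sketch of where the 32x figure comes from: thresholding each float32 dimension at zero and packing the bits turns 4 bytes per dimension into 1 bit per dimension. Retrieval over binary codes then uses Hamming distance (a popcount over XORed bytes), which is what enables the faster search. This is a generic binary-quantization illustration, not the library's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.normal(size=1024).astype(np.float32)  # 1024 float32 values = 4096 bytes

# Binarize: keep only the sign of each dimension, then pack 8 bits per byte
bits = (embedding > 0).astype(np.uint8)  # 1024 binary values
packed = np.packbits(bits)               # 128 bytes

storage_ratio = embedding.nbytes // packed.nbytes  # → 32

# Hamming distance between two binary codes: XOR the bytes, count set bits
def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.unpackbits(a ^ b).sum())
```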
mxbai-embed-large-v1 achieves top performance on the Massive Text Embedding Benchmark (MTEB), which measures embedding models across seven tasks: classification, clustering, pair classification, re-ranking, retrieval, semantic textual similarity, and summarization. The model's strong performance across these diverse tasks demonstrates its versatility and robustness.
Compare with other models
Model | Context Window | Dimensions | Price / 1M tokens |
---|---|---|---|
mxbai-embed-large-v1 | 512 | 1024 | $0.10 |
deepset-mxbai-embed-de-large-v1 | 512 | 1024 | $0.10 |
mxbai-embed-2d-large-v1 | 512 | 1024 | $0.10 |
Our Embedding Models
Explore the delicious Mixedbread embed family, featuring state-of-the-art performance, size efficiency, and open-source availability. Elevate your search, classification, recommendation, and more.
mxbai-embed-2d-large-v1
Explore mxbai-embed-2d-large-v1, the world's first 2D-Matryoshka embedding model. Learn about its innovative approach to reducing model size while maintaining high performance, and discover how to leverage its flexible dimensionality for various NLP tasks and efficient information retrieval.