mxbai-embed-2d-large-v1
Model Description
mxbai-embed-2d-large-v1 is the world's first 2D-Matryoshka embedding model. The 2D-Matryoshka model introduces a novel approach that enables you to reduce both the number of layers and the dimensions of embeddings within the model. This dual reduction strategy allows for a more compact model size while still delivering performance on par with that of leading models such as Nomic's embedding model. Specifically, reducing the model's layers by approximately 50% retains up to 85% of its original performance, even without additional training.
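The layer-reduction idea can be illustrated with a toy sketch. The stack of random layers below is purely a stand-in for the real transformer encoder (the names `encode`, `NUM_LAYERS`, and `DIM` are illustrative, not part of the model's API): the point is simply that an embedding can be read off an intermediate layer instead of the final one, halving compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer encoder: a stack of 24 "layers",
# each a random linear map plus a nonlinearity. NOT the real model --
# only an illustration of reading the hidden state at an intermediate
# layer instead of running the full stack.
NUM_LAYERS = 24
DIM = 64
layers = [rng.normal(scale=DIM ** -0.5, size=(DIM, DIM)) for _ in range(NUM_LAYERS)]

def encode(x, num_layers=NUM_LAYERS):
    """Run only the first `num_layers` layers and return the hidden state."""
    h = x
    for W in layers[:num_layers]:
        h = np.tanh(h @ W)
    return h

x = rng.normal(size=DIM)
full = encode(x)                 # all 24 layers
half = encode(x, num_layers=12)  # ~50% of the layers, roughly half the compute
print(full.shape, half.shape)    # both are valid DIM-sized embeddings
```

In the real model, 2D-Matryoshka training is what makes these intermediate-layer embeddings usable; without that training objective, an intermediate hidden state would not be a good embedding.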
The model was pretrained with contrastive training on over 700 million pairs covering a wide variety of topics from across the internet, then fine-tuned on over 30 million high-quality triplets using novel loss functions. mxbai-embed-2d-large-v1 effectively provides multiple models in one: users can choose among different layer counts and embedding sizes, giving full control over the trade-offs between speed, storage consumption, and model performance.
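The dimension side of the trade-off can be sketched as follows. This uses random vectors as stand-ins for real 1024-dimensional embeddings (the `truncate` helper is hypothetical, not a library function): Matryoshka-style embeddings are truncated to the first `dim` dimensions and re-normalized before computing cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(42)

def truncate(emb, dim):
    """Keep the first `dim` dimensions and L2-normalize the result."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

# Random stand-ins for real 1024-dim embeddings of a query and a document.
query = rng.normal(size=1024)
doc = rng.normal(size=1024)

for dim in (1024, 512, 64):  # 64 dims = a 16x reduction in storage
    q, d = truncate(query, dim), truncate(doc, dim)
    sim = float(q @ d)  # cosine similarity of the truncated embeddings
    print(f"dim={dim:4d}  cosine={sim:+.3f}")
```

Smaller `dim` means proportionally less storage and faster similarity search; with Matryoshka-trained embeddings the leading dimensions carry most of the signal, which is why truncation degrades quality only gradually.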
On the Massive Text Embedding Benchmark (MTEB), mxbai-embed-2d-large-v1 performs on par with current embedding models of various sizes. Its performance remains competitive even when the embedding size is reduced by a factor of 16, and the model retains about 75% of its performance after cutting half of its layers, demonstrating the effectiveness of the 2D-Matryoshka approach.
Compare with other models
| Model | Context Window | Dimensions | Price / 1M tokens |
|---|---|---|---|
| mxbai-embed-2d-large-v1 | 512 | 1024 | $0.10 |
| mxbai-embed-large-v1 | 512 | 1024 | $0.10 |
| deepset-mxbai-embed-de-large-v1 | 512 | 1024 | $0.10 |
| mxbai-embed-xsmall-v1 | 4.1K | 384 | - |
| mxbai-colbert-large-v1 | 512 | 1024 | - |
Last updated: July 15, 2025