mxbai-colbert-large-v1
A state-of-the-art ColBERT model for reranking and retrieval tasks. This model combines efficient vector search with nuanced token-level matching, making it ideal for advanced information retrieval applications.
Model Description
mxbai-colbert-large-v1 is a ColBERT (Contextualized Late Interaction over BERT) model for reranking and retrieval tasks. It is based on the mxbai-embed-large-v1 model and achieves state-of-the-art performance on 13 publicly available BEIR benchmarks.
ColBERT combines the benefits of vector search and cross-encoders. Queries and documents are encoded separately, but instead of creating a single embedding for the entire document, ColBERT generates contextualized embeddings for each token in the document. During search, the token-level query embeddings are compared with the token-level embeddings of the documents using the lightweight scoring function MaxSim. This allows ColBERT to capture nuanced matching signals while being computationally efficient.
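The MaxSim scoring described above can be illustrated with a minimal sketch. It assumes cosine similarity between token embeddings and uses hypothetical two-dimensional toy vectors; the function names `l2_normalize` and `maxsim_score` are made up for illustration and are not part of any library. Real ColBERT implementations run this on batched tensors, so treat this as an explanation of the idea rather than the model's actual code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the highest cosine
    similarity to any document token, then sum those maxima.

    Assumes both inputs are non-empty lists of equal-length vectors.
    """
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Hypothetical toy embeddings: one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` scores higher than `doc_b` because every query token finds a near-identical token in `doc_a`. Since document token embeddings do not depend on the query, they can be computed once at indexing time, which is where the efficiency relative to cross-encoders comes from.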
mxbai-colbert-large-v1 is initialized from the mxbai-embed-large-v1 model, which was trained on over 700 million samples from various domains. The ColBERT model was then fine-tuned on around 96 million samples to adapt it to the late interaction mechanism. This extensive training enables the model to be used for a wide range of tasks and domains.
On the BEIR benchmark, mxbai-colbert-large-v1 outperforms other ColBERT models both on average and on most individual tasks. Its reranking score even surpasses typical scores for cross-encoder-based reranker models on the benchmark, even though the ColBERT architecture is the more resource-efficient of the two. The model also demonstrates state-of-the-art retrieval performance compared to other currently available ColBERT models.
Compare with other models
Model | Context Window | Dimensions | Input Price (/1M tokens) |
---|---|---|---|
mxbai colbert large v1 | 512 | 1024 | $0.00 |
mxbai embed large v1 | 512 | 1024 | $0.00 |
deepset mxbai embed german large v1 | 512 | 1024 | $0.00 |
mxbai embed 2d large v1 | 512 | 1024 | $0.00 |
mxbai embed xsmall v1 | 4.1K | 384 | $0.00 |
Examples
We recommend using RAGatouille to work with our ColBERT model.
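A minimal sketch of indexing and searching with RAGatouille might look like the following. It assumes the `ragatouille` package is installed and that the model is available on the Hugging Face Hub as `mixedbread-ai/mxbai-colbert-large-v1` (an identifier not stated on this page). The helper name `build_and_search`, the default index name, and the value of `k` are hypothetical choices for illustration.

```python
def build_and_search(documents, query, k=3, index_name="demo_index"):
    """Index a list of text documents and return the top-k matches for a query.

    `documents` is a list of strings. The first call downloads the model
    weights and writes an index to disk.
    """
    # Imported lazily so that defining this helper does not require the package.
    from ragatouille import RAGPretrainedModel

    # Assumed Hugging Face model identifier.
    rag = RAGPretrainedModel.from_pretrained("mixedbread-ai/mxbai-colbert-large-v1")

    # Encode every document into token-level embeddings and build the index.
    rag.index(collection=documents, index_name=index_name)

    # Retrieve the k best-scoring passages for the query.
    return rag.search(query=query, k=k)
```

Calling `build_and_search(my_documents, "Who wrote 'To Kill a Mockingbird'?")` with your own list of strings would index them and run one query. For repeated queries, it is usually better to build the index once and reuse the loaded model rather than calling this helper each time.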
The result is a ranked list of matching passages, each with its content, relevance score, and rank.
Last updated: May 6, 2025