mxbai-colbert-large-v1

Parameters 335M

Context Window 512

Price / 1M tokens -

Languages EN

Model Description

mxbai-colbert-large-v1 is a ColBERT (Contextualized Late Interaction over BERT) model for reranking and retrieval tasks. It is based on the mxbai-embed-large-v1 model and achieves state-of-the-art performance on 13 publicly available BEIR benchmarks.

ColBERT combines the benefits of vector search and cross-encoders. Queries and documents are encoded separately, but instead of creating a single embedding for the entire document, ColBERT generates contextualized embeddings for each token in the document. During search, the token-level query embeddings are compared with the token-level embeddings of the documents using the lightweight scoring function MaxSim. This allows ColBERT to capture nuanced matching signals while being computationally efficient.
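The MaxSim scoring described above can be illustrated with a minimal sketch. It assumes cosine similarity between token embeddings and uses tiny hand-written 2-dimensional vectors in place of real model output; the function names `normalize` and `maxsim_score` are illustrative and not part of any library.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def maxsim_score(query_emb, doc_emb):
    """For each query token, take its best-matching document token, then sum."""
    q = [normalize(v) for v in query_emb]
    d = [normalize(v) for v in doc_emb]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )

# Toy token embeddings (assumption: real embeddings come from the model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)  # doc_a ranks above doc_b
```

Because every query token contributes its own maximum, the score is an unnormalized sum that grows with query length, so scores are comparable across documents for one query rather than across queries.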

mxbai-colbert-large-v1 is initialized from the mxbai-embed-large-v1 model, which was trained on over 700 million samples from various domains. The ColBERT model was then fine-tuned on around 96 million samples to adapt it to the late interaction mechanism. This extensive training enables the model to be used for a wide range of tasks and domains.

On the BEIR benchmark, mxbai-colbert-large-v1 outperforms other ColBERT models both on average and on most individual tasks. Its reranking score even surpasses typical scores of cross-encoder-based reranker models on the benchmark, while retaining the resource efficiency of the ColBERT architecture. The model also demonstrates state-of-the-art retrieval performance compared to other currently available ColBERT models.

Model Reference

Blog Post

Compare with other models

Model	Context Window	Dimensions	Price / 1M tokens
mxbai-colbert-large-v1	512	1024	-
mxbai-embed-large-v1	512	1024	$0.10
deepset-mxbai-embed-de-large-v1	512	1024	$0.10
mxbai-embed-2d-large-v1	512	1024	$0.10
mxbai-embed-xsmall-v1	4.1K	384	-

Examples

We recommend using RAGatouille to work with our ColBERT model.

pip install ragatouille

from ragatouille import RAGPretrainedModel

# Create a RAGatouille instance
RAG = RAGPretrainedModel.from_pretrained("mixedbread-ai/mxbai-colbert-large-v1")

documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Index documents
RAG.index(documents, index_name="mockingbird")

# Search
query = "Who wrote 'To Kill a Mockingbird'?"
results = RAG.search(query)

The result looks like this:

[
  {
    'content': "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    'score': 28.453125,
    'rank': 1,
    'document_id': '9d564e82-f14f-433a-ab40-b10bda9dc370',
    'passage_id': 0
  },
  {
    'content': "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    'score': 27.03125,
    'rank': 2,
    'document_id': 'a35a89c3-b610-4e2e-863e-fa1e7e0710a6',
    'passage_id': 2
  },
...
]
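Since the search result is a plain list of dictionaries, it can be post-processed with ordinary Python. The following is a small sketch, not part of RAGatouille: the helper `passages_above` and the threshold value are hypothetical, and the sample list only mirrors the shape and scores of the output above with shortened content strings.

```python
def passages_above(results, min_score):
    """Return the 'content' of results scoring at least min_score, best rank first."""
    kept = [r for r in results if r["score"] >= min_score]
    return [r["content"] for r in sorted(kept, key=lambda r: r["rank"])]

# Sample shaped like the search output above (content shortened for brevity).
sample = [
    {"content": "Harper Lee, an American novelist ...", "score": 27.03125, "rank": 2, "passage_id": 2},
    {"content": "'To Kill a Mockingbird' is a novel by Harper Lee ...", "score": 28.453125, "rank": 1, "passage_id": 0},
]

# The threshold 28.0 is purely illustrative; ColBERT scores are unnormalized.
top = passages_above(sample, min_score=28.0)
print(top)
```

A fixed threshold is only meaningful for queries of similar length, so in practice keeping the top few ranks is often the simpler choice.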

Last updated: June 25, 2025
