Mixedbread JSON Format
This format is for specialized use cases where you need complete control over chunking. For most workflows, use the standard file upload — it handles chunking, metadata generation, and indexing automatically.
The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.
Schema Checker
Validate your mxjson files before uploading by checking them against the current schema. The checker accepts .mxjson or .mxjsonl files up to 10 MB. You can also download the JSON Schema.
File Formats
| Format | Extension | MIME Type | Structure |
|---|---|---|---|
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
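The two formats carry the same chunk objects and differ only in layout, so converting between them is mechanical. The sketch below shows one way to do it with the Python standard library; the function names are illustrative and not part of any Mixedbread SDK.

```python
import json


def to_mxjson(chunks):
    """Serialize chunk dicts as a single JSON array (.mxjson layout)."""
    return json.dumps(chunks, ensure_ascii=False, indent=2)


def to_mxjsonl(chunks):
    """Serialize chunk dicts as JSON Lines (.mxjsonl layout): one object per line."""
    # json.dumps escapes newlines inside strings, so each chunk stays on one line.
    return "\n".join(json.dumps(chunk, ensure_ascii=False) for chunk in chunks) + "\n"


def from_mxjsonl(text):
    """Parse JSON Lines back into a list of chunk dicts, skipping blank lines."""
    # Split on "\n" only; str.splitlines() would also split on characters
    # such as U+2028 that may appear unescaped inside a JSON string.
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

JSON Lines can be written and read one chunk at a time, which may suit large exports better than a single array.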
Chunk Structure
Each chunk in an mxjson file follows the same structure as Store Chunks. The type field determines which properties are required:
text - Text content
image_url - Image reference
audio_url - Audio reference
video_url - Video reference
Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
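As a rough illustration of the one-modality rule, the sketch below turns an ordered list of document parts into separate text and image chunks. The (kind, value) input shape and the function name are assumptions made for this example, not something the format defines.

```python
def split_modalities(parts):
    """Turn ordered (kind, value) parts of one document into single-modality chunks.

    `parts` is a hypothetical intermediate representation: kind is "text"
    (value is the text) or "image" (value is an HTTP URL or data URI).
    """
    chunks = []
    for index, (kind, value) in enumerate(parts):
        if kind == "text":
            chunk = {"type": "text", "text": value}
        elif kind == "image":
            chunk = {"type": "image_url", "image_url": {"url": value}}
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
        # Record the original reading order explicitly.
        chunk["chunk_index"] = index
        chunks.append(chunk)
    return chunks
```

Setting chunk_index from the list position keeps the interleaving of text and images recoverable after ingestion.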
Input Properties
When creating chunks for mxjson files, these properties control ingestion:
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Chunk type: text, image_url, audio_url, video_url |
| mime_type | string | No | Content MIME type (defaults per chunk type) |
| chunk_index | integer | No | Position in file. Auto-generated sequentially if omitted |
| generated_metadata | object | No | Custom key-value metadata for the chunk (see Chunk Metadata) |
Text Chunks
| Property | Type | Required |
|---|---|---|
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |
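Source text longer than 65536 characters has to be split before it fits in a text chunk. The following is a minimal sketch that prefers paragraph breaks and falls back to a hard cut. It assumes the limit counts characters the way Python's len() does, which this page does not confirm, and the helper names are illustrative.

```python
MAX_TEXT_CHARS = 65536  # upper bound from the table above


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into pieces of at most `limit` characters, preferring paragraph breaks."""
    pieces, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is itself longer than the limit.
        while len(paragraph) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = paragraph
    if current:
        pieces.append(current)
    return pieces


def text_chunks(text, limit=MAX_TEXT_CHARS):
    """Wrap each piece in a text chunk with an explicit chunk_index."""
    return [
        {"type": "text", "text": piece, "chunk_index": index}
        for index, piece in enumerate(split_text(text, limit))
    ]
```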
{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "category": "baking",
    "difficulty": "intermediate"
  }
}

Image Chunks
| Property | Type | Required |
|---|---|---|
| image_url.url | string | Yes (HTTP URL or data URI) |
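When an image is not hosted anywhere, the data URI option lets the bytes travel inside the chunk. A minimal sketch, assuming base64 encoding in the standard data URI form; the function name is hypothetical.

```python
import base64


def image_chunk_from_bytes(data, mime_type="image/png"):
    """Build an image chunk whose URL is a base64 data URI wrapping `data`."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        "mime_type": mime_type,
    }
```

Base64 grows the payload by roughly a third, so hosted URLs are likely the better fit for large images.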
{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "generated_metadata": {
    "subject": "crumb structure",
    "source": "bakery-handbook"
  }
}

Audio Chunks
| Property | Type | Required |
|---|---|---|
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |
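Because sampling_rate is required, it helps to read it from the audio itself rather than hard-code it. The sketch below does this for WAV input using Python's wave module. It assumes a local copy of the file is available and that audio/wav is an accepted MIME type, neither of which this page states; the function name is illustrative.

```python
import wave


def audio_chunk_from_wav(url, wav_source, mime_type="audio/wav"):
    """Build an audio chunk, reading sampling_rate from a WAV header.

    `url` is the HTTP URL or data URI stored in the chunk. `wav_source` is a
    path or binary file object for a local copy of the same audio.
    """
    with wave.open(wav_source, "rb") as wav:
        sampling_rate = wav.getframerate()  # frames per second, e.g. 44100
    return {
        "type": "audio_url",
        "audio_url": {"url": url},
        "mime_type": mime_type,
        "sampling_rate": sampling_rate,
    }
```

Compressed formats such as MP3 need a different reader, since wave only parses WAV files.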
{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "generated_metadata": {
    "topic": "kneading technique",
    "instructor": "Chef Marie"
  }
}

Video Chunks
| Property | Type | Required |
|---|---|---|
| video_url.url | string | Yes (HTTP URL or data URI) |
{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "generated_metadata": {
    "topic": "boule shaping",
    "skill_level": "advanced"
  }
}

Chunk Metadata
Use generated_metadata to attach custom key-value pairs to individual chunks. You don't need to provide a type field inside generated_metadata; it is inferred automatically from the chunk type. Any additional fields you include are preserved and available for metadata filtering via the generated_metadata.* prefix.
{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "generated_metadata": {
    "category": "technique",
    "difficulty": "beginner",
    "source": "baking-handbook"
  }
}

File-level metadata (set during upload) applies to all chunks and participates in contextualization if enabled on the Store.
Complete Example
JSON Format (.mxjson)
[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {
      "section": "title",
      "topic": "baguette shaping"
    }
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {
      "section": "instructions",
      "step": "pre-shaping"
    }
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "chunk_index": 2,
    "generated_metadata": {
      "section": "instructions",
      "subject": "pre-shaping visual"
    }
  }
]

JSON Lines Format (.mxjsonl)
{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0, "generated_metadata": {"section": "title", "topic": "baguette shaping"}}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1, "generated_metadata": {"section": "instructions", "step": "pre-shaping"}}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2, "generated_metadata": {"section": "instructions", "subject": "pre-shaping visual"}}

Schema Endpoint
Retrieve the JSON Schema programmatically:
curl https://api.mixedbread.com/v1/schemas/mxjson

Use Cases
Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).
Pre-processed pipelines: When existing ETL pipelines produce chunked content.
Multimodal collections: When combining text, images, audio, and video from different sources.
Metadata preservation: When chunks carry structured metadata from source systems.
Migration: When importing pre-chunked data from other vector databases.
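For the pipeline and migration cases, the work is mostly a field mapping. The sketch below converts exported records into .mxjsonl lines. The record shape (content, metadata, vector) is a hypothetical export format, not one tied to any particular database, and stored vectors are dropped because none of the properties listed on this page holds an embedding.

```python
import json


def records_to_mxjsonl(records):
    """Map exported text records to .mxjsonl content, one chunk per line.

    Each record is assumed to look like
    {"content": str, "metadata": dict (optional), "vector": list (ignored)}.
    """
    lines = []
    for index, record in enumerate(records):
        chunk = {"type": "text", "text": record["content"], "chunk_index": index}
        if record.get("metadata"):
            # Carry source-system metadata over as per-chunk metadata.
            chunk["generated_metadata"] = dict(record["metadata"])
        lines.append(json.dumps(chunk, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```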
Validation Errors
| Error | Cause |
|---|---|
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |
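A local pre-check can catch these errors before an upload is attempted. The sketch below mirrors the four causes in the table. It approximates them and does not reimplement the server's validation: the URL test only looks at the scheme prefix, and the function name is illustrative.

```python
VALID_TYPES = ("text", "image_url", "audio_url", "video_url")
MAX_TEXT_CHARS = 65536


def check_chunk(chunk):
    """Return a list of error strings for one chunk dict (empty if none found)."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in VALID_TYPES:
        return ["Unknown chunk type"]

    errors = []
    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_CHARS:
            errors.append("text must be 1-65536 characters")
    else:
        # For URL-based chunks the reference lives under a key named after the type.
        ref = chunk.get(chunk_type)
        url = ref.get("url") if isinstance(ref, dict) else None
        # Rough check only: accept http(s) URLs and data URIs by prefix.
        if not isinstance(url, str) or not url.startswith(("http://", "https://", "data:")):
            errors.append("Invalid URL format")
    return errors
```

Running it over every chunk and collecting the non-empty results gives a quick report to review alongside the Schema Checker.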