Mixedbread JSON Format
The Mixedbread JSON format (.mxjson / .mxjsonl) lets you ingest pre-chunked content directly into Stores. Use this format when you have already split your content into chunks, want to preserve specific chunk boundaries, or need to attach pre-computed metadata.
File Formats
| Format | Extension | MIME Type | Structure |
|---|---|---|---|
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
Chunk Structure
Each chunk in an mxjson file follows the same structure as Store Chunks. The type field determines which properties are required:
| Type | Content |
|---|---|
| text | Text content |
| image_url | Image reference |
| audio_url | Audio reference |
| video_url | Video reference |
Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
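As a minimal sketch of the one-modality rule, the function below turns a mixed document into separate chunks, one per part. The input format (a list of `{"kind": ...}` parts) and the name `split_into_chunks` are hypothetical. The chunk fields follow the property tables on this page; the nested `{"url": ...}` object is inferred from the `image_url.url` property path, and using `text/plain` as the text MIME type is an assumption.

```python
from typing import Any


def split_into_chunks(parts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn a mixed document into one single-modality chunk per part.

    `parts` uses a hypothetical input format:
      {"kind": "text", "text": "..."}
      {"kind": "image", "url": "...", "mime_type": "image/png"}
    """
    chunks: list[dict[str, Any]] = []
    for part in parts:
        if part["kind"] == "text":
            # Text always goes into its own "text" chunk.
            chunks.append({
                "type": "text",
                "mime_type": "text/plain",  # assumption: plain text
                "text": part["text"],
            })
        elif part["kind"] == "image":
            # Each image becomes a separate image_url chunk; it never shares a chunk with text.
            chunks.append({
                "type": "image_url",
                "mime_type": part["mime_type"],
                "image_url": {"url": part["url"]},
            })
        else:
            raise ValueError(f"unsupported part kind: {part['kind']!r}")
    return chunks
```

A page with a paragraph followed by a photo therefore yields two chunks, a `text` chunk and an `image_url` chunk, in document order.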
Input Properties
When creating chunks for mxjson files, these properties control ingestion:
| Property | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Chunk type: text, image_url, audio_url, video_url |
| mime_type | string | Yes | Content MIME type |
| chunk_index | integer | No | Position in file. Auto-generated sequentially if omitted |
| generated_metadata | object | No | Arbitrary key-value metadata preserved on the chunk |
Text Chunks
| Property | Type | Required |
|---|---|---|
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |
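Because `text` is capped at 65,536 characters, longer passages have to be split before they are written out. The sketch below cuts text into pieces of at most that length and records each piece's starting position in `offset`. Treating `offset` as a character position in the source text is an assumption (this page only gives its type and default), and `text_chunks` is a hypothetical helper name.

```python
from typing import Any

MAX_TEXT_CHARS = 65536  # documented upper bound for the text field


def text_chunks(source: str, mime_type: str = "text/plain",
                max_chars: int = MAX_TEXT_CHARS) -> list[dict[str, Any]]:
    """Split `source` into text chunks of 1..max_chars characters each."""
    if not source:
        # The text field must contain at least one character.
        raise ValueError("text must be non-empty")
    chunks = []
    for start in range(0, len(source), max_chars):
        chunks.append({
            "type": "text",
            "mime_type": mime_type,
            "text": source[start:start + max_chars],
            # Assumption: offset = character position of this piece in `source`.
            "offset": start,
        })
    return chunks
```

Fixed-width cuts can land mid-word; a real pipeline would usually split at natural boundaries first and use this only as a fallback. Python's `len` counts code points, so leave some headroom if the service counts characters differently.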
Image Chunks
| Property | Type | Required |
|---|---|---|
| image_url.url | string | Yes (HTTP URL or data URI) |
| ocr_text | string | No |
| summary | string | No |
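When an image is not hosted anywhere, it can be embedded as a data URI. The sketch below base64-encodes raw image bytes into the `data:<mime type>;base64,<payload>` form; `image_chunk_from_bytes` is a hypothetical helper, and the nested `{"url": ...}` object is inferred from the `image_url.url` property path.

```python
import base64
from typing import Any, Optional


def image_chunk_from_bytes(data: bytes, mime_type: str,
                           ocr_text: Optional[str] = None,
                           summary: Optional[str] = None) -> dict[str, Any]:
    """Build an image_url chunk that embeds the image as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    chunk: dict[str, Any] = {
        "type": "image_url",
        "mime_type": mime_type,
        # Data URI layout: data:<mime type>;base64,<payload>
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
    # Optional fields are only included when a value is provided.
    if ocr_text is not None:
        chunk["ocr_text"] = ocr_text
    if summary is not None:
        chunk["summary"] = summary
    return chunk
```

Base64 inflates the payload by roughly a third, so for large images an HTTP URL keeps the file much smaller.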
Audio Chunks
| Property | Type | Required |
|---|---|---|
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |
| transcription | string | No |
| summary | string | No |
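Audio chunks additionally require `sampling_rate`. For WAV files the rate is stored in the file header, so it can be read rather than hard-coded. This sketch uses the standard-library `wave` module; the helper name and the `audio/wav` MIME type choice are assumptions.

```python
import base64
import io
import wave
from typing import Any, Optional


def audio_chunk_from_wav(wav_bytes: bytes,
                         transcription: Optional[str] = None) -> dict[str, Any]:
    """Build an audio_url chunk, reading sampling_rate from the WAV header."""
    # wave.open accepts a file-like object; getframerate() returns samples per second.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sampling_rate = wav.getframerate()
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    chunk: dict[str, Any] = {
        "type": "audio_url",
        "mime_type": "audio/wav",  # assumption: WAV input
        "audio_url": {"url": f"data:audio/wav;base64,{encoded}"},
        "sampling_rate": sampling_rate,
    }
    if transcription is not None:
        chunk["transcription"] = transcription
    return chunk
```

Other containers (MP3, FLAC) need a different library to read their headers, but the resulting chunk has the same shape.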
Video Chunks
| Property | Type | Required |
|---|---|---|
| video_url.url | string | Yes (HTTP URL or data URI) |
| transcription | string | No |
| summary | string | No |
Chunk Metadata
Each chunk can include generated_metadata with arbitrary key-value pairs. This metadata is preserved on the chunk and returned in search results.
File-level metadata (set during upload) applies to all chunks in the file and participates in contextualization if contextualization is enabled on the Store.
Complete Example
JSON Format (.mxjson)
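As an illustration, the sketch below assembles two chunks from the property tables above and serializes them as a single JSON array, the structure of an .mxjson file. The recipe text, URL, and metadata keys are made up, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Illustrative chunks; the text, URL, and metadata keys are made up.
CHUNKS: list[dict[str, Any]] = [
    {
        "type": "text",
        "mime_type": "text/plain",
        "chunk_index": 0,
        "text": "Preheat the oven to 180 degrees C.",
        "generated_metadata": {"section": "instructions", "step": 1},
    },
    {
        "type": "image_url",
        "mime_type": "image/jpeg",
        "chunk_index": 1,
        # Nested object shape inferred from the image_url.url property path.
        "image_url": {"url": "https://example.com/images/oven.jpg"},
        "summary": "An oven with its dial set to 180 degrees C.",
        "generated_metadata": {"section": "instructions", "step": 1},
    },
]


def to_mxjson(chunks: list[dict[str, Any]]) -> str:
    """Serialize chunks as one JSON array (the .mxjson structure)."""
    return json.dumps(chunks, ensure_ascii=False, indent=2)


def write_mxjson(path: str, chunks: list[dict[str, Any]]) -> None:
    """Write the array to disk as UTF-8."""
    Path(path).write_text(to_mxjson(chunks), encoding="utf-8")
```

Calling `write_mxjson("recipe.mxjson", CHUNKS)` produces a file whose top-level value is an array of two chunk objects.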
JSON Lines Format (.mxjsonl)
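The JSON Lines variant holds the same chunk objects, one per line. A minimal sketch of writing and reading that layout follows; `to_mxjsonl` and `iter_mxjsonl` are hypothetical names. `json.dumps` escapes newlines inside strings, so multi-line text still occupies a single physical line.

```python
import json
from typing import Any, Iterable, Iterator


def to_mxjsonl(chunks: Iterable[dict[str, Any]]) -> str:
    """Serialize chunks as JSON Lines: one JSON object per line."""
    # Default ensure_ascii=True escapes every non-ASCII character, so no
    # Unicode line separator can appear raw inside a line.
    return "".join(json.dumps(chunk) + "\n" for chunk in chunks)


def iter_mxjsonl(text: str) -> Iterator[dict[str, Any]]:
    """Parse JSON Lines text back into chunk dicts, skipping blank lines."""
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the failing line so broken records are easy to find.
            raise ValueError(f"line {line_number}: {exc.msg}") from exc
```

Line-oriented files are convenient for large exports because they can be appended to and streamed without loading the whole array into memory.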
Schema Endpoint
Retrieve the JSON Schema programmatically:
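The endpoint's URL is not given in this section, so the sketch below takes it as a parameter. `fetch_schema` and its optional `headers` argument (for example, if your account requires an API key header) are assumptions, not a documented client; the code uses only the standard library.

```python
import json
import urllib.request
from typing import Any, Optional


def fetch_schema(url: str, headers: Optional[dict[str, str]] = None,
                 timeout: float = 10.0) -> dict[str, Any]:
    """Download a JSON Schema document and return it as a dict."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.loads accepts the raw bytes and detects UTF-8/16/32.
        return json.loads(response.read())
```

The returned dict can be handed to a JSON Schema validator, such as the third-party `jsonschema` package, to check files before uploading them.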
Use Cases
Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).
Pre-processed pipelines: When existing ETL pipelines produce chunked content.
Multimodal collections: When combining text, images, audio, and video from different sources.
Metadata preservation: When chunks carry structured metadata from source systems.
Migration: When importing pre-chunked data from other vector databases.
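For the custom-chunking case, here is a minimal sketch that treats blank-line-separated paragraphs as chunk boundaries and numbers them with an explicit, zero-based `chunk_index`. The paragraph rule, the zero-based numbering, and the `source`/`paragraph` metadata keys are illustrative choices, not requirements of the format.

```python
import re
from typing import Any


def paragraph_chunks(document: str, source: str) -> list[dict[str, Any]]:
    """Split a document on blank lines and emit one text chunk per paragraph."""
    # One or more blank lines (possibly containing spaces) separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document)]
    paragraphs = [p for p in paragraphs if p]  # drop empty pieces
    return [
        {
            "type": "text",
            "mime_type": "text/plain",
            "chunk_index": index,
            "text": paragraph,
            # Hypothetical metadata keys carried over from the source system.
            "generated_metadata": {"source": source, "paragraph": index + 1},
        }
        for index, paragraph in enumerate(paragraphs)
    ]
```

Paragraphs longer than 65,536 characters would still need a further split to satisfy the text limit.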
Validation Errors
| Error | Cause |
|---|---|
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |
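To catch these errors before uploading, a client-side pre-check can mirror the rules on this page. The sketch below is an approximation: it returns the four documented messages above, plus hypothetical "<field> is required" messages for the other required fields in the property tables. The service's own validation may be stricter, and the published JSON Schema is authoritative. The URL check accepts http(s) URLs with a host or `data:` URIs, which is one reading of "HTTP URL or data URI".

```python
from typing import Any
from urllib.parse import urlparse

MAX_TEXT_CHARS = 65536

# Required fields per chunk type, taken from the property tables above.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "text": ("text",),
    "image_url": ("image_url",),
    "audio_url": ("audio_url", "sampling_rate"),
    "video_url": ("video_url",),
}


def _valid_url(url: Any) -> bool:
    """Accept http(s) URLs with a host, or data: URIs containing a comma."""
    if not isinstance(url, str):
        return False
    if url.startswith("data:"):
        return "," in url
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_chunk(chunk: dict[str, Any]) -> list[str]:
    """Return error messages; an empty list means no problems were found."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in REQUIRED_FIELDS:
        return ["Unknown chunk type"]
    errors: list[str] = []
    # Hypothetical messages for required fields not covered by the error table.
    if "mime_type" not in chunk:
        errors.append("mime_type is required")
    for field in REQUIRED_FIELDS[chunk_type]:
        if field not in chunk:
            errors.append(f"{field} is required")
    if chunk_type == "text" and "text" in chunk:
        text = chunk["text"]
        if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_CHARS:
            errors.append("text must be 1-65536 characters")
    elif chunk_type != "text" and chunk_type in chunk:
        # Media chunks keep their URL under <type>.url, e.g. image_url.url.
        media = chunk[chunk_type]
        url = media.get("url") if isinstance(media, dict) else None
        if not _valid_url(url):
            errors.append("Invalid URL format")
    return errors
```

Running every chunk through `validate_chunk` before writing the file surfaces all problems at once instead of one upload failure at a time.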