Mixedbread JSON Format

The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.

File Formats

Format      Extension  MIME Type                            Structure
JSON        .mxjson    application/vnd-mxbai.chunks-json    Array of chunk objects
JSON Lines  .mxjsonl   application/vnd-mxbai.chunks-jsonl   One chunk object per line
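
The two layouts differ only in how the chunk objects are framed on disk. The following is a minimal sketch using Python's standard json module; the helper names write_mxjson, write_mxjsonl, and read_mxjsonl are hypothetical and not part of any Mixedbread SDK.

```python
import json
from pathlib import Path


def write_mxjson(chunks, path):
    """Write all chunks as a single JSON array (.mxjson)."""
    Path(path).write_text(json.dumps(chunks, indent=2), encoding="utf-8")


def write_mxjsonl(chunks, path):
    """Write one compact JSON object per line (.mxjsonl)."""
    with open(path, "w", encoding="utf-8") as f:
        for chunk in chunks:
            # json.dumps escapes newlines inside strings, so each chunk
            # is guaranteed to occupy exactly one physical line.
            f.write(json.dumps(chunk) + "\n")


def read_mxjsonl(path):
    """Read a .mxjsonl file back into a list of chunk dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The JSON Lines layout can be written and read one chunk at a time, which may be convenient for large exports; the array layout has to be parsed as a whole.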

Chunk Structure

Each chunk in an mxjson file follows the same structure as . The type field determines which properties are required:

  • text - Text content
  • image_url - Image reference
  • audio_url - Audio reference
  • video_url - Video reference

Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
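
As an illustration of the one-modality rule, the sketch below turns a mixed document into separate chunks. The input shape, a list of (kind, value, mime_type) tuples, and the function name split_by_modality are assumptions made for this example only.

```python
def split_by_modality(parts):
    """Turn (kind, value, mime_type) tuples into single-modality chunks.

    kind is "text" or "image" (a hypothetical input convention);
    value is the text itself or the image URL.
    """
    chunks = []
    for kind, value, mime_type in parts:
        if kind == "text":
            chunks.append({"type": "text", "text": value, "mime_type": mime_type})
        elif kind == "image":
            chunks.append(
                {"type": "image_url", "image_url": {"url": value}, "mime_type": mime_type}
            )
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
    return chunks
```

A paragraph followed by its figure therefore becomes two chunks, one of type text and one of type image_url, rather than a single chunk carrying both.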

Input Properties

When creating chunks for mxjson files, these properties control ingestion:

Property            Type     Required  Description
type                string   Yes       Chunk type: text, image_url, audio_url, video_url
mime_type           string   Yes       Content MIME type
chunk_index         integer  No        Position in file. Auto-generated sequentially if omitted
generated_metadata  object   No        Arbitrary key-value metadata preserved on the chunk
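
If you prefer explicit positions over auto-generated ones, chunk_index can be assigned client-side before writing the file. This is a minimal sketch; the function name with_chunk_index is hypothetical, and starting at 0 is an assumption based on the examples on this page.

```python
def with_chunk_index(chunks, start=0):
    """Return copies of the chunks numbered sequentially in list order.

    Any chunk_index already present is overwritten, so the result never
    mixes explicit and omitted positions.
    """
    return [
        {**chunk, "chunk_index": index}
        for index, chunk in enumerate(chunks, start=start)
    ]
```

Renumbering every chunk, rather than filling in only the missing values, sidesteps the question of how a file with partially specified indexes is handled, which this page does not describe.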

Text Chunks

Property  Type     Required
text      string   Yes (1-65536 characters)
offset    integer  No (default: 0)

{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain"
}
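
A text chunk can be checked against the length limit before it is written. The sketch below is a hypothetical helper, not part of any SDK. It counts characters with Python's len, which counts Unicode code points; how the service counts characters is not stated here, so treat that as an assumption.

```python
MAX_TEXT_CHARS = 65536


def make_text_chunk(text, mime_type="text/plain", offset=None):
    """Build a text chunk, enforcing the documented 1-65536 character range."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError("text must be 1-65536 characters")
    chunk = {"type": "text", "text": text, "mime_type": mime_type}
    if offset is not None:
        # offset is optional; when omitted it defaults to 0.
        chunk["offset"] = offset
    return chunk
```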

Image Chunks

Property       Type    Required
image_url.url  string  Yes (HTTP URL or data URI)
ocr_text       string  No
summary        string  No

{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "ocr_text": "Figure 3: Open crumb structure",
  "summary": "Cross-section of sourdough loaf showing irregular hole distribution"
}
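
Because the url field also accepts a data URI, an image can be embedded inline instead of being hosted. The sketch below builds a base64 data URI with the standard library; the helper names are hypothetical, and any size limit on inline data is not stated on this page.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI, e.g. data:image/png;base64,..."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def make_inline_image_chunk(data: bytes, mime_type: str) -> dict:
    """Build an image_url chunk whose URL carries the image bytes inline."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_uri(data, mime_type)},
        "mime_type": mime_type,
    }
```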

Audio Chunks

Property       Type     Required
audio_url.url  string   Yes (HTTP URL or data URI)
sampling_rate  integer  Yes
transcription  string   No
summary        string   No

{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "transcription": "Fold the dough over itself, rotate ninety degrees, and repeat."
}

Video Chunks

Property       Type    Required
video_url.url  string  Yes (HTTP URL or data URI)
transcription  string  No
summary        string  No

{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "transcription": "Pre-shape into a round, let it rest, then do the final shaping."
}
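
The image, audio, and video chunks share one shape: a nested object named after the type that holds the url, plus type-specific fields. A single hypothetical builder can capture the tables above. Rejecting unlisted fields is a choice made in this sketch, not documented service behavior.

```python
# Optional fields per media chunk type, taken from the property tables.
OPTIONAL_FIELDS = {
    "image_url": {"ocr_text", "summary"},
    "audio_url": {"transcription", "summary"},
    "video_url": {"transcription", "summary"},
}


def make_media_chunk(chunk_type, url, mime_type, sampling_rate=None, **optional):
    """Build an image_url, audio_url, or video_url chunk."""
    if chunk_type not in OPTIONAL_FIELDS:
        raise ValueError(f"unsupported media chunk type: {chunk_type!r}")
    unknown = set(optional) - OPTIONAL_FIELDS[chunk_type]
    if unknown:
        raise ValueError(f"fields not listed for {chunk_type}: {sorted(unknown)}")
    chunk = {"type": chunk_type, chunk_type: {"url": url}, "mime_type": mime_type}
    if chunk_type == "audio_url":
        # sampling_rate is the only required field beyond url and mime_type.
        if sampling_rate is None:
            raise ValueError("audio chunks require sampling_rate")
        chunk["sampling_rate"] = sampling_rate
    elif sampling_rate is not None:
        raise ValueError("sampling_rate is only listed for audio chunks")
    chunk.update(optional)
    return chunk
```

For example, make_media_chunk("audio_url", url, "audio/mpeg", sampling_rate=44100, transcription="...") yields a chunk with the same fields as the audio sample above.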

Chunk Metadata

Each chunk can include generated_metadata with arbitrary key-value pairs. This metadata is preserved on the chunk and returned in search results.

{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "technique": "autolyse",
    "difficulty": "beginner",
    "duration_minutes": 30
  }
}
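
Since generated_metadata ends up inside a JSON file, it helps to confirm that the values are serializable before attaching them. The following is a minimal sketch with a hypothetical helper name; the string-key check reflects JSON object rules, not a documented service constraint.

```python
import json


def attach_metadata(chunk, metadata):
    """Return a copy of chunk with generated_metadata set.

    Raises TypeError for non-string keys or values that JSON cannot
    represent (for example a set), and ValueError for NaN or Infinity.
    """
    if not all(isinstance(key, str) for key in metadata):
        raise TypeError("metadata keys must be strings")
    # Serialize once to surface unsupported values early.
    json.dumps(metadata, allow_nan=False)
    return {**chunk, "generated_metadata": dict(metadata)}
```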

File-level metadata (set during upload) applies to all chunks and participates in if enabled on the Store.

Complete Example

JSON Format (.mxjson)

[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {"section": "title"}
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {"step": 1}
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "summary": "Dough pre-shaped into loose rectangle on floured surface",
    "chunk_index": 2,
    "generated_metadata": {"step": 1}
  }
]

JSON Lines Format (.mxjsonl)

{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2}

Schema Endpoint

Retrieve the JSON Schema programmatically:

curl https://api.mixedbread.com/v1/schemas/mxjson
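
The same request can be made from Python with the standard library. This is a sketch only: the function names are hypothetical, the Accept header is an assumption, and whether the endpoint requires authentication is not stated on this page.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.mixedbread.com/v1/schemas/mxjson"


def build_schema_request(url=SCHEMA_URL):
    """Build a GET request for the schema endpoint."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_schema(url=SCHEMA_URL, timeout=10.0):
    """Download the JSON Schema and parse it into a dict."""
    with urllib.request.urlopen(build_schema_request(url), timeout=timeout) as response:
        return json.load(response)
```

The returned dict can then be handed to a JSON Schema validator of your choice to check files before upload.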

Use Cases

Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).

Pre-processed pipelines: When existing ETL pipelines produce chunked content.

Multimodal collections: When combining text, images, audio, and video from different sources.

Metadata preservation: When chunks carry structured metadata from source systems.

Migration: When importing pre-chunked data from other vector databases.

Validation Errors

Error                            Cause
type is required                 Missing type field
text must be 1-65536 characters  Text empty or exceeds limit
Invalid URL format               Malformed URL or data URI
Unknown chunk type               Type not one of: text, image_url, audio_url, video_url
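
These checks can be approximated locally before upload. The sketch below mirrors only the four errors listed above. The function names are hypothetical, and the URL test (an http or https URL with a host, or a data: URI containing a comma) is an assumption about what counts as well-formed, not the service's actual rule.

```python
from urllib.parse import urlparse

CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")


def looks_like_url(url):
    """Rough well-formedness check for an HTTP(S) URL or a data URI."""
    if not isinstance(url, str):
        return False
    if url.startswith("data:"):
        return "," in url
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_chunk(chunk):
    """Return a list of error messages; an empty list means no issue found."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in CHUNK_TYPES:
        return ["Unknown chunk type"]
    errors = []
    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= 65536:
            errors.append("text must be 1-65536 characters")
    else:
        # Media chunks keep their URL under a key named after the type.
        ref = chunk.get(chunk_type)
        url = ref.get("url") if isinstance(ref, dict) else None
        if not looks_like_url(url):
            errors.append("Invalid URL format")
    return errors
```
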
Last updated: January 7, 2026