Mixedbread JSON Format

The Mixedbread JSON format (.mxjson / .mxjsonl) allows you to ingest pre-chunked content directly into Stores. Use this format when you have already processed your content into chunks, want to preserve specific chunk boundaries, or need to include pre-computed metadata.

Schema Checker

Validate your .mxjson or .mxjsonl files against the current schema before uploading. You can also download the JSON Schema (see Schema Endpoint).

File Formats

| Format | Extension | MIME Type | Structure |
| --- | --- | --- | --- |
| JSON | .mxjson | application/vnd-mxbai.chunks-json | Array of chunk objects |
| JSON Lines | .mxjsonl | application/vnd-mxbai.chunks-jsonl | One chunk object per line |
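
As a small illustration of the two formats, the following sketch maps a file extension to its MIME type. The helper name `mime_type_for` is hypothetical and not part of any Mixedbread SDK; only the extensions and MIME strings come from this page.

```python
from pathlib import Path

# Extension -> MIME type, copied from the File Formats section.
MXJSON_MIME_TYPES = {
    ".mxjson": "application/vnd-mxbai.chunks-json",
    ".mxjsonl": "application/vnd-mxbai.chunks-jsonl",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a .mxjson or .mxjsonl path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return MXJSON_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"not an mxjson file: {path!r}") from None
```

Whether an upload needs the MIME type set explicitly, or whether the extension alone is enough, is not stated here, so treat this as a convenience for clients that do set it.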

Chunk Structure

Each chunk in an mxjson file is a JSON object with a common structure. The type field determines which properties are required:

  • text - Text content
  • image_url - Image reference
  • audio_url - Audio reference
  • video_url - Video reference

Each chunk contains exactly one modality. To represent a document with text and images, use separate chunks for each.
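
The one-modality-per-chunk rule can be sketched as follows. The `(kind, value)` pairs are a hypothetical intermediate representation chosen for this example, not something the format prescribes.

```python
def to_chunks(parts):
    """Turn (kind, value) pairs into one single-modality chunk each.

    `parts` is a hypothetical intermediate form: kind is "text" or "image",
    value is the text itself or the image URL.
    """
    chunks = []
    for kind, value in parts:
        if kind == "text":
            chunks.append({"type": "text", "text": value})
        elif kind == "image":
            chunks.append({"type": "image_url", "image_url": {"url": value}})
        else:
            raise ValueError(f"unsupported part kind: {kind!r}")
    return chunks


# A document with one paragraph and one picture becomes two chunks.
document = [
    ("text", "Pre-shape the dough into a loose rectangle."),
    ("image", "https://bakery.example.com/images/baguette-preshape.jpg"),
]
chunks = to_chunks(document)
```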

Input Properties

When creating chunks for mxjson files, these properties control ingestion:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Chunk type: text, image_url, audio_url, video_url |
| mime_type | string | No | Content MIME type (defaults per chunk type) |
| chunk_index | integer | No | Position in file. Auto-generated sequentially if omitted |
| generated_metadata | object | No | Custom key-value metadata for the chunk (see Chunk Metadata) |
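
If you prefer not to rely on automatic numbering, chunk_index can be set on the client. The sketch below fills in missing indices from list position. How the service numbers chunks when only some indices are supplied is not described here, so this is an assumption-laden convenience (the name `with_chunk_index` is hypothetical) rather than a mirror of server behavior.

```python
def with_chunk_index(chunks):
    """Return a new list where chunks lacking chunk_index get their list position.

    Chunks that already carry chunk_index are passed through unchanged;
    the input dictionaries are never mutated.
    """
    return [
        chunk if "chunk_index" in chunk else {**chunk, "chunk_index": position}
        for position, chunk in enumerate(chunks)
    ]
```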

Text Chunks

| Property | Type | Required |
| --- | --- | --- |
| text | string | Yes (1-65536 characters) |
| offset | integer | No (default: 0) |

{
  "type": "text",
  "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
  "mime_type": "text/plain",
  "generated_metadata": {
    "category": "baking",
    "difficulty": "intermediate"
  }
}
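
To stay within the 1-65536 character limit, long text can be split before it is written out. This is a minimal sketch that cuts at fixed positions; it assumes the limit counts Unicode code points, as Python's `len()` does, which this page does not confirm. The function name is hypothetical.

```python
MAX_TEXT_CHARS = 65536  # upper bound from the Text Chunks section


def text_chunks(text, limit=MAX_TEXT_CHARS):
    """Split text into text chunks of at most `limit` characters each."""
    if not text:
        raise ValueError("text must contain at least 1 character")
    # Assumption: the service counts characters the way len() does.
    return [
        {"type": "text", "text": text[start:start + limit]}
        for start in range(0, len(text), limit)
    ]
```

Fixed-position cuts can land mid-word or mid-sentence; in practice you would usually split on paragraph or sentence boundaries first and use a length cap like this only as a safety net.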

Image Chunks

| Property | Type | Required |
| --- | --- | --- |
| image_url.url | string | Yes (HTTP URL or data URI) |

{
  "type": "image_url",
  "image_url": {
    "url": "https://bakery.example.com/images/crumb-structure.png"
  },
  "mime_type": "image/png",
  "generated_metadata": {
    "subject": "crumb structure",
    "source": "bakery-handbook"
  }
}
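
When an image is not reachable over HTTP, the data URI option can be used instead. The following sketch embeds raw image bytes as a base64 data URI. The helper name is hypothetical, and any size limit the service places on data URIs is not stated here.

```python
import base64
import mimetypes


def image_chunk_from_bytes(data: bytes, filename: str) -> dict:
    """Build an image_url chunk that embeds the image as a base64 data URI."""
    # Infer the MIME type from the file name, e.g. "crumb.png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        "mime_type": mime_type,
    }
```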

Audio Chunks

| Property | Type | Required |
| --- | --- | --- |
| audio_url.url | string | Yes (HTTP URL or data URI) |
| sampling_rate | integer | Yes |

{
  "type": "audio_url",
  "audio_url": {
    "url": "https://bakery.example.com/audio/kneading-tutorial.mp3"
  },
  "mime_type": "audio/mpeg",
  "sampling_rate": 44100,
  "generated_metadata": {
    "topic": "kneading technique",
    "instructor": "Chef Marie"
  }
}
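
Because sampling_rate is required, it helps to read it from the audio itself rather than hard-code it. The sketch below does this for WAV files with Python's standard `wave` module. It assumes the rate is expressed in Hz (consistent with 44100 in the example above) and that `audio/wav` is an accepted MIME type; neither is confirmed on this page, and the helper name is hypothetical.

```python
import wave


def audio_chunk(url: str, wav_source) -> dict:
    """Build an audio_url chunk, reading sampling_rate from a local WAV copy.

    `wav_source` is a file path or a binary file object holding the same
    audio that `url` points to.
    """
    with wave.open(wav_source, "rb") as wav:
        sampling_rate = wav.getframerate()  # frames per second, i.e. Hz
    return {
        "type": "audio_url",
        "audio_url": {"url": url},
        "mime_type": "audio/wav",  # assumption: accepted by the service
        "sampling_rate": sampling_rate,
    }
```

Compressed formats such as MP3 are not handled by `wave`; reading their sampling rate needs a third-party decoder.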

Video Chunks

| Property | Type | Required |
| --- | --- | --- |
| video_url.url | string | Yes (HTTP URL or data URI) |

{
  "type": "video_url",
  "video_url": {
    "url": "https://bakery.example.com/video/shaping-boule.mp4"
  },
  "mime_type": "video/mp4",
  "generated_metadata": {
    "topic": "boule shaping",
    "skill_level": "advanced"
  }
}

Chunk Metadata

Use generated_metadata to attach custom key-value pairs to individual chunks. You don't need to provide a type field inside generated_metadata; it is inferred automatically from the chunk type. Any additional fields you include are preserved and can be referenced via the generated_metadata.* prefix.

{
  "type": "text",
  "text": "Autolyse is a rest period after mixing flour and water.",
  "generated_metadata": {
    "category": "technique",
    "difficulty": "beginner",
    "source": "baking-handbook"
  }
}

File-level metadata (set during upload) applies to all chunks and, if enabled on the Store, also participates in Store-level processing.
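
To make the generated_metadata.* prefix concrete, here is a local sketch that resolves such dotted keys against chunk dictionaries. It only mimics the naming convention on the client side; it is not the Store's query API, and `lookup` is a hypothetical helper.

```python
def lookup(chunk: dict, dotted_key: str):
    """Resolve a dotted key such as 'generated_metadata.category' on a chunk.

    Returns None when any segment of the path is missing.
    """
    value = chunk
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


chunks = [
    {
        "type": "text",
        "text": "Autolyse is a rest period after mixing flour and water.",
        "generated_metadata": {"category": "technique", "difficulty": "beginner"},
    },
    {
        "type": "text",
        "text": "Sourdough fermentation relies on wild yeast and lactic acid bacteria.",
        "generated_metadata": {"category": "baking", "difficulty": "intermediate"},
    },
]

# Keep only the chunks whose category is "technique".
techniques = [c for c in chunks if lookup(c, "generated_metadata.category") == "technique"]
```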

Complete Example

JSON Format (.mxjson)

[
  {
    "type": "text",
    "text": "Baguette Shaping Guide",
    "mime_type": "text/plain",
    "chunk_index": 0,
    "generated_metadata": {
      "section": "title",
      "topic": "baguette shaping"
    }
  },
  {
    "type": "text",
    "text": "Pre-shape the dough into a loose rectangle. Let it rest for 15-20 minutes to relax the gluten before final shaping.",
    "mime_type": "text/plain",
    "chunk_index": 1,
    "generated_metadata": {
      "section": "instructions",
      "step": "pre-shaping"
    }
  },
  {
    "type": "image_url",
    "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"},
    "mime_type": "image/jpeg",
    "chunk_index": 2,
    "generated_metadata": {
      "section": "instructions",
      "subject": "pre-shaping visual"
    }
  }
]

JSON Lines Format (.mxjsonl)

{"type": "text", "text": "Baguette Shaping Guide", "mime_type": "text/plain", "chunk_index": 0, "generated_metadata": {"section": "title", "topic": "baguette shaping"}}
{"type": "text", "text": "Pre-shape the dough into a loose rectangle.", "mime_type": "text/plain", "chunk_index": 1, "generated_metadata": {"section": "instructions", "step": "pre-shaping"}}
{"type": "image_url", "image_url": {"url": "https://bakery.example.com/images/baguette-preshape.jpg"}, "mime_type": "image/jpeg", "chunk_index": 2, "generated_metadata": {"section": "instructions", "subject": "pre-shaping visual"}}
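
The two layouts carry the same chunk objects, so converting between them is mechanical. A minimal sketch using only the standard library follows; the function names are hypothetical. It ignores blank lines when reading, though whether the service tolerates blank lines in a .mxjsonl upload is not stated here.

```python
import json


def mxjson_to_mxjsonl(mxjson_text: str) -> str:
    """Convert .mxjson text (a JSON array) into .mxjsonl text."""
    chunks = json.loads(mxjson_text)
    if not isinstance(chunks, list):
        raise ValueError("a .mxjson file must contain a JSON array of chunks")
    # json.dumps escapes newlines inside strings, so each chunk stays on one line.
    return "".join(json.dumps(chunk) + "\n" for chunk in chunks)


def mxjsonl_to_chunks(mxjsonl_text: str) -> list:
    """Parse .mxjsonl text into a list of chunk objects, ignoring blank lines."""
    return [json.loads(line) for line in mxjsonl_text.split("\n") if line.strip()]
```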

Schema Endpoint

Retrieve the JSON Schema programmatically:

curl https://api.mixedbread.com/v1/schemas/mxjson
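
The same request can be made from Python with the standard library. This sketch assumes, as the curl command suggests, that the endpoint needs no authentication header and returns a JSON body; the function name is hypothetical.

```python
import json
import urllib.request

# URL taken from the curl command above.
SCHEMA_URL = "https://api.mixedbread.com/v1/schemas/mxjson"


def fetch_mxjson_schema(url: str = SCHEMA_URL, timeout: float = 10.0) -> dict:
    """Download the mxjson JSON Schema and parse it into a dictionary."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_mxjson_schema()` returns the parsed schema, which can then be handed to a JSON Schema validator of your choice.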

Use Cases

Custom chunking: When your domain requires specific chunk boundaries (paragraphs, sections, recipe steps).

Pre-processed pipelines: When existing ETL pipelines produce chunked content.

Multimodal collections: When combining text, images, audio, and video from different sources.

Metadata preservation: When chunks carry structured metadata from source systems.

Migration: When importing pre-chunked data from other vector databases.
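
As an example of the custom chunking case, the sketch below splits a plain-text document on blank lines and emits one text chunk per paragraph, carrying a source label as metadata. The splitting rule, the `source` key, and the function name are illustrative choices, not requirements of the format.

```python
import re


def paragraph_chunks(text: str, source: str) -> list:
    """Split text on blank lines and emit one text chunk per paragraph."""
    # One or more blank (or whitespace-only) lines separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        {
            "type": "text",
            "text": paragraph,
            "chunk_index": index,
            "generated_metadata": {"source": source},
        }
        for index, paragraph in enumerate(paragraphs)
    ]
```

The resulting list can be serialized as a JSON array for a .mxjson file or written one object per line for .mxjsonl.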

Validation Errors

| Error | Cause |
| --- | --- |
| type is required | Missing type field |
| text must be 1-65536 characters | Text empty or exceeds limit |
| Invalid URL format | Malformed URL or data URI |
| Unknown chunk type | Type not one of: text, image_url, audio_url, video_url |

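
A local pre-flight check can catch these errors before upload. The sketch below covers only the four errors listed above and reuses their messages; the exact rules the service applies (for example, what counts as a well-formed URL) are assumptions here, and the official JSON Schema remains the authoritative check. The function name is hypothetical.

```python
from urllib.parse import urlparse

CHUNK_TYPES = ("text", "image_url", "audio_url", "video_url")


def validate_chunk(chunk: dict) -> list:
    """Return error messages for one chunk; an empty list means no error found."""
    chunk_type = chunk.get("type")
    if chunk_type is None:
        return ["type is required"]
    if chunk_type not in CHUNK_TYPES:
        return ["Unknown chunk type"]

    errors = []
    if chunk_type == "text":
        text = chunk.get("text")
        if not isinstance(text, str) or not 1 <= len(text) <= 65536:
            errors.append("text must be 1-65536 characters")
    else:
        # For media chunks the URL lives under a key named after the type.
        reference = chunk.get(chunk_type)
        url = reference.get("url") if isinstance(reference, dict) else None
        if not isinstance(url, str):
            url = ""
        parsed = urlparse(url)
        is_http = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        # Assumption: a data URI needs the "data:" scheme and a comma before the payload.
        is_data_uri = url.startswith("data:") and "," in url
        if not (is_http or is_data_uri):
            errors.append("Invalid URL format")
    return errors
```
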
Last updated: April 13, 2026