Supported File Types

Stores natively understand a wide range of file formats through multimodal AI. No separate extraction step is needed: the system directly understands text, images, and complex layouts for semantic search.

Documents

| Format | Extensions |
| --- | --- |
| PDF | .pdf |
| Word | .doc, .docx, .dotx, .docm, .dotm |
| OpenDocument Text | .odt |
| Rich Text Format | .rtf |
| Text | .txt |
| Markdown | .md, .mdx |

Presentations

| Format | Extensions |
| --- | --- |
| PowerPoint | .ppt, .pptx |
| PowerPoint Slideshow | .ppsx |
| PowerPoint Add-in | .ppam |
| PowerPoint Macro-Enabled | .pptm, .potm, .ppsm |
| OpenDocument Presentation | .odp |

Code

| Language | Extensions |
| --- | --- |
| Python | .py |
| JavaScript | .js |
| TypeScript | .ts |
| Java | .java |
| C# | .cs |
| C | .c, .h |
| C++ | .cpp, .cxx, .cc, .hpp, .hxx |
| Go | .go |
| HTML | .html, .htm |
| Ruby | .rb |
| Rust | .rs |

Images

| Format | Extensions |
| --- | --- |
| JPEG | .jpg, .jpeg |
| PNG | .png |
| WebP | .webp |
| AVIF | .avif |

Specialized Formats

| Format | Extensions | Notes |
| --- | --- | --- |
| Mixedbread JSON | .mxjson | |
| Mixedbread JSONL | .mxjsonl | |

The Mixedbread JSON format allows direct ingestion of pre-chunked content. Use it when you have custom chunking logic, need to preserve specific chunk boundaries, or want to include pre-computed metadata such as OCR text or transcriptions.
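To make "custom chunking logic" and "pre-computed metadata" concrete, here is a minimal sketch of paragraph-aware chunking that writes one JSON object per line. The function names and the record fields (`text`, `metadata`, `source`, `chunk_index`) are hypothetical and chosen only for illustration. This page does not specify the .mxjson or .mxjsonl schema, so the output would need to be mapped to the real schema before ingestion.

```python
import json


def chunk_by_paragraph(text: str, max_chars: int = 500) -> list:
    """Group whole paragraphs into chunks of up to max_chars characters.

    Paragraphs are never split, so a single paragraph longer than
    max_chars is kept whole as its own oversized chunk.
    """
    chunks = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip empty paragraphs produced by extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_jsonl(chunks, source: str) -> str:
    """Serialize chunks as JSON Lines with illustrative (hypothetical) fields."""
    records = (
        {"text": chunk, "metadata": {"source": source, "chunk_index": index}}
        for index, chunk in enumerate(chunks)
    )
    return "\n".join(json.dumps(record) for record in records)
```

Keeping the chunker separate from the serializer means the boundary logic can be tested on its own, and only `to_jsonl` needs to change once the target schema is known.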

Audio

| Format | Extensions |
| --- | --- |
| MP3 | .mp3 |
| WAV | .wav |
| OGG | .ogg, .oga |
| M4A | .m4a |
| WebM Audio | .weba |
| AAC | .aac |
| FLAC | .flac |

Video

| Format | Extensions |
| --- | --- |
| MP4 | .mp4 |
| WebM | .webm |
| QuickTime | .mov |
| AVI | .avi |
| OGG Video | .ogv |
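
The extension lists above can be turned into a simple client-side check that runs before upload. The sketch below is not part of any Mixedbread SDK: `SUPPORTED_EXTENSIONS` and `classify_file` are hypothetical names, and the category labels are taken from the section headings on this page. It only inspects the file name, not the file contents.

```python
from pathlib import PurePath
from typing import Optional

# Extensions copied from the lists on this page, grouped by section heading.
SUPPORTED_EXTENSIONS = {
    "document": {".pdf", ".doc", ".docx", ".dotx", ".docm", ".dotm",
                 ".odt", ".rtf", ".txt", ".md", ".mdx"},
    "presentation": {".ppt", ".pptx", ".ppsx", ".ppam", ".pptm", ".potm",
                     ".ppsm", ".odp"},
    "code": {".py", ".js", ".ts", ".java", ".cs", ".c", ".h", ".cpp", ".cxx",
             ".cc", ".hpp", ".hxx", ".go", ".html", ".htm", ".rb", ".rs"},
    "image": {".jpg", ".jpeg", ".png", ".webp", ".avif"},
    "specialized": {".mxjson", ".mxjsonl"},
    "audio": {".mp3", ".wav", ".ogg", ".oga", ".m4a", ".weba", ".aac", ".flac"},
    "video": {".mp4", ".webm", ".mov", ".avi", ".ogv"},
}


def classify_file(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if its extension is not listed."""
    # Only the final suffix is considered, compared case-insensitively.
    suffix = PurePath(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

For example, `classify_file("report.PDF")` returns `"document"`, while `classify_file("data.csv")` returns `None` because .csv does not appear in the lists on this page.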

Last updated: February 26, 2026