Overview
Use the Mixedbread Parsing API to transform complex documents (PDF, DOCX, etc.) into clean, structured text elements or chunks. Its layout-aware parsing improves data quality for RAG, embedding generation, and information extraction.
Introduction
The Mixedbread Parsing API transforms complex documents into clean text, and it goes beyond simple text extraction: it understands the layout and returns detailed information about each layout element. You receive the content, element information, and even bounding boxes, all from a single API call.
Typical Workflow: Parsing a Document
1. Initiate a parsing job by providing your document.
2. Retrieve the parsed results from the job once it is complete.
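The two steps above amount to a create-then-poll loop. The sketch below illustrates that pattern only; it is not the Mixedbread SDK. The callables `create_job` and `get_job`, the dictionary keys `id`, `status`, and `result`, and the status values `completed` and `failed` are all hypothetical placeholders, so check the Parsing API reference for the real names before adapting it.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; the real API may use different names.
TERMINAL_STATUSES = {"completed", "failed"}


def parse_document(
    create_job: Callable[[str], Dict[str, Any]],
    get_job: Callable[[str], Dict[str, Any]],
    document_id: str,
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Start a parsing job, then poll until it finishes or times out.

    create_job and get_job are hypothetical stand-ins for whatever
    client calls start a job and fetch its current state.
    """
    # Step 1: initiate the parsing job for the document.
    job = create_job(document_id)
    deadline = time.monotonic() + timeout

    # Step 2: poll until the job reaches a terminal status.
    while job["status"] not in TERMINAL_STATUSES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job['id']} did not finish in {timeout}s")
        sleep(poll_interval)
        job = get_job(job["id"])

    if job["status"] != "completed":
        raise RuntimeError(f"job {job['id']} ended with status {job['status']}")
    return job["result"]
```

Passing the two client calls in as functions keeps the loop independent of any particular SDK and makes it easy to exercise with an in-memory fake before pointing it at a real service.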
Key Features
- Multi-Format Support: Handles various document types including PDF, PPTX, HTML, and more.
- Layout-Aware Extraction: Understands document structure beyond raw text.
- Structured Output: Provides detailed information about content elements.
- Multiple Output Formats: Choose from JSON, Markdown, or clean Text based on your needs.
- Asynchronous Processing: Handles large or complex documents efficiently.
- Improves Downstream Quality: Creates better input for embedding, RAG, and indexing.
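To make the structured-output idea concrete, here is a minimal sketch of how typed elements could be filtered before embedding. The `ParsedElement` class and its fields (`type`, `content`, `page`, `bbox`), along with the element type names, are illustrative assumptions and not the API's documented response schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ParsedElement:
    # Hypothetical shape; the real response schema may differ.
    type: str          # assumed labels such as "title", "text", "footer"
    content: str       # extracted text for this element
    page: int          # page the element was found on
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def select_text(
    elements: List[ParsedElement],
    keep_types: Iterable[str] = ("title", "text"),
) -> str:
    """Join the content of the wanted element types, skipping empty ones."""
    wanted = set(keep_types)
    kept = [
        e.content.strip()
        for e in elements
        if e.type in wanted and e.content.strip()
    ]
    return "\n\n".join(kept)
```

Dropping elements such as headers and footers by type, rather than by pattern-matching raw text, is the kind of cleanup that layout-aware output makes straightforward.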
Check out the Parsing API for detailed endpoints and code examples.
Last updated: May 2, 2025