Manage Files

Once files are uploaded to your Store, you can inspect and manage them with the operations below. Every operation accepts either the file ID or the file's external ID (if one was provided during upload).

File Identifiers: You can reference files using either their UUID (file_id) or their external ID. External IDs support slashes, making it easy to use file paths as identifiers (e.g., docs/api/authentication.md).
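
Because external IDs accept slashes, a path relative to your project root can serve as a stable identifier. The following is a minimal sketch of one way to derive such an ID. The helper name external_id_for and the root-relative naming scheme are assumptions for illustration and are not part of the SDK.

```python
from pathlib import PurePosixPath


def external_id_for(path: str, root: str) -> str:
    """Return `path` relative to `root`, using forward slashes.

    Hypothetical helper: this naming scheme is one possible convention,
    not something the Store requires.
    """
    relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    return relative.as_posix()


print(external_id_for("/repo/docs/api/authentication.md", "/repo"))
```

The resulting string can then be supplied as the external ID at upload time and later passed as file_identifier.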

Retrieve Store File

Get detailed information about a specific file using either its ID or external ID:

Retrieve Store File Metadata

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
)

print(file)

By default, retrieve returns the file object and its file-level metadata. The chunks field is null unless you explicitly request chunks with return_chunks.

The response includes processing status, metadata, usage statistics, and error details if applicable.

For complete details on file object properties, see Data Models.

Retrieve File Chunks

Use return_chunks when you want the parsed, searchable representation of a file instead of only the file-level metadata.

Retrieve All Chunks from a File

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
    return_chunks=True,
)

for chunk in file.chunks or []:
    print(chunk.chunk_index, chunk.type)

Use this when you want to inspect parsed text, OCR output, transcriptions, or chunk-level generated_metadata.

File Metadata vs. Chunk Metadata: metadata is your file-level metadata and is shared across all chunks from that file. generated_metadata is produced during parsing and can differ for each chunk.
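
If you want a single view per chunk, the two kinds of metadata can be merged on the client. This is a sketch under assumptions: the function name combined_metadata, the example keys, and the rule that file-level values win on a key clash are illustrative choices, not SDK behavior.

```python
from typing import Any, Dict, Mapping, Optional


def combined_metadata(
    file_metadata: Optional[Mapping[str, Any]],
    generated_metadata: Optional[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Merge chunk-level generated_metadata with file-level metadata.

    Assumption: when both define the same key, the file-level value is kept.
    """
    merged = dict(generated_metadata or {})
    merged.update(file_metadata or {})
    return merged


# Example values are made up for illustration.
print(combined_metadata(
    {"category": "documentation"},
    {"heading": "Timeouts", "category": "auto"},
))
```

With a retrieved file, this would be called once per chunk as combined_metadata(file.metadata, chunk.generated_metadata).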

For complete details on chunk fields and chunk types, see Data Models.

Retrieve Specific Chunks by Index

return_chunks also accepts a list of chunk indices. This is useful when you want a small, exact slice of a file instead of every chunk.

Retrieve Specific Chunks by Index

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
    return_chunks=[0, 3, 7],
)

for chunk in file.chunks or []:
    print(chunk.chunk_index)

Chunk indices are zero-based and correspond to the file's parsed chunk order.
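
Since indices are zero-based and follow chunk order, a contiguous window around a chunk of interest is easy to compute. The helper below is a hypothetical sketch (chunk_window is not an SDK function). It clamps the lower end at 0 but not the upper end, because it does not know how many chunks the file has.

```python
from typing import List


def chunk_window(center: int, radius: int = 1) -> List[int]:
    """Return zero-based indices from center - radius to center + radius.

    The lower bound is clamped at 0; the upper bound is left as computed.
    """
    if center < 0 or radius < 0:
        raise ValueError("center and radius must be non-negative")
    start = max(0, center - radius)
    return list(range(start, center + radius + 1))


print(chunk_window(3))
print(chunk_window(0, 2))
```

The returned list can be passed as return_chunks to fetch a chunk together with its neighbors.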

Retrieve the Exact Chunk Returned by Search

Search results include both file_id and chunk_index, so you can use them to load the exact source chunk that matched a query.

Hydrate a Search Result Back to Its Source Chunk

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

results = mxbai.stores.search(
    query="authentication timeout",
    store_identifiers=["my-knowledge-base"],
    top_k=1,
)

match = results.data[0]
file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier=match.file_id,
    return_chunks=[match.chunk_index],
)

# chunks is null unless return_chunks is set, so guard before indexing
chunk = file.chunks[0] if file.chunks else None
print(chunk)

This is the easiest way to go from a semantic search hit back to the precise chunk in the original file.
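
When a query returns several hits, some of them may come from the same file. The sketch below groups (file_id, chunk_index) pairs so that each file needs only one retrieve call. The helper name group_chunk_indices is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_chunk_indices(hits: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group chunk indices by file ID, dropping duplicates and sorting."""
    grouped = defaultdict(set)
    for file_id, chunk_index in hits:
        grouped[file_id].add(chunk_index)
    return {file_id: sorted(indices) for file_id, indices in grouped.items()}


# Stand-in hits; with real results use
# [(r.file_id, r.chunk_index) for r in results.data]
print(group_chunk_indices([("a", 3), ("b", 0), ("a", 1), ("a", 3)]))
```

Each entry maps to one call: the key becomes file_identifier and the list becomes return_chunks.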

File Status and Availability

To inspect chunks reliably, wait until the file reaches the completed status. A file can be in one of these statuses:

pending: The file was accepted and queued for processing
in_progress: Parsing, chunking, embedding, and indexing are still running
completed: Chunks are ready to inspect and search
failed: Processing failed; inspect last_error for details
cancelled: Processing stopped before completion

If you need a file to be ready before continuing, use uploadAndPoll / upload_and_poll during ingestion or poll retrieve until the status becomes completed.
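
Polling retrieve by hand can be sketched as follows. The function name wait_until_processed, the interval and timeout defaults, and the decision to stop on failed and cancelled as well as completed are assumptions. The sketch also assumes that status compares equal to the plain strings listed above. The fetch call is injected so the example runs without network access.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Assumption: no further processing happens once one of these is reached.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_until_processed(
    fetch_file: Callable[[], Any],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
) -> Any:
    """Call fetch_file repeatedly until the file reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        file = fetch_file()
        if file.status in TERMINAL_STATUSES:
            return file
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"file still {file.status!r} after {timeout_seconds}s"
            )
        time.sleep(interval_seconds)


# Simulated status sequence standing in for real retrieve calls.
_states = iter(["pending", "in_progress", "completed"])
demo = wait_until_processed(
    lambda: SimpleNamespace(status=next(_states)),
    interval_seconds=0,
)
print(demo.status)
```

With the real client, fetch_file would be a lambda that calls mxbai.stores.files.retrieve with your store_identifier and file_identifier. Check the returned status before reading chunks, since the loop also returns on failed and cancelled.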

List Store Files

View all files in your Store. The list operation uses cursor-based pagination:

List Store Files

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

response = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=20,
)

for file in response.data:
    print(file)

Pagination Details: For complete information about cursor-based pagination including parameters, response format, and advanced usage patterns, see the Pagination Reference.

Metadata Filtering

Pass metadata_filter to the list operation to return only the files whose metadata matches a condition:

List Files with Metadata Filter

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

response = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=10,
    metadata_filter={
        "key": "category",
        "value": "documentation",
        "operator": "eq",
    },
)

for file in response.data:
    print(file)

Complete Filtering Guide: For detailed information about filter operators, logical operations, data types, and advanced patterns, see Metadata Filtering.
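
To make the condition shape concrete, here is a local sketch that builds an eq condition and checks it against a plain metadata dictionary. The helpers eq_condition and matches are hypothetical, model only the eq operator, and merely approximate what the service does. The service's own semantics are authoritative.

```python
from typing import Any, Dict, Mapping


def eq_condition(key: str, value: Any) -> Dict[str, Any]:
    """Build a condition with the key / value / operator fields used above."""
    return {"key": key, "value": value, "operator": "eq"}


def matches(metadata: Mapping[str, Any], condition: Mapping[str, Any]) -> bool:
    """Local approximation of an eq condition; other operators are not modeled."""
    if condition["operator"] != "eq":
        raise NotImplementedError("this sketch only models the eq operator")
    key = condition["key"]
    return key in metadata and metadata[key] == condition["value"]


condition = eq_condition("category", "documentation")
print(matches({"category": "documentation"}, condition))
print(matches({"category": "blog"}, condition))
```

The dictionary returned by eq_condition has the same shape as the metadata_filter argument in the listing example.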

Paginate and Filter Files

To collect every file in a given status, combine cursor-based pagination with the statuses filter: request a page, then keep requesting with after set to the previous page's last_cursor until has_more is false. For a complete explanation of cursor-based pagination options, see the Pagination Reference.

Paginate Files with Status Filter

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

all_files = []

files = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=100,
    # change status to get different subsets
    statuses=["failed"],
)

all_files += files.data

while files.pagination.has_more:
    files = mxbai.stores.files.list(
        store_identifier="my-knowledge-base",
        limit=100,
        after=files.pagination.last_cursor,
        statuses=["failed"],
    )
    all_files += files.data

print(len(all_files), "files matched")

Delete Store File

Remove a file from your Store:

Delete Store File

from mixedbread import Mixedbread

mxbai = Mixedbread(api_key="YOUR_API_KEY")

response = mxbai.stores.files.delete(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
)

print(response)

Important: Deleting a file permanently removes:

The original file from storage
All generated chunks and embeddings
Associated metadata and search indexes
Processing history and logs

If you want to replace a file instead of deleting it first, upload new content with the same external_id. By default, uploads with the same external_id overwrite the previous version. For details, see File Ingestion.
