Manage Files
Once files are uploaded to your Store, you can inspect and manage them using these core operations. All operations work with either the file ID or the file's external ID (if provided during upload).
File Identifiers: You can reference files using either their UUID
(file_id) or their external ID. External IDs support slashes, making it easy
to use file paths as identifiers (e.g., docs/api/authentication.md).
Retrieve Store File
Get detailed information about a specific file using either its ID or external ID:
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
)
print(file)
By default, retrieve returns the file object and its file-level metadata. The
chunks field is null unless you explicitly request chunks with
return_chunks.
The response includes processing status, metadata, usage statistics, and error details if applicable.
For complete details on file object properties, see Data Models.
Retrieve File Chunks
Use return_chunks when you want the parsed, searchable representation of a
file instead of only the file-level metadata.
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
    return_chunks=True,
)
for chunk in file.chunks or []:
    print(chunk.chunk_index, chunk.type)
Use this when you want to inspect parsed text, OCR output, transcriptions, or
chunk-level generated_metadata.
File Metadata vs. Chunk Metadata: metadata is your file-level metadata
and is shared across all chunks from that file. generated_metadata is
produced during parsing and can differ for each chunk.
For complete details on chunk fields and chunk types, see Data Models.
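To make the distinction concrete, the sketch below pairs every chunk with both kinds of metadata. It assumes the file object exposes metadata and each chunk exposes generated_metadata as dict-like values that may be empty or None. describe_chunks is a hypothetical helper written for this page, not part of the SDK.

```python
from typing import Any


def describe_chunks(file: Any) -> list[dict]:
    """Pair each chunk with the shared file metadata and its own generated metadata."""
    # Assumption: `metadata` is dict-like (or None) on the file object.
    shared = dict(file.metadata or {})
    rows = []
    # `chunks` is None unless the file was retrieved with return_chunks.
    for chunk in file.chunks or []:
        rows.append(
            {
                "chunk_index": chunk.chunk_index,
                "file_metadata": shared,  # identical for every chunk
                "generated_metadata": dict(chunk.generated_metadata or {}),
            }
        )
    return rows
```

Pass it the object returned by retrieve with return_chunks=True. Every row carries the same file_metadata, while generated_metadata can vary from row to row.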
Retrieve Specific Chunks by Index
return_chunks also accepts a list of chunk indices. This is useful when you
want a small, exact slice of a file instead of every chunk.
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
    return_chunks=[0, 3, 7],
)
for chunk in file.chunks or []:
    print(chunk.chunk_index)
Chunk indices are zero-based and correspond to the file's parsed chunk order.
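Because indices follow the parsed order, a contiguous window around one chunk is often a handy slice to request. The helper below is a minimal sketch of that idea. chunk_window is a hypothetical name, and it only computes the index list; it does not call the API.

```python
def chunk_window(center: int, radius: int = 1) -> list[int]:
    """Return zero-based chunk indices around `center`, clamped at index 0."""
    if center < 0 or radius < 0:
        raise ValueError("center and radius must be non-negative")
    start = max(0, center - radius)  # indices are zero-based, so never go below 0
    return list(range(start, center + radius + 1))
```

The result can be passed as return_chunks, for example return_chunks=chunk_window(3) to request chunks 2 through 4. How indices past the last chunk are handled is not covered on this page, so verify the upper bound against your own files.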
Retrieve the Exact Chunk Returned by Search
Search results include both file_id and chunk_index, so you can use them to
load the exact source chunk that matched a query.
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
results = mxbai.stores.search(
    query="authentication timeout",
    store_identifiers=["my-knowledge-base"],
    top_k=1,
)
match = results.data[0]
file = mxbai.stores.files.retrieve(
    store_identifier="my-knowledge-base",
    file_identifier=match.file_id,
    return_chunks=[match.chunk_index],
)
chunk = file.chunks[0]
print(chunk)
This is the easiest way to go from a semantic search hit back to the precise chunk in the original file.
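In application code the same lookup usually needs to cope with a query that returns no hits, or a file whose chunks come back empty. The following is a minimal sketch under those assumptions. load_top_chunk is a hypothetical helper, and client stands for an initialized Mixedbread client such as mxbai.

```python
from typing import Any, Optional


def load_top_chunk(client: Any, store_identifier: str, query: str) -> Optional[Any]:
    """Search one store and return the source chunk for the best hit, or None."""
    results = client.stores.search(
        query=query,
        store_identifiers=[store_identifier],
        top_k=1,
    )
    if not results.data:  # the query matched nothing
        return None
    match = results.data[0]
    file = client.stores.files.retrieve(
        store_identifier=store_identifier,
        file_identifier=match.file_id,
        return_chunks=[match.chunk_index],
    )
    chunks = file.chunks or []  # chunks may be None if nothing was returned
    return chunks[0] if chunks else None
```

Returning None instead of raising keeps the caller's control flow simple; raise an exception instead if a missing chunk should be treated as an error in your pipeline.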
File Status and Availability
To reliably inspect chunks, wait until the file reaches completed status.
- pending: The file was accepted and queued for processing
- in_progress: Parsing, chunking, embedding, and indexing are still running
- completed: Chunks are ready to inspect and search
- failed: Processing failed; inspect last_error for details
- cancelled: Processing stopped before completion
If you need a file to be ready before continuing, use
uploadAndPoll / upload_and_poll during ingestion or poll retrieve until
the status becomes completed.
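Polling retrieve can be sketched as a small loop with a timeout. The version below is illustrative only: wait_for_file, its timeout and interval defaults, and the injectable sleep parameter are assumptions made for this example, and it assumes status is returned as one of the plain strings listed above.

```python
import time
from typing import Any, Callable

# Statuses after which processing will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_file(
    client: Any,
    store_identifier: str,
    file_identifier: str,
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll retrieve until the file reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        file = client.stores.files.retrieve(
            store_identifier=store_identifier,
            file_identifier=file_identifier,
        )
        if file.status in TERMINAL_STATUSES:
            return file  # caller decides what to do with failed or cancelled
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{file_identifier} is still {file.status} after {timeout} seconds"
            )
        sleep(interval)
```

The helper returns on failed and cancelled as well as completed, so check file.status on the result before reading chunks.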
List Store Files
View all files in your Store. The list operation uses cursor-based pagination:
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
response = mxbai.stores.files.list("my-knowledge-base", limit=20)
for file in response.data:
    print(file)
Pagination Details: For complete information about cursor-based pagination, including parameters, response format, and advanced usage patterns, see the Pagination Reference.
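When a store holds more files than fit in one page, following the cursor can be wrapped in a generator. This is a minimal sketch, not an SDK feature: iter_store_files is a hypothetical helper, and it relies on the pagination.has_more, pagination.last_cursor, and after names that appear in the pagination example on this page.

```python
from typing import Any, Iterator


def iter_store_files(
    client: Any, store_identifier: str, limit: int = 100, **filters: Any
) -> Iterator[Any]:
    """Yield every file in a store, requesting further pages only as needed."""
    page = client.stores.files.list(
        store_identifier=store_identifier, limit=limit, **filters
    )
    while True:
        yield from page.data
        if not page.pagination.has_more:
            break
        # Continue from where the previous page ended.
        page = client.stores.files.list(
            store_identifier=store_identifier,
            limit=limit,
            after=page.pagination.last_cursor,
            **filters,
        )
```

Extra keyword arguments such as statuses or metadata_filter are forwarded unchanged to every list call, so the same filter applies to each page.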
Metadata Filtering
Filter Store Files based on their metadata.
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
response = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=10,
    metadata_filter={
        "key": "category",
        "value": "documentation",
        "operator": "eq",
    },
)
for file in response.data:
    print(file)
Complete Filtering Guide: For detailed information about filter operators, logical operations, data types, and advanced patterns, see Metadata Filtering.
Paginate and Filter Files
List all available files and filter them by status. This operation combines cursor-based pagination with the status filter to retrieve only the subsets of files you care about. For a complete explanation of cursor-based pagination options, see the Pagination Reference.
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
all_files = []
files = mxbai.stores.files.list(
    store_identifier="my-knowledge-base",
    limit=100,
    # change status to get different subsets
    statuses=["failed"],
)
all_files += files.data
while files.pagination.has_more:
    files = mxbai.stores.files.list(
        store_identifier="my-knowledge-base",
        limit=100,
        after=files.pagination.last_cursor,
        statuses=["failed"],
    )
    all_files += files.data
print(len(all_files), "files matched")
Delete Store File
Remove files from your Store:
from mixedbread import Mixedbread
mxbai = Mixedbread(api_key="YOUR_API_KEY")
response = mxbai.stores.files.delete(
    store_identifier="my-knowledge-base",
    file_identifier="f47ac10b-58cc-4372-a567-0e02b2c3d479",
)
print(response)
Important: Deleting a file permanently removes:
- The original file from storage
- All generated chunks and embeddings
- Associated metadata and search indexes
- Processing history and logs
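Since deletion cannot be undone, a bulk cleanup is safer with a dry run first. The sketch below combines the status filter with delete to remove files whose processing failed. delete_failed_files and its dry_run flag are hypothetical, and it assumes each listed file object exposes its UUID as id.

```python
from typing import Any


def delete_failed_files(
    client: Any, store_identifier: str, dry_run: bool = True
) -> list[str]:
    """Collect the IDs of all failed files, then delete them unless dry_run is set."""
    failed_ids: list[str] = []
    page = client.stores.files.list(
        store_identifier=store_identifier, limit=100, statuses=["failed"]
    )
    while True:
        # Assumption: file objects expose their UUID as `id`.
        failed_ids += [f.id for f in page.data]
        if not page.pagination.has_more:
            break
        page = client.stores.files.list(
            store_identifier=store_identifier,
            limit=100,
            after=page.pagination.last_cursor,
            statuses=["failed"],
        )
    if not dry_run:
        for file_id in failed_ids:
            client.stores.files.delete(
                store_identifier=store_identifier, file_identifier=file_id
            )
    return failed_ids
```

IDs are collected before anything is deleted so that removals do not shift the pages still being read. Call it once with the default dry_run=True to review the list, then again with dry_run=False.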
If you want to replace a file instead of deleting it first, upload new content
with the same external_id. By default, uploads with the same external_id
overwrite the previous version. For details, see File Ingestion.