Mixedbread

Metadata Filtering

Metadata filtering narrows down results based on the metadata attached to your files.

Quick Example

Here's a simple example of filtering files by category:
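A minimal sketch of such a filter, built as a plain Python dictionary and serialized to JSON. The variable names are illustrative, and the step that passes the filter to a search request is left out because it depends on the client you use.

```python
import json

# A direct condition: match files whose "category" metadata equals "documentation".
category_filter = {"key": "category", "operator": "eq", "value": "documentation"}

# Serialize to the JSON form used throughout this page.
payload = json.dumps(category_filter)
print(payload)
```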

Filter Structure

Filters can be structured in two ways depending on your needs:

Single Field Filter (Direct Condition)

For simple single-field filtering, you can use a direct condition:

{"key": "metadata_key", "operator": "comparison", "value": "target_value"}

Example:

{"key": "category", "operator": "eq", "value": "documentation"}

Multiple Field Filter (Logical Operators)

For complex filtering with multiple conditions, use logical operators:

{
  "logical_operator": [
    {"key": "metadata_key", "operator": "comparison", "value": "target_value"}
  ]
}

Example filter structure:

{
  "all": [
    {"key": "category", "operator": "eq", "value": "documentation"}
  ]
}
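The two shapes can be told apart by their keys. The sketch below uses hypothetical client-side helpers (`is_direct_condition` and `to_logical_form` are not part of any SDK) and assumes that a single condition wrapped in an "all" group selects the same files as the direct condition, as the two examples above suggest.

```python
LOGICAL_OPERATORS = ("all", "any", "none")

def is_direct_condition(flt: dict) -> bool:
    """True when the filter is a single key/operator/value condition."""
    return all(name in flt for name in ("key", "operator", "value"))

def to_logical_form(flt: dict) -> dict:
    """Wrap a direct condition in an "all" group; return logical filters as-is."""
    if is_direct_condition(flt):
        return {"all": [flt]}
    if not any(op in flt for op in LOGICAL_OPERATORS):
        raise ValueError(f"not a recognizable filter: {flt!r}")
    return flt

direct = {"key": "category", "operator": "eq", "value": "documentation"}
print(to_logical_form(direct))
```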

Logical Operators

Combine multiple conditions using logical operators to create sophisticated filters:

All (AND Operation)

All conditions must be true:

{
  "all": [
    {"key": "category", "operator": "eq", "value": "documentation"},
    {"key": "language", "operator": "eq", "value": "python"},
    {"key": "status", "operator": "eq", "value": "published"}
  ]
}

Any (OR Operation)

At least one condition must be true:

{
  "any": [
    {"key": "language", "operator": "eq", "value": "python"},
    {"key": "language", "operator": "eq", "value": "javascript"},
    {"key": "language", "operator": "eq", "value": "typescript"}
  ]
}

None (NOT Operation)

None of the conditions may be true:

{
  "none": [
    {"key": "status", "operator": "eq", "value": "deprecated"},
    {"key": "status", "operator": "eq", "value": "draft"}
  ]
}
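When filters are assembled in code, small builder functions keep the structure readable. The helpers below (`cond`, `all_of`, `any_of`, `none_of`) are hypothetical names chosen for this sketch. They only produce the dictionaries shown above and do not call any API.

```python
def cond(key, operator, value):
    """Build one key/operator/value condition."""
    return {"key": key, "operator": operator, "value": value}

def all_of(*filters):
    """Every filter must match (AND)."""
    return {"all": list(filters)}

def any_of(*filters):
    """At least one filter must match (OR)."""
    return {"any": list(filters)}

def none_of(*filters):
    """No filter may match (NOT)."""
    return {"none": list(filters)}

supported_languages = any_of(
    cond("language", "eq", "python"),
    cond("language", "eq", "javascript"),
    cond("language", "eq", "typescript"),
)
hidden_statuses = none_of(
    cond("status", "eq", "deprecated"),
    cond("status", "eq", "draft"),
)
print(supported_languages)
print(hidden_statuses)
```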

Comparison Operators

Equality and Comparison Operators

// Equal to
{"key": "status", "operator": "eq", "value": "published"}

// Not equal to
{"key": "status", "operator": "not_eq", "value": "draft"}

// Greater than
{"key": "priority", "operator": "gt", "value": 5}

// Greater than or equal to
{"key": "created_at", "operator": "gte", "value": "2024-01-01"}

// Less than
{"key": "rating", "operator": "lt", "value": 3.0}

// Less than or equal to
{"key": "rating", "operator": "lte", "value": 4.5}

// Value in list
{"key": "tags", "operator": "in", "value": ["tutorial", "guide"]}

// Value not in list
{"key": "language", "operator": "not_in", "value": ["deprecated", "legacy"]}

// Pattern matching (case-sensitive)
{"key": "title", "operator": "like", "value": "API*"}

// Pattern exclusion (case-sensitive)
{"key": "filename", "operator": "not_like", "value": "*.tmp"}
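To reason about what each operator selects, it can help to evaluate conditions locally against sample metadata. The sketch below is a local approximation, not the service's implementation. It assumes `*` behaves like a shell-style wildcard (approximated with `fnmatchcase`, which also interprets `?` and `[...]`), that a missing key never matches, and that the field holds a single scalar value.

```python
import operator
from fnmatch import fnmatchcase

# Operator names from this page mapped to local Python comparisons.
COMPARATORS = {
    "eq": operator.eq,
    "not_eq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda actual, allowed: actual in allowed,
    "not_in": lambda actual, allowed: actual not in allowed,
    # Assumption: "*" is a wildcard; fnmatchcase compares case-sensitively.
    "like": lambda actual, pattern: fnmatchcase(actual, pattern),
    "not_like": lambda actual, pattern: not fnmatchcase(actual, pattern),
}

def check(metadata: dict, condition: dict) -> bool:
    """Evaluate one condition against one file's metadata."""
    if condition["key"] not in metadata:
        return False  # assumption: a missing key never matches
    compare = COMPARATORS[condition["operator"]]
    return compare(metadata[condition["key"]], condition["value"])

sample = {"title": "API Reference", "priority": 7, "rating": 4.5, "filename": "cache.tmp"}
print(check(sample, {"key": "title", "operator": "like", "value": "API*"}))
print(check(sample, {"key": "filename", "operator": "not_like", "value": "*.tmp"}))
```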

Data Type Filtering

String Values

String comparisons are case-sensitive by default, so keep the casing in your metadata consistent:

// String Values (case-sensitive)
{"key": "category", "operator": "eq", "value": "Documentation"}  // Won't match "documentation"

// Use consistent casing in metadata
{
  "category": "documentation",  // lowercase
  "status": "published",        // lowercase
  "team": "engineering"         // lowercase
}
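One way to keep casing consistent is to normalize metadata before attaching it to a file. This is a sketch with a hypothetical helper, `normalize_metadata`. It lowercases every top-level string value, which suits enum-like fields such as category or status but not free text such as titles.

```python
def normalize_metadata(metadata: dict) -> dict:
    """Return a copy with top-level string values lowercased; other types unchanged."""
    return {
        key: value.lower() if isinstance(value, str) else value
        for key, value in metadata.items()
    }

raw = {"category": "Documentation", "status": "Published", "priority": 5}
print(normalize_metadata(raw))
```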

Numeric Values

Numeric fields support integer and float comparisons:

// Numeric Values
{"key": "priority", "operator": "gt", "value": 5}
{"key": "score", "operator": "gte", "value": 0.8}

Boolean Values

Boolean fields support true/false conditions:

// Boolean Values
{"key": "is_public", "operator": "eq", "value": true}
{"key": "deprecated", "operator": "eq", "value": false}

Date Values

ISO 8601 format is recommended for date values:

// Date Values (ISO 8601 format recommended)
{"key": "created_at", "operator": "gte", "value": "2024-01-01"}
{"key": "last_updated", "operator": "lt", "value": "2024-12-31T23:59:59Z"}
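ISO 8601 strings written in the same layout sort chronologically, which is one reason the format works well for range comparisons. When a date-only value is compared with a full timestamp, parsing both sides first removes the ambiguity. The sketch below is a local illustration only. `parse_iso` is a hypothetical helper, and treating values without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone

def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 date or timestamp into a timezone-aware datetime."""
    # Python versions before 3.11 reject a trailing "Z", so use an explicit offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: values without an offset are interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

created_at = parse_iso("2024-06-15T12:00:00Z")
print(created_at >= parse_iso("2024-01-01"))           # the "gte" check
print(created_at < parse_iso("2024-12-31T23:59:59Z"))  # the "lt" check
```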

Array/List Values

Array fields support membership filtering:

// Example metadata with array values
{
  "tags": ["tutorial", "python", "web"],
  "authors": ["alice", "bob"]
}

// Filter by array membership
{"key": "tags", "operator": "in", "value": ["tutorial", "guide"]}
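This page does not spell out how `in` treats a field that is itself an array. The sketch below assumes the condition matches when at least one element of the field appears in the filter's list. That reading is an assumption, so verify it against real results before relying on it. `matches_in` is a hypothetical local helper.

```python
def matches_in(field_value, allowed: list) -> bool:
    """Local model of "in": true if any element of the field is in the allowed list."""
    # Treat a scalar field as a one-element list so both cases share one code path.
    values = field_value if isinstance(field_value, list) else [field_value]
    return any(value in allowed for value in values)

print(matches_in(["tutorial", "python", "web"], ["tutorial", "guide"]))
print(matches_in(["python", "web"], ["tutorial", "guide"]))
```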

Combined Logical Operations

Nested Conditions

Logical operators can be nested inside one another for multi-level filtering:

{
  "all": [
    {"key": "category", "operator": "eq", "value": "documentation"},
    {
      "any": [
        {"key": "language", "operator": "eq", "value": "python"},
        {"key": "language", "operator": "eq", "value": "javascript"}
      ]
    }
  ],
  "none": [
    {"key": "status", "operator": "eq", "value": "deprecated"}
  ]
}
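The nested filter above can be traced with a small recursive evaluator. This is a local sketch for understanding the structure, not the service's implementation. It handles only `eq` conditions, and it assumes that sibling logical operators in the same object (here "all" and "none") are combined with AND.

```python
def matches(metadata: dict, node: dict) -> bool:
    """Evaluate a filter node against one file's metadata (eq conditions only)."""
    if "key" in node:  # leaf condition
        if node["operator"] != "eq":
            raise NotImplementedError(node["operator"])
        return metadata.get(node["key"]) == node["value"]
    # Assumption: sibling logical operators are combined with AND.
    checks = []
    if "all" in node:
        checks.append(all(matches(metadata, child) for child in node["all"]))
    if "any" in node:
        checks.append(any(matches(metadata, child) for child in node["any"]))
    if "none" in node:
        checks.append(not any(matches(metadata, child) for child in node["none"]))
    return all(checks)

nested_filter = {
    "all": [
        {"key": "category", "operator": "eq", "value": "documentation"},
        {"any": [
            {"key": "language", "operator": "eq", "value": "python"},
            {"key": "language", "operator": "eq", "value": "javascript"},
        ]},
    ],
    "none": [{"key": "status", "operator": "eq", "value": "deprecated"}],
}

python_doc = {"category": "documentation", "language": "python", "status": "published"}
deprecated_doc = {**python_doc, "status": "deprecated"}
print(matches(python_doc, nested_filter))      # True
print(matches(deprecated_doc, nested_filter))  # False
```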

Advanced Filtering Example

Here's a practical example demonstrating complex nested filters:
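A filter of this kind might look like the following sketch, which combines operators described earlier on this page. The field names and values are illustrative, and `validate` is a hypothetical client-side check, not an SDK function. It only confirms that every node uses an operator named on this page.

```python
COMPARISON_OPERATORS = {"eq", "not_eq", "gt", "gte", "lt", "lte",
                        "in", "not_in", "like", "not_like"}
LOGICAL_OPERATORS = {"all", "any", "none"}

def validate(node: dict) -> None:
    """Raise ValueError if a filter node has an unknown operator or shape."""
    if "key" in node:  # leaf condition
        if node.get("operator") not in COMPARISON_OPERATORS:
            raise ValueError(f"unknown comparison operator: {node.get('operator')!r}")
        if "value" not in node:
            raise ValueError(f"condition on {node['key']!r} has no value")
        return
    if not node or not node.keys() <= LOGICAL_OPERATORS:
        raise ValueError(f"expected a condition or an all/any/none group: {sorted(node)}")
    for children in node.values():
        for child in children:
            validate(child)

# Published documentation created since 2024 that is either written in
# Python/TypeScript or highly rated, excluding temporary files.
advanced_filter = {
    "all": [
        {"key": "category", "operator": "eq", "value": "documentation"},
        {"key": "status", "operator": "eq", "value": "published"},
        {"key": "created_at", "operator": "gte", "value": "2024-01-01"},
        {"any": [
            {"key": "language", "operator": "in", "value": ["python", "typescript"]},
            {"key": "rating", "operator": "gte", "value": 4.5},
        ]},
    ],
    "none": [
        {"key": "filename", "operator": "like", "value": "*.tmp"},
    ],
}

validate(advanced_filter)  # raises ValueError if the structure is malformed
```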

Last updated: August 19, 2025