Schema ORDER BY Rules

Introduction

The ORDER BY clause in ClickHouse schemas determines how data is physically sorted on disk. Understanding how ORDER BY is constructed from meta and derived schemas is crucial for optimal performance.

Common Header Fields

Common header fields, specifically timestamp_load, are always first in the ORDER BY clause. This is a fixed rule that cannot be overridden by meta or derived schemas.

# Common Headers (always first)
column,type,index_type,index_order,os_order,comment
timestamp_load,timestamp,,1,1,load timestamp

Schema Precedence Rules

  1. Common Headers Priority: timestamp_load is always first in ORDER BY

  2. Derived Schema Precedence: index_order values in derived schema override meta schema

  3. Invalid Values: Invalid index_order values (null, non-numeric) are ignored, falling back to meta schema

  4. Missing Values: Columns with missing index_order are excluded from ORDER BY

  5. Duplicate Values: Duplicate index_order values are allowed and maintain their relative order

Examples

Basic Override Example

Invalid Index Order Example

Missing Index Order Example

Duplicate Index Order Example

Best Practices

  1. Common Headers: Remember that timestamp_load will always be first

  2. Explicit Ordering: Always provide explicit index_order values for columns that need to be in ORDER BY

  3. Derived Schema: Use derived schema to override ordering when needed for specific use cases

  4. Valid Values: Ensure index_order values are valid numbers to avoid fallback behavior

  5. Documentation: Document the intended order in column comments for clarity

Last updated