Schema ORDER BY Rules
Introduction
The ORDER BY clause in ClickHouse schemas determines how data is physically sorted on disk. Understanding how ORDER BY is constructed from meta and derived schemas is crucial for optimal performance.
Common Header Fields
Common header fields, specifically timestamp_load, are always first in the ORDER BY clause. This is a fixed rule that cannot be overridden by meta or derived schemas.
# Common Headers (always first)
column,type,index_type,index_order,os_order,comment
timestamp_load,timestamp,,1,1,load timestampSchema Precedence Rules
Common Headers Priority: timestamp_load is always first in ORDER BY
Derived Schema Precedence: index_order values in derived schema override meta schema
Invalid Values: Invalid index_order values (null, non-numeric) are ignored, falling back to meta schema
Missing Values: Columns with missing index_order are excluded from ORDER BY
Duplicate Values: Duplicate index_order values are allowed and maintain their relative order
Examples
Basic Override Example
# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,2,2,category field
# Derived Schema (overrides meta schema)
column,type,index_type,index_order,os_order,comment
category,string,dimension,1,2,category field
event_id,string,,2,1,event id
# Resulting ORDER BY: (timestamp_load, category, event_id)Invalid Index Order Example
# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,2,2,category field
# Derived Schema (invalid index_order)
column,type,index_type,index_order,os_order,comment
category,string,dimension,,2,category field
event_id,string,,,1,event id
# Resulting ORDER BY: (timestamp_load, event_id, category) # Falls back to meta schema orderMissing Index Order Example
# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,,2,category field # Missing index_order
status,string,,2,3,status field
# Derived Schema
column,type,index_type,index_order,os_order,comment
status,string,dimension,1,3,status field
# Resulting ORDER BY: (timestamp_load, status) # category excluded due to missing index_orderDuplicate Index Order Example
# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,1,2,category field # Duplicate index_order
status,string,,2,3,status field
# Resulting ORDER BY: (timestamp_load, event_id, category, status) # Maintains relative orderBest Practices
Common Headers: Remember that timestamp_load will always be first
Explicit Ordering: Always provide explicit index_order values for columns that need to be in ORDER BY
Derived Schema: Use derived schema to override ordering when needed for specific use cases
Valid Values: Ensure index_order values are valid numbers to avoid fallback behavior
Documentation: Document the intended order in column comments for clarity
Last updated