Schema ORDER BY Rules

Introduction

The ORDER BY clause in ClickHouse schemas determines how data is physically sorted on disk. Understanding how ORDER BY is constructed from meta and derived schemas is crucial for optimal performance.

Common Header Fields

Common header fields, specifically timestamp_load, are always first in the ORDER BY clause. This is a fixed rule that cannot be overridden by meta or derived schemas.

# Common Headers (always first)
column,type,index_type,index_order,os_order,comment
timestamp_load,timestamp,,1,1,load timestamp

Schema Precedence Rules

  1. Common Headers Priority: timestamp_load is always first in ORDER BY

  2. Derived Schema Precedence: index_order values in derived schema override meta schema

  3. Invalid Values: Invalid index_order values (null, non-numeric) are ignored, falling back to meta schema

  4. Missing Values: Columns with missing index_order are excluded from ORDER BY

  5. Duplicate Values: Duplicate index_order values are allowed and maintain their relative order

Examples

Basic Override Example

# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,2,2,category field

# Derived Schema (overrides meta schema)
column,type,index_type,index_order,os_order,comment
category,string,dimension,1,2,category field
event_id,string,,2,1,event id

# Resulting ORDER BY: (timestamp_load, category, event_id)

Invalid Index Order Example

# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,2,2,category field

# Derived Schema (invalid index_order)
column,type,index_type,index_order,os_order,comment
category,string,dimension,,2,category field
event_id,string,,,1,event id

# Resulting ORDER BY: (timestamp_load, event_id, category)  # Falls back to meta schema order

Missing Index Order Example

# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,,2,category field  # Missing index_order
status,string,,2,3,status field

# Derived Schema
column,type,index_type,index_order,os_order,comment
status,string,dimension,1,3,status field

# Resulting ORDER BY: (timestamp_load, status)  # category excluded due to missing index_order

Duplicate Index Order Example

# Meta Schema
column,type,index_type,index_order,os_order,comment
event_id,string,,1,1,event id
category,string,,1,2,category field  # Duplicate index_order
status,string,,2,3,status field

# Resulting ORDER BY: (timestamp_load, event_id, category, status)  # Maintains relative order

Best Practices

  1. Common Headers: Remember that timestamp_load will always be first

  2. Explicit Ordering: Always provide explicit index_order values for columns that need to be in ORDER BY

  3. Derived Schema: Use derived schema to override ordering when needed for specific use cases

  4. Valid Values: Ensure index_order values are valid numbers to avoid fallback behavior

  5. Documentation: Document the intended order in column comments for clarity

Last updated