Apache Hive : Write Ordering

Last updated: October 31, 2025

Apache Hive : Write Ordering

Overview

Write ordering controls the physical layout of data within table files. Unlike SORT BY which orders data during query execution, write ordering is applied at write time and persists in the stored files.

Write ordering is supported for Iceberg tables and can be specified during table creation.

Hive supports two write ordering strategies:

  • Type-Native Ordering: Sort by one or more columns in a specified order
  • Z-Ordering: Multi-dimensional clustering using space-filling curves

Type-Native Column Ordering

Version

Introduced in Hive version 4.1.0

Syntax

CREATE TABLE table_name (column_definitions)
WRITE [LOCALLY] ORDERED BY column_name [ASC | DESC] [NULLS FIRST | NULLS LAST]
  [, column_name [ASC | DESC] [NULLS FIRST | NULLS LAST] ]*
STORED BY ICEBERG
[STORED AS file_format];

Options

  • Sort Order
    • ASC: Ascending order (default)
    • DESC: Descending order
  • Null Order
    • NULLS FIRST: Null values sorted before non-null values
    • NULLS LAST: Null values sorted after non-null values

Examples

Single column:

CREATE TABLE events (
  event_id BIGINT,
  event_date DATE,
  event_type STRING
)
WRITE LOCALLY ORDERED BY event_date DESC
STORED BY ICEBERG
STORED AS ORC;

Multiple columns with null handling:

CREATE TABLE orders (
  order_id BIGINT,
  order_date DATE,
  customer_id INT,
  amount DECIMAL(10,2)
)
WRITE ORDERED BY order_date DESC NULLS FIRST, order_id ASC
STORED BY ICEBERG;

Use Cases

Type-Native ordering is most effective for:

  • Time-series data with temporal access patterns
  • Range queries on sorted columns
  • Queries with consistent ORDER BY clauses
  • Single-dimensional access patterns

Z-Ordering

Version

Introduced in Hive version 4.2.0

Overview

Z-order applies a multi-dimensional clustering technique based on space-filling curves. This approach interleaves column values to co-locate related records across multiple dimensions, enabling efficient filtering on various column combinations.

Syntax

CREATE TABLE table_name (column_definitions)
WRITE [LOCALLY] ORDERED BY ZORDER(column_name [, column_name ]*)
STORED BY ICEBERG
[STORED AS file_format];

Example

CREATE TABLE user_events (
  user_id INT,
  event_date DATE,
  event_type STRING,
  value DOUBLE
)
WRITE LOCALLY ORDERED BY ZORDER(user_id, event_date)
STORED BY ICEBERG
STORED AS ORC;

Table Properties Method

Z-ordering can alternatively be specified using table properties.

CREATE TABLE table_name (column_definitions)
STORED BY ICEBERG
TBLPROPERTIES (
  'sort.order' = 'zorder',
  'sort.columns' = 'column1,column2'
);

Use Cases

Z-order is most effective for:

  • Multi-dimensional analytical queries
  • Ad-hoc queries with varying filter patterns
  • Queries filtering on different column combinations

Comparison with SORT BY

Feature WRITE ORDERED BY SORT BY
Application Write time Query time
Persistence Permanent in files Query result only
Scope Physical file layout Query execution
Table Support Iceberg tables All table types

Limitations and Considerations

  • Write ordering only applies to Iceberg tables
  • Write operations incur ordering overhead:
    • Type-Native ordering: Sort cost
    • Z-order: Sort cost plus z-value computation
  • Column selection should be based on query workload analysis