Language Manual

Apache Hive : Write Ordering

Overview

Write ordering controls the physical layout of data within table files. Unlike SORT BY which orders data during query execution, write ordering is applied at write time and persists in the stored files.

Write ordering is supported for Iceberg tables and can be specified during table creation.

Hive supports two write ordering strategies:

  • Type-Native Ordering: Sort by one or more columns in a specified order
  • Z-Ordering: Multi-dimensional clustering using space-filling curves

Type-Native Column Ordering

Version

Introduced in Hive version 4.1.0

Apache Hive : Apache Hive SQL Conformance

This page documents which parts of the SQL standard are supported by Apache Hive. The information here is not a full statement of conformance, but it provides sufficient detail for users to generally understand Hive’s SQL conformance.

This information is versioned by Hive release version, allowing a user to quickly identify features available to them.

The formal name of the current SQL standard is ISO/IEC 9075 “Database Language SQL”. A revised version of the standard is released from time to time; the most recent update appearing in 2016. The 2016 version is referred to as ISO/IEC 9075:2016, or simply as SQL:2016. Hive’s SQL Conformance pages reference SQL features by the Feature ID values of the SQL:2016 Standard.

Apache Hive : CAST…FORMAT with SQL:2016 datetime formats

Usage

CAST(<timestamp/date> AS <varchar/char/string> [FORMAT <template>])  
CAST(<varchar/char/string> AS <timestamp/date> [FORMAT <template>])

Example

select cast(dt as string format 'DD-MM-YYYY')  
select cast('01-05-2017' as date format 'DD-MM-YYYY')

Template elements, a.k.a. Tokens, a.k.a. Patterns, a.k.a. SQL:2016 Datetime Formats

Notes

For all tokens:

  • Patterns are case-insensitive, except AM/PM and T/Z. See these sections for more details.
  • For string to datetime conversion, no duplicate format tokens are allowed, including tokens
    that have the same meaning but different lengths (“Y” and “YY” conflict) or different
    behaviors (“RR” and “YY” conflict).

For all numeric tokens:

Apache Hive : Common Table Expression

A Common Table Expression (CTE) is a temporary result set derived from a simple query specified in a WITH clause, which immediately precedes a SELECT or INSERT keyword. The CTE is defined only within the execution scope of a single statement. One or more CTEs can be used in a Hive SELECT, INSERT, CREATE TABLE AS SELECT, or CREATE VIEW AS SELECT statement.

Version

Common Table Expressions were added in Hive 0.13.0 with HIVE-1180.
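As a brief sketch (table and column names are illustrative), a CTE is defined once in the WITH clause and then referenced like a table:

```sql
-- Define a CTE named q1 and reference it in the outer SELECT.
WITH q1 AS (SELECT key FROM src WHERE key = '5')
SELECT * FROM q1;

-- A CTE can also feed an INSERT within the same statement.
WITH q1 AS (SELECT key, value FROM src WHERE key = '5')
INSERT OVERWRITE TABLE s1
SELECT * FROM q1;
```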

Apache Hive : Compaction pooling

Concept:

Compaction requests and workers can be assigned to pools. A worker assigned to a specific pool will only process compaction requests in that pool. Workers and compaction requests without a pool assignment implicitly belong to the default pool. The pooling concept allows fine-tuning of how compaction requests are processed. For example, it is possible to create a pool named ‘high priority compaction’, assign some frequently modified tables to it, and dedicate a set of workers to this pool. As a result, compaction requests for these tables will be picked up immediately by the dedicated workers, even if several other compaction requests (enqueued earlier) are waiting in the default queue.
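A minimal sketch of the idea, assuming the table property and POOL clause names used by the Hive 4 compaction pooling feature (verify these against your release; the pool name here is illustrative):

```sql
-- Route a table's compaction requests to a named pool via a table property.
ALTER TABLE frequently_updated_table
SET TBLPROPERTIES ('hive.compactor.worker.pool' = 'high priority compaction');

-- A manually initiated compaction can also target a pool directly.
ALTER TABLE frequently_updated_table COMPACT 'major'
POOL 'high priority compaction';
```

Workers are assigned to a pool on the service side (e.g. via the corresponding worker pool configuration property), so only workers configured for that pool will pick these requests up.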

Apache Hive : Datasketches Integration

Apache DataSketches (https://datasketches.apache.org/) is integrated into Hive via HIVE-22939.
This enables various kinds of sketch operations through regular SQL statements.

Sketch functions

Naming convention

All sketch functions are registered using the following naming convention:

ds_{sketchType}_{functionName}

For example, there is a function called ds_hll_estimate, which can be used to estimate the number of distinct values from an HLL sketch.
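A sketched usage example following that naming convention (the table t and column id are illustrative; function availability should be verified against your Hive build):

```sql
-- Build an HLL sketch over a column with the ds_hll_sketch aggregate,
-- then estimate the approximate number of distinct values from it.
SELECT ds_hll_estimate(ds_hll_sketch(id)) AS approx_distinct
FROM t;
```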

sketchType

For detailed information about the sketches themselves, please refer to the DataSketches site.

Apache Hive : Enhanced Aggregation, Cube, Grouping and Rollup

This document describes enhanced aggregation features for the GROUP BY clause of SELECT statements.

Version

Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive 0.10.0.
See HIVE-2397, HIVE-3433, HIVE-3471, and HIVE-3613.
Also see HIVE-3552 for an improvement added in Hive 0.11.0.

Version

GROUPING__ID is compliant with semantics in other SQL engines starting in Hive 2.3.0 (see HIVE-16102).
Support for SQL grouping function was added in Hive 2.3.0 too (see HIVE-15409).
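A minimal sketch combining GROUPING SETS with the GROUPING__ID function (table t and columns a, b are illustrative):

```sql
-- One pass produces three aggregation levels: (a, b), (a), and the grand total.
-- GROUPING__ID identifies which grouping set each output row belongs to.
SELECT a, b, count(*) AS cnt, GROUPING__ID
FROM t
GROUP BY a, b GROUPING SETS ((a, b), a, ());
```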

Apache Hive : Exchange Partition

The EXCHANGE PARTITION command moves a partition from a source table to a target table and alters each table’s metadata. The Exchange Partition feature is implemented as part of HIVE-4095. Exchanging multiple partitions is supported in Hive versions 1.2.2, 1.3.0, and 2.0.0+ as part of HIVE-11745.

When the command is executed, the source table’s partition folder in HDFS will be renamed to move it to the destination table’s partition folder.  The Hive metastore will be updated to change the metadata of the source and destination tables accordingly.
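The basic shape of the command (table names and partition spec are illustrative):

```sql
-- Move partition (ds='2024-01-01') out of source table t1 into target table t2.
-- The partition's HDFS folder is renamed and both tables' metadata is updated.
ALTER TABLE t2 EXCHANGE PARTITION (ds = '2024-01-01') WITH TABLE t1;
```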

Apache Hive : Tutorial to write a GenericUDAF

User-Defined Aggregation Functions (UDAFs) are an excellent way to integrate advanced data-processing into Hive. Hive allows two varieties of UDAFs: simple and generic. Simple UDAFs, as the name implies, are rather simple to write, but incur performance penalties because of the use of Java Reflection, and do not allow features such as variable-length argument lists. Generic UDAFs allow all these features, but are perhaps not quite as intuitive to write as Simple UDAFs.

Apache Hive : Hive Operators

Operator Precedence

Example                        Operators                               Description
A[B], A.identifier             bracket_op([]), dot(.)                  element selector, dot
-A                             unary(+), unary(-), unary(~)            unary prefix operators
A IS [NOT] (NULL|TRUE|FALSE)   IS                                      unary suffix
A ^ B                          bitwise xor(^)                          bitwise xor
A * B                          star(*), divide(/), mod(%), div(DIV)    multiplicative operators
A + B                          plus(+), minus(-)                       additive operators
A || B                         string concatenate(||)                  string concatenate
A & B                          bitwise and(&)                          bitwise and
A | B                          bitwise or(|)                           bitwise or

Relational Operators

The following operators compare the passed operands and generate a TRUE or FALSE value depending on whether the comparison between the operands holds.

Apache Hive : Hive UDFs

Hive User-Defined Functions (UDFs) are custom functions developed in Java and seamlessly integrated with Apache Hive. UDFs are routines designed to accept parameters, execute a specific action, and return the resulting value. The return value can be either a single scalar value or a complete result set, depending on the UDF’s code and the implemented interface. UDFs represent a powerful capability that enhances classical SQL functionality by allowing the integration of custom code, providing Hive users with a versatile toolset. Apache Hive comes equipped with a variety of built-in UDFs that users can leverage. Similar to other SQL-based solutions, Hive also offers functionality to expand its already rich set of UDFs by incorporating custom ones as needed.

Apache Hive : Plugins

Creating Custom UDFs

First, you need to create a new class that extends UDF, with one or more methods named evaluate.

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    return new Text(s.toString().toLowerCase());
  }
}

(Note that there’s already a built-in function for this, it’s just an easy example).

Apache Hive : HiveQL

This page is deprecated

Please see the HiveQL Language Manual

Apache Hive : LanguageManual

This is the Hive Language Manual.  For other Hive documentation, see the Hive wiki’s Home page.

Apache Hive : LanguageManual Archiving

Archiving for File Count Reduction.

Overview

Due to the design of HDFS, the number of files in the filesystem directly affects the memory consumption in the namenode. While normally not a problem for small clusters, memory usage may hit the limits of accessible memory on a single machine when there are >50-100 million files. In such situations, it is advantageous to have as few files as possible.

Apache Hive : LanguageManual Authorization

Introduction

Note that this documentation is referring to Authorization which is verifying if a user has permission to perform a certain action, and not about Authentication (verifying the identity of the user). Strong authentication for tools like the Hive command line is provided through the use of Kerberos. There are additional authentication options for users of HiveServer2.

Hive Authorization Options

Three modes of Hive authorization are available to satisfy different use cases.

Apache Hive : LanguageManual Hive CLI

$HIVE_HOME/bin/hive is a shell utility which can be used to run Hive queries in either interactive or batch mode.

Deprecation in favor of Beeline CLI

HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline, which is a JDBC client based on SQLLine.  Due to new development being focused on HiveServer2, Hive CLI will soon be deprecated in favor of Beeline (HIVE-10511).

See Replacing the Implementation of Hive CLI Using Beeline and Beeline – New Command Line Shell in the HiveServer2 documentation.

Apache Hive : LanguageManual Commands

Commands are non-SQL statements such as setting a property or adding a resource. They can be used in HiveQL scripts or directly in the CLI or Beeline.

quit, exit: Use quit or exit to leave the interactive shell.

reset: Resets the configuration to the default values (as of Hive 0.10; see HIVE-3202). Any configuration parameters that were set using the set command or the -hiveconf parameter on the hive command line are reset to their default values. Note that this does not apply to configuration parameters that were set with the “hiveconf:” prefix on the key name (for historic reasons).

set <key>=<value>: Sets the value of a particular configuration variable (key). Note: if you misspell the variable name, the CLI will not show an error.

set: Prints a list of configuration variables that are overridden by the user or Hive.

set -v: Prints all Hadoop and Hive configuration variables.

add FILE[S] <filepath>*, add JAR[S] <jarpath>*, add ARCHIVE[S] <archivepath>*: Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources for more information.

add FILE[S] <ivyurl>*, add JAR[S] <ivyurl>*, add ARCHIVE[S] <ivyurl>*: As of Hive 1.2.0, adds one or more files, jars, or archives to the list of resources in the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. See Hive Resources for more information.

list FILE[S], list JAR[S], list ARCHIVE[S]: Lists the resources already added to the distributed cache. See Hive Resources for more information.

list FILE[S] <filepath>*, list JAR[S] <jarpath>*, list ARCHIVE[S] <archivepath>*: Checks whether the given resources have already been added to the distributed cache. See Hive Resources for more information.

delete FILE[S] <filepath>*, delete JAR[S] <jarpath>*, delete ARCHIVE[S] <archivepath>*: Removes the resource(s) from the distributed cache.

delete FILE[S] <ivyurl>*, delete JAR[S] <ivyurl>*, delete ARCHIVE[S] <ivyurl>*: As of Hive 1.2.0, removes the resource(s) that were added using the given Ivy URL from the distributed cache. See Hive Resources for more information.

! <command>: Executes a shell command from the Hive shell.

dfs <dfs command>: Executes a dfs command from the Hive shell.

<query string>: Executes a Hive query and prints results to standard output.

source <filepath>: Executes a script file inside the CLI.

compile `<groovy string>` AS GROOVY NAMED <name>: Allows inline Groovy code to be compiled and used as a UDF (as of Hive 0.13.0). For a usage example, see Nov. 2013 Hive Contributors Meetup Presentations – Using Dynamic Compilation with Hive.

show processlist: Displays information about the operations currently running on HiveServer2. It helps troubleshoot issues such as long-running queries, connection starvation, etc. The command was introduced in HIVE-27829.

Sample Usage:

Apache Hive : LanguageManual DDL

Overview

HiveQL DDL statements are documented here, including:

  • CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
  • DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
  • TRUNCATE TABLE
  • ALTER DATABASE/SCHEMA, TABLE, VIEW
  • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
  • SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
  • DESCRIBE DATABASE/SCHEMA, table_name, view_name, materialized_view_name

PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.

Keywords, Non-reserved Keywords and Reserved Keywords

All Keywords
Version: Non-reserved Keywords

Hive 1.2.0: ADD, ADMIN, AFTER, ANALYZE, ARCHIVE, ASC, BEFORE, BUCKET, BUCKETS, CASCADE, CHANGE, CLUSTER, CLUSTERED, CLUSTERSTATUS, COLLECTION, COLUMNS, COMMENT, COMPACT, COMPACTIONS, COMPUTE, CONCATENATE, CONTINUE, DATA, DATABASES, DATETIME, DAY, DBPROPERTIES, DEFERRED, DEFINED, DELIMITED, DEPENDENCY, DESC, DIRECTORIES, DIRECTORY, DISABLE, DISTRIBUTE, ENABLE, ESCAPED, EXCLUSIVE, EXPLAIN, EXPORT, FIELDS, FILE, FILEFORMAT, FIRST, FORMAT, FORMATTED, FUNCTIONS, HOLD_DDLTIME, HOUR, IDXPROPERTIES, IGNORE, INDEX, INDEXES, INPATH, INPUTDRIVER, INPUTFORMAT, ITEMS, JAR, KEYS, LIMIT, LINES, LOAD, LOCATION, LOCK, LOCKS, LOGICAL, LONG, MAPJOIN, MATERIALIZED, METADATA, MINUS, MINUTE, MONTH, MSCK, NOSCAN, NO_DROP, OFFLINE, OPTION, OUTPUTDRIVER, OUTPUTFORMAT, OVERWRITE, OWNER, PARTITIONED, PARTITIONS, PLUS, PRETTY, PRINCIPALS, PROTECTION, PURGE, READ, READONLY, REBUILD, RECORDREADER, RECORDWRITER, REGEXP, RELOAD, RENAME, REPAIR, REPLACE, REPLICATION, RESTRICT, REWRITE, RLIKE, ROLE, ROLES, SCHEMA, SCHEMAS, SECOND, SEMI, SERDE, SERDEPROPERTIES, SERVER, SETS, SHARED, SHOW, SHOW_DATABASE, SKEWED, SORT, SORTED, SSL, STATISTICS, STORED, STREAMTABLE, STRING, STRUCT, TABLES, TBLPROPERTIES, TEMPORARY, TERMINATED, TINYINT, TOUCH, TRANSACTIONS, UNARCHIVE, UNDO, UNIONTYPE, UNLOCK, UNSET, UNSIGNED, URI, USE, UTC, VIEW, WHILE, YEAR

Hive 2.0.0: removed HOLD_DDLTIME, IGNORE, NO_DROP, OFFLINE, PROTECTION, READONLY, REGEXP, RLIKE; added AUTOCOMMIT, ISOLATION, LEVEL, OFFSET, SNAPSHOT, TRANSACTION, WORK, WRITE

Hive 2.1.0: added ABORT, KEY, LAST, NORELY, NOVALIDATE, NULLS, RELY, VALIDATE

Hive 2.2.0: removed MINUS; added CACHE, DAYS, DAYOFWEEK, DUMP, HOURS, MATCHED, MERGE, MINUTES, MONTHS, QUARTER, REPL, SECONDS, STATUS, VIEWS, WEEK, WEEKS, YEARS

Hive 2.3.0: removed MERGE; added DETAIL, EXPRESSION, OPERATOR, SUMMARY, VECTORIZATION, WAIT

Hive 3.0.0: removed PRETTY; added ACTIVATE, ACTIVE, ALLOC_FRACTION, CHECK, DEFAULT, DO, ENFORCED, KILL, MANAGEMENT, MAPPING, MOVE, PATH, PLAN, PLANS, POOL, QUERY, QUERY_PARALLELISM, REOPTIMIZATION, RESOURCE, SCHEDULING_POLICY, UNMANAGED, WORKLOAD, ZONE

Hive 3.1.0: N/A

Hive 4.0.0: added AST, AT, BRANCH, CBO, COST, CRON, DCPROPERTIES, DEBUG, DISABLED, DISTRIBUTED, ENABLED, EVERY, EXECUTE, EXECUTED, EXPIRE_SNAPSHOTS, IGNORE, JOINCOST, MANAGED, MANAGEDLOCATION, OPTIMIZE, REMOTE, RESPECT, RETAIN, RETENTION, SCHEDULED, SET_CURRENT_SNAPSHOT, SNAPSHOTS, SPEC, SYSTEM_TIME, SYSTEM_VERSION, TAG, TRANSACTIONAL, TRIM, TYPE, UNKNOWN, URL, WITHIN

Version information

Apache Hive : LanguageManual DDL BucketedTables

This is a brief example on creating and populating bucketed tables. (For another example, see Bucketed Sorted Tables.)

Bucketed tables are fantastic in that they allow much more efficient sampling than do non-bucketed tables, and they may later allow for time saving operations such as mapside joins. However, the bucketing specified at table creation is not enforced when the table is written to, and so it is possible for the table’s metadata to advertise properties which are not upheld by the table’s actual layout. This should obviously be avoided. Here’s how to do it right.

Apache Hive : LanguageManual DML

Hive Data Manipulation Language

Loading files into tables

Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.

Syntax
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
Synopsis

Load operations prior to Hive 3.0 are pure copy/move operations that move datafiles into locations corresponding to Hive tables.
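For example (file paths and table names are illustrative):

```sql
-- Copy a local file into an unpartitioned table, replacing existing data.
LOAD DATA LOCAL INPATH '/tmp/kv1.txt' OVERWRITE INTO TABLE pokes;

-- Move a file already in HDFS into a specific partition.
LOAD DATA INPATH '/user/hive/staging/kv2.txt'
INTO TABLE invites PARTITION (ds = '2024-01-01');
```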

Apache Hive : LanguageManual Explain

EXPLAIN Syntax

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query

AUTHORIZATION is supported from Hive 0.14.0 via HIVE-5961. VECTORIZATION is supported from Hive 2.3.0 via HIVE-11394. LOCKS is supported from Hive 3.2.0 via HIVE-17683.

AST was removed from EXPLAIN EXTENDED in HIVE-13533 and reinstated as a separate command in HIVE-15932.

The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.
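For example (the query itself is illustrative):

```sql
-- Show the execution plan, including physical details such as file names.
EXPLAIN EXTENDED
SELECT key, count(*) FROM src GROUP BY key;
```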

Apache Hive : LanguageManual GroupBy

Group By Syntax

groupByClause: GROUP BY groupByExpression (, groupByExpression)*

groupByExpression: expression

groupByQuery: SELECT expression (, expression)* FROM src groupByClause?

In groupByExpression, columns are specified by name, not by position number. However, in Hive 0.11.0 and later, columns can be specified by position when configured as follows:
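To the best of my understanding, the configuration referred to here is the position-alias property (verify the name against your release; in Hive 2.2.0 and later it was split into hive.groupby.position.alias and hive.orderby.position.alias):

```sql
set hive.groupby.orderby.position.alias=true;

-- Groups by the first column of the select list, i.e. column a.
SELECT a, count(*) FROM t GROUP BY 1;
```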

Simple Examples

In order to count the number of rows in a table:
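A sketch of such a query (the table name is illustrative):

```sql
SELECT COUNT(*) FROM table2;
```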

Apache Hive : LanguageManual Import/Export

Version information

The EXPORT and IMPORT commands were added in Hive 0.8.0 (see HIVE-1918).

Replication extensions to the EXPORT and IMPORT commands were added in Hive 1.2.0 (see HIVE-7973 and Hive Replication Development).

Overview

The EXPORT command exports the data of a table or partition, along with the metadata, into a specified output location. This output location can then be moved over to a different Hadoop or Hive instance and imported from there with the IMPORT command.
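For example (table names and paths are illustrative):

```sql
-- Export a table, data plus metadata, to an HDFS directory.
EXPORT TABLE employee TO '/user/hive/export/employee';

-- On the destination instance, import it (optionally under a new name).
IMPORT TABLE employee_copy FROM '/user/hive/export/employee';
```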

Apache Hive : LanguageManual Indexing

Indexing Is Removed since 3.0

There are alternative options which might work similarly to indexing:

  • Materialized views with automatic rewriting can produce very similar results.  Hive 2.3.0 adds support for materialized views.
  • Using columnar file formats (Parquet, ORC) – they can do selective scanning; they may even skip entire files/blocks.

Indexing has been removed in version 3.0 (HIVE-18448).

Overview of Hive Indexes

The goal of Hive indexing is to improve the speed of query lookup on certain columns of a table. Without an index, queries with predicates like ‘WHERE tab1.col1 = 10’ load the entire table or partition and process all the rows. But if an index exists for col1, then only a portion of the file needs to be loaded and processed.

Apache Hive : LanguageManual Join Optimization

Improvements to the Hive Optimizer

Version

The join optimizations described here were added in Hive version 0.11.0. See HIVE-3784 and related JIRAs.

This document describes optimizations of Hive’s query execution planning to improve the efficiency of joins and reduce the need for user hints.

Hive automatically recognizes various use cases and optimizes for them. Hive 0.11 improves the optimizer for these cases:

  • Joins where one side fits in memory. In the new optimization:
    • that side is loaded into memory as a hash table
    • only the larger table needs to be scanned
    • fact tables have a smaller footprint in memory
  • Star-schema joins
  • Hints are no longer needed for many cases.
  • Map joins are automatically picked up by the optimizer.

Star Join Optimization

A simple schema for decision support systems or data warehouses is the star schema, where events are collected in large fact tables, while smaller supporting tables (dimensions) are used to describe the data.

Apache Hive : LanguageManual Joins

Join Syntax

Hive supports the following syntax for joining tables:

join_table:
    table_reference [INNER] JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)

table_reference:
    table_factor
  | join_table

table_factor:
    tbl_name [alias]
  | table_subquery alias
  | ( table_references )

join_condition:
    ON expression

See Select Syntax for the context of this join syntax.
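For example, this query combines an inner join and a left outer join (table and column names are illustrative):

```sql
-- Rows of a matched against b (inner) and optionally against c (outer).
SELECT a.val, b.val, c.val
FROM a
JOIN b ON (a.key = b.key)
LEFT OUTER JOIN c ON (a.key = c.key);
```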

Apache Hive : LanguageManual LateralView

Lateral View Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

Description

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
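For example, given a table pageAds with columns pageid STRING and adid_list ARRAY<INT>, the array can be exploded into one output row per element:

```sql
-- explode() emits one row per array element; the lateral view joins each
-- emitted row back to its originating base-table row.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW explode(adid_list) adTable AS adid;
```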

Apache Hive : LanguageManual LZO Compression

General LZO Concepts

LZO is a lossless data compression library that favors speed over compression ratio. See http://www.oberhumer.com/opensource/lzo and http://www.lzop.org for general information about LZO and see Compressed Data Storage for information about compression in Hive.

Imagine a simple data file that has three columns

  • id
  • first name
  • last name

Let’s populate a data file containing 4 records:

19630001     john          lennon
19630002     paul          mccartney
19630003     george        harrison
19630004     ringo         starr

Let’s call the data file /path/to/dir/names.txt.

Apache Hive : LanguageManual ORC

ORC Files

ORC File Format

Version

Introduced in Hive version 0.11.0.

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.

Compared with RCFile format, for example, ORC file format has many advantages such as:

Apache Hive : LanguageManual Sampling

Sampling Syntax

Sampling Bucketized Table


table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname])

The TABLESAMPLE clause allows users to write queries for samples of the data instead of the whole table. The TABLESAMPLE clause can be added to any table in the FROM clause. The buckets are numbered starting from 1. colname indicates the column on which to sample each row in the table. colname can be one of the non-partition columns in the table, or rand() to indicate sampling on the entire row instead of an individual column. The rows of the table are ‘bucketed’ on colname randomly into y buckets numbered 1 through y. Rows which belong to bucket x are returned.
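For example (the table name is illustrative):

```sql
-- Hash all rows into 32 buckets using rand() and return bucket 3.
SELECT *
FROM source TABLESAMPLE (BUCKET 3 OUT OF 32 ON rand()) s;
```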

Apache Hive : LanguageManual Select

Select Syntax

[WITH CommonTableExpression (, CommonTableExpression)*]    (Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
  FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_list]
  [ORDER BY col_list]
  [CLUSTER BY col_list
    | [DISTRIBUTE BY col_list] [SORT BY col_list]
  ]
 [LIMIT [offset,] rows]
  • A SELECT statement can be part of a union query or a subquery of another query.
  • table_reference indicates the input to the query. It can be a regular table, a view, a join construct or a subquery.
  • Table names and column names are case insensitive.
    • In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names.
    • In Hive 0.13 and later, column names can contain any Unicode character (see HIVE-6013). Any column name that is specified within backticks (`) is treated literally. Within a backtick string, use double backticks (``) to represent a backtick character.
    • To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property [hive.support.quoted.identifiers](#hive-support-quoted-identifiers) to none. In this configuration, backticked names are interpreted as regular expressions. For details, see Supporting Quoted Identifiers in Column Names (attached to HIVE-6013). Also see REGEX Column Specification below.
  • Simple query. For example, the following query retrieves all columns and all rows from table t1.
SELECT * FROM t1 

Note

Apache Hive : LanguageManual SortBy

Order, Sort, Cluster, and Distribute By

This describes the syntax of SELECT clauses ORDER BY, SORT BY, CLUSTER BY, and DISTRIBUTE BY.  See Select Syntax for general information.

Syntax of Order By

The ORDER BY syntax in Hive QL is similar to the syntax of ORDER BY in SQL language.

colOrder: ( ASC | DESC )
colNullOrder: (NULLS FIRST | NULLS LAST)           -- (Note: Available in Hive 2.1.0 and later)
orderBy: ORDER BY colName colOrder? colNullOrder? (',' colName colOrder? colNullOrder?)*
query: SELECT expression (',' expression)* FROM src orderBy

There are some limitations in the “order by” clause. In the strict mode (i.e., hive.mapred.mode=strict), the order by clause has to be followed by a “limit” clause. The limit clause is not necessary if you set hive.mapred.mode to nonstrict. The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output. If the number of rows in the output is too large, the single reducer could take a very long time to finish.
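For example, under strict mode the following query is accepted only because of its LIMIT clause (table and column names are illustrative):

```sql
set hive.mapred.mode=strict;

-- Without LIMIT, strict mode would reject this total ordering.
SELECT * FROM sales ORDER BY amount DESC LIMIT 100;
```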

Apache Hive : LanguageManual SubQueries

Subqueries in the FROM Clause

SELECT ... FROM (subquery) name ...
SELECT ... FROM (subquery) AS name ...   (Note: Only valid starting with Hive 0.13.0)

Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names. The columns in the subquery select list are available in the outer query just like columns of a table. The subquery can also be a query expression with UNION. Hive supports arbitrary levels of subqueries.
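For example, a named FROM-clause subquery containing a UNION (table and column names are illustrative):

```sql
SELECT t3.col
FROM (
  SELECT a + b AS col FROM t1
  UNION ALL
  SELECT c + d AS col FROM t2
) t3;
```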

Apache Hive : LanguageManual Transform

Transform/Map-Reduce Syntax

Users can also plug in their own custom mappers and reducers in the data stream by using features natively supported in the Hive language. For example, to run a custom mapper script (map_script) and a custom reducer script (reduce_script), the user can issue the following command, which uses the TRANSFORM clause to embed the mapper and the reducer scripts.
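A sketch of the shape of such a query (script names match the text; the source table and output columns are illustrative):

```sql
-- map_script consumes key/value pairs on stdin and emits mkey/mvalue;
-- rows are then clustered by mkey and streamed through reduce_script.
FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value)
  USING 'map_script'
  AS mkey, mvalue
  CLUSTER BY mkey
) map_output
SELECT TRANSFORM(map_output.mkey, map_output.mvalue)
USING 'reduce_script'
AS reducekey, reducevalue;
```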

Apache Hive : LanguageManual Data Types

Overview

This lists all supported data types in Hive. See Type System in the Tutorial for additional information.

For data types supported by HCatalog, see:

Numeric Types

  • TINYINT (1-byte signed integer, from -128 to 127)
  • SMALLINT (2-byte signed integer, from -32,768 to 32,767)
  • INT/INTEGER (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
  • BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • FLOAT (4-byte single precision floating point number)
  • DOUBLE (8-byte double precision floating point number)
  • DOUBLE PRECISION (alias for DOUBLE, only available starting with Hive 2.2.0)
  • DECIMAL
    • Introduced in Hive 0.11.0 with a precision of 38 digits
    • Hive 0.13.0 introduced user-definable precision and scale
  • NUMERIC (same as DECIMAL, starting with Hive 3.0.0)

Date/Time Types

String Types

Misc Types

  • BOOLEAN
  • BINARY (Note: Only available starting with Hive 0.8.0)

Complex Types

  • arrays: ARRAY<data_type> (Note: negative values and non-constant expressions are allowed as of Hive 0.14.)
  • maps: MAP<primitive_type, data_type> (Note: negative values and non-constant expressions are allowed as of Hive 0.14.)
  • structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>
  • union: UNIONTYPE<data_type, data_type, ...> (Note: Only available starting with Hive 0.7.0.)
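A sketch of a table declaration using these complex types (table and column names are illustrative):

```sql
CREATE TABLE complex_demo (
  tags    ARRAY<STRING>,                       -- ordered collection
  props   MAP<STRING, INT>,                    -- key/value pairs
  address STRUCT<street: STRING, zip: INT>,    -- named fields
  mixed   UNIONTYPE<INT, STRING>               -- one of several types per row
);
```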

Column Types

Integral Types (TINYINT, SMALLINT, INT/INTEGER, BIGINT)

Integral literals are assumed to be INT by default, unless the number exceeds the range of INT in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.

Apache Hive : LanguageManual Operators and User-Defined Functions

Overview

All Hive keywords are case-insensitive, including the names of Hive operators and functions.

In Beeline or the CLI, use the commands below to show the latest documentation:

SHOW FUNCTIONS;
DESCRIBE FUNCTION <function_name>;
DESCRIBE FUNCTION EXTENDED <function_name>;

Bug for expression caching when UDF nested in UDF or function

When hive.cache.expr.evaluation is set to true (which is the default) a UDF can give incorrect results if it is nested in another UDF or a Hive function. This bug affects releases 0.12.0, 0.13.0, and 0.13.1. Release 0.14.0 fixed the bug (HIVE-7314).

Apache Hive : LanguageManual Union

Union Syntax

select_statement UNION [ALL | DISTINCT] select_statement UNION [ALL | DISTINCT] select_statement ...

UNION is used to combine the result from multiple SELECT statements into a single result set.

  • Hive versions prior to 1.2.0 only support UNION ALL (bag union), in which duplicate rows are not eliminated.
  • In Hive 1.2.0 and later, the default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.

You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION types are treated such that a DISTINCT union overrides any ALL union to its left. A DISTINCT union can be produced explicitly by using UNION DISTINCT or implicitly by using UNION with no following DISTINCT or ALL keyword.
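For example (tables are illustrative), the DISTINCT union below also deduplicates the output of the ALL union to its left:

```sql
SELECT key FROM t1
UNION ALL
SELECT key FROM t2
UNION DISTINCT
SELECT key FROM t3;
```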

Apache Hive : LanguageManual VariableSubstitution

Introduction

Hive is used for batch and interactive queries. Variable Substitution allows for tasks such as separating environment-specific configuration variables from code.

The Hive variable substitution mechanism was designed to avoid some of the code that was getting baked into the scripting language on top of Hive.

Examples such as the following shell commands may (inefficiently) be used to set variables within a script:

$ a=b
$ hive -e " describe $a "

This is frustrating as Hive becomes closely coupled with scripting languages. The Hive startup time of a couple seconds is non-trivial when doing thousands of manipulations such as multiple hive -e invocations.
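The same task expressed with Hive variable substitution, avoiding the extra shell round trip (the variable and table names are illustrative):

```sql
-- Define a session variable, then substitute it into a later statement.
set hivevar:tablename=src;
DESCRIBE ${hivevar:tablename};
```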

Apache Hive : LanguageManual VirtualColumns

Virtual Columns

Hive 0.8.0 provides support for two virtual columns:

One is INPUT__FILE__NAME, which is the input file’s name for a mapper task.

The other is BLOCK__OFFSET__INSIDE__FILE, which is the current global file position.

For a block-compressed file, it is the file offset of the current block, i.e. the file offset of the current block’s first byte.

Since Hive 0.8.0 the following virtual columns have been added:

  • ROW__OFFSET__INSIDE__BLOCK
  • RAW__DATA__SIZE
  • ROW__ID
  • GROUPING__ID

It is important to note that none of the virtual columns listed here can be used for any other purpose (e.g., creating a table with a column named after a virtual column will fail with “SemanticException Error 10328: Invalid column name..”).
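For example (the table is illustrative):

```sql
-- Report which file and byte offset each row came from.
SELECT INPUT__FILE__NAME, key, BLOCK__OFFSET__INSIDE__FILE
FROM src;
```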

Apache Hive : LanguageManual WindowingAndAnalytics

Enhancements to Hive QL

Introduced in Hive version 0.11.

This section introduces the Hive QL enhancements for windowing and analytics functions. See “Windowing Specifications in HQL” (attached to HIVE-4197) for details. HIVE-896 has more information, including links to earlier documentation in the initial comments.

All of the windowing and analytics functions operate as per the SQL standard.

The current release supports the following functions for windowing and analytics:
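As an illustration, a typical windowing query (table and column names are hypothetical):

```sql
SELECT name, dept, salary,
       RANK()      OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank,
       SUM(salary) OVER (PARTITION BY dept)                      AS dept_total
FROM employees;
```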

Apache Hive : LanguageManual XPathUDF

Documentation for Built-In User-Defined Functions Related To XPath

UDFs

xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string

  • Functions for parsing XML data using XPath expressions.
  • Since version: 0.6.0

Overview

The xpath family of UDFs consists of wrappers around the Java XPath library javax.xml.xpath provided by the JDK. The library is based on the XPath 1.0 specification. Please refer to http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for detailed information on the Java XPath library.
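A few examples:

```sql
SELECT xpath('<a><b>b1</b><b>b2</b></a>', 'a/b/text()');   -- ["b1","b2"]
SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');   -- 3
SELECT xpath_string('<a><b>bb</b><c>cc</c></a>', 'a/b');   -- bb
```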

Apache Hive : Literals

Literals

Integral types

Integral literals are assumed to be INT by default, unless the number exceeds the range of INT (in which case it is interpreted as a BIGINT) or one of the following postfixes is present on the number.

Type | Postfix | Example
TINYINT | Y | 100Y
SMALLINT | S | 100S
BIGINT | L | 100L
String types

String literals can be expressed with either single quotes (’) or double quotes ("). Hive uses C-style escaping within the strings.
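For example:

```sql
SELECT 100Y;   -- TINYINT
SELECT 100S;   -- SMALLINT
SELECT 100L;   -- BIGINT

SELECT 'It\'s a string', "a \"quoted\" word";   -- either quote style, C-style escapes
```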

Apache Hive : Managed vs. External Tables

Hive fundamentally knows two different types of tables:

  • Managed (Internal)
  • External

Introduction

This document lists some of the differences between the two, but the fundamental difference is that Hive assumes it owns the data for managed tables: the data, its properties, and its layout can only be changed via Hive commands. The data still lives in a normal file system, and nothing stops you from changing it without telling Hive about it; doing so, however, violates Hive's invariants and expectations, and you may see undefined behavior.
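The two table types are distinguished at creation time by the EXTERNAL keyword (names and location below are hypothetical):

```sql
-- Managed: Hive owns both data and metadata; DROP TABLE removes both.
CREATE TABLE managed_logs (id INT, msg STRING);

-- External: Hive only tracks metadata; DROP TABLE leaves the files in place.
CREATE EXTERNAL TABLE external_logs (id INT, msg STRING)
LOCATION '/data/external_logs';
```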

Apache Hive : Materialized views

This page documents the work done to support materialized views in Apache Hive.

Version information

Materialized views support is introduced in Hive 3.0.0.

Objectives

Traditionally, one of the most powerful techniques used to accelerate query processing in data warehouses is the pre-computation of relevant summaries or materialized views.

The initial implementation introduced in Apache Hive 3.0.0 focuses on introducing materialized views and automatic query rewriting based on those materializations in the project. In particular, materialized views can be stored natively in Hive or in other systems such as Druid using custom storage handlers, and they can seamlessly exploit exciting new Hive features such as LLAP acceleration. The optimizer then relies on Apache Calcite to automatically produce full and partial rewritings for a large set of query expressions comprising projections, filters, joins, and aggregation operations.

Apache Hive : Partition Filter Syntax

Example: for a table having partition keys country and state, one could construct the following filter:

country = "USA" AND (state = "CA" OR state = "AZ")

In particular, notice that it is possible to nest sub-expressions within parentheses.

The following operators are supported when constructing filters for partition columns (derived from HIVE-1862):

  • =
  • <
  • <=
  • >
  • >=
  • <>
  • AND
  • OR
  • LIKE (on keys of type string only, supports literal string template with ‘.*' wildcard)

Apache Hive : ReflectUDF

Reflect (Generic) UDF

Often a Java class and method already exist to handle the exact function a user would like to use in Hive. Rather than writing a wrapper UDF to call such a method, the majority of these methods can be called using the reflect UDF. Reflect uses Java reflection to instantiate objects and call their methods; it can also call static functions. The method must return a primitive type or a type that Hive knows how to serialize.
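For example:

```sql
SELECT reflect('java.lang.Math', 'max', 2, 3);                      -- static method call, returns 3
SELECT reflect('java.net.URLDecoder', 'decode', 'a%20b', 'utf-8');  -- returns 'a b'
```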

Apache Hive : Scheduled Queries

Introduction

Executing statements periodically can be useful for:

  • Pulling information from external systems
  • Periodically updating column statistics
  • Rebuilding materialized views

Overview

  • The metastore maintains the scheduled queries in the metastore database.
  • HiveServer instances periodically poll the metastore for scheduled queries to be executed.
  • During execution, information about ongoing and finished executions is kept in the metastore.

Scheduled queries were added in Hive 4.0 (HIVE-21884)

Hive has it’s scheduled query interface built into the language itself for easy access:

Apache Hive : SQL Standard Based Hive Authorization

Status of Hive Authorization before Hive 0.13

The default authorization in Hive is not designed with the intent to protect against malicious users accessing data they should not be accessing. It only helps in preventing users from accidentally doing operations they are not supposed to do. It is also incomplete because it does not have authorization checks for many operations including the grant statement. The authorization checks happen during Hive query compilation. But as the user is allowed to execute dfs commands, user-defined functions and shell commands, it is possible to bypass the client security checks.

Apache Hive : Statistics and Data Mining

This page documents the more advanced statistical and data mining functions that are being integrated into Hive, especially the functions that warrant more than one-line descriptions.

ngrams() and context_ngrams(): N-gram frequency estimation

N-grams are subsequences of length N drawn from a longer sequence. The purpose of the ngrams() UDAF is to find the k most frequent n-grams from one or more sequences. It can be used in conjunction with the sentences() UDF to analyze unstructured natural language text, or the collect() function to analyze more general string data.
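For instance, assuming a hypothetical table movie_reviews with a STRING column review:

```sql
-- Ten most frequent bigrams across all reviews.
SELECT ngrams(sentences(lower(review)), 2, 10) FROM movie_reviews;

-- Ten most frequent words following 'best' (null marks the position to fill).
SELECT context_ngrams(sentences(lower(review)), array('best', null), 10)
FROM movie_reviews;
```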

Apache Hive : Supported Features: Apache Hive 3.1

This table covers all mandatory features from SQL:2016 as well as optional features that Hive implements.

Feature ID | Feature Name | Implemented | Mandatory | Comments
E011 | Numeric data types | Yes | Mandatory
E011-01 | INTEGER and SMALLINT data types (including all spellings) | Yes | Mandatory
E011-02 | REAL, DOUBLE PRECISION, and FLOAT data types | Yes | Mandatory
E011-03 | DECIMAL and NUMERIC data types | Yes | Mandatory
E011-04 | Arithmetic operators | Yes | Mandatory
E011-05 | Numeric comparison | Yes | Mandatory
E011-06 | Implicit casting among the numeric data types | Yes | Mandatory
E021 | Character string types | Yes | Mandatory
E021-01 | CHARACTER data type (including all its spellings) | Partial | Mandatory | Only supports CHAR, not CHARACTER
E021-02 | CHARACTER VARYING data type (including all its spellings) | Partial | Mandatory | Only supports VARCHAR, not CHARACTER VARYING or CHAR VARYING
E021-03 | Character literals | Yes | Mandatory
E021-04 | CHARACTER_LENGTH function | Yes | Mandatory
E021-05 | OCTET_LENGTH function | Yes | Mandatory
E021-06 | SUBSTRING function | Partial | Mandatory | Standard: SUBSTRING(val FROM startpos [FOR len]). Hive: SUBSTRING(val, startpos [, len])
E021-07 | Character concatenation | Yes | Mandatory
E021-08 | UPPER and LOWER functions | Yes | Mandatory
E021-09 | TRIM function | Yes | Mandatory
E021-10 | Implicit casting among the fixed-length and variable-length character string types | Yes | Mandatory
E021-11 | POSITION function | No | Mandatory
E021-12 | Character comparison | Yes | Mandatory
E031 | Identifiers | Partial | Mandatory | Unquoted identifiers use C syntax ([A-Za-z][A-Za-z0-9_]*). Quoted identifiers can contain any character.
E031-01 | Delimited identifiers | Partial | Mandatory | Quoting done with ` rather than "; only supported for columns, not tables, views, etc.
E031-02 | Lower case identifiers | Yes | Mandatory
E031-03 | Trailing underscore | Yes | Mandatory
E051 | Basic query specification | Yes | Mandatory
E051-01 | SELECT DISTINCT | Yes | Mandatory
E051-02 | GROUP BY clause | Yes | Mandatory
E051-04 | GROUP BY can contain columns not in <select list> | Yes | Mandatory
E051-05 | Select list items can be renamed | Yes | Mandatory
E051-06 | HAVING clause | Yes | Mandatory
E051-07 | Qualified * in select list | Yes | Mandatory
E051-08 | Correlation names in the FROM clause | Yes | Mandatory
E051-09 | Rename columns in the FROM clause | Yes | Mandatory
E061 | Basic predicates and search conditions | Yes | Mandatory
E061-01 | Comparison predicate | Yes | Mandatory
E061-02 | BETWEEN predicate | Yes | Mandatory
E061-03 | IN predicate with list of values | Yes | Mandatory
E061-04 | LIKE predicate | Yes | Mandatory
E061-05 | LIKE predicate: ESCAPE clause | Yes | Mandatory
E061-06 | NULL predicate | Yes | Mandatory
E061-07 | Quantified comparison predicate | No | Mandatory
E061-08 | EXISTS predicate | Yes | Mandatory
E061-09 | Subqueries in comparison predicate | No | Mandatory
E061-11 | Subqueries in IN predicate | Yes | Mandatory
E061-12 | Subqueries in quantified comparison predicate | No | Mandatory
E061-13 | Correlated subqueries | Yes | Mandatory
E061-14 | Search condition | Yes | Mandatory
E071 | Basic query expressions | Yes | Mandatory
E071-01 | UNION DISTINCT table operator | Yes | Mandatory
E071-02 | UNION ALL table operator | Yes | Mandatory
E071-03 | EXCEPT DISTINCT table operator | Yes | Mandatory
E071-05 | Columns combined via table operators need not have exactly the same data type | Yes | Mandatory
E071-06 | Table operators in subqueries | Yes | Mandatory
E081 | Basic privileges | Yes | Mandatory
E081-01 | SELECT privilege at the table level | Yes | Mandatory
E081-02 | DELETE privilege | Yes | Mandatory
E081-03 | INSERT privilege at the table level | Yes | Mandatory
E081-04 | UPDATE privilege at the table level | Yes | Mandatory
E081-05 | UPDATE privilege at the column level | Yes | Mandatory
E081-06 | REFERENCES privilege at the table level | No | Mandatory
E081-07 | REFERENCES privilege at the column level | No | Mandatory
E081-08 | WITH GRANT OPTION | Yes | Mandatory
E081-09 | USAGE privilege | No | Mandatory
E081-10 | EXECUTE privilege | No | Mandatory
E091 | Set functions | Yes | Mandatory
E091-01 | AVG | Yes | Mandatory
E091-02 | COUNT | Yes | Mandatory
E091-03 | MAX | Yes | Mandatory
E091-04 | MIN | Yes | Mandatory
E091-05 | SUM | Yes | Mandatory
E091-06 | ALL quantifier | Yes | Mandatory
E091-07 | DISTINCT quantifier | Yes | Mandatory
E101 | Basic data manipulation | Yes | Mandatory
E101-01 | INSERT statement | Yes | Mandatory
E101-03 | Searched UPDATE statement | Yes | Mandatory
E101-04 | Searched DELETE statement | Yes | Mandatory
E111 | Single row SELECT statement | No | Mandatory
E121 | Basic cursor support | No | Mandatory
E121-01 | DECLARE CURSOR | No | Mandatory
E121-02 | ORDER BY columns need not be in select list | No | Mandatory
E121-03 | Value expressions in ORDER BY clause | No | Mandatory
E121-04 | OPEN statement | No | Mandatory
E121-06 | Positioned UPDATE statement | No | Mandatory
E121-07 | Positioned DELETE statement | No | Mandatory
E121-08 | CLOSE statement | No | Mandatory
E121-10 | FETCH statement: implicit NEXT | No | Mandatory
E121-17 | WITH HOLD cursors | No | Mandatory
E131 | Null value support (nulls in lieu of values) | Yes | Mandatory
E141 | Basic integrity constraints | Partial | Mandatory | UNIQUE (VALUE) constraints and UNIQUE over a list of columns not supported; unique constraints not enforced. Referencing periods, MATCH, and triggered actions in foreign keys not supported. CHECK constraints not supported.
E141-01 | NOT NULL constraints | Yes | Mandatory
E141-02 | UNIQUE constraints of NOT NULL columns | Partial | Mandatory | UNIQUE constraints not enforced
E141-03 | PRIMARY KEY constraints | Partial | Mandatory | Primary keys not enforced
E141-04 | Basic FOREIGN KEY constraint with the NO ACTION default for both referential delete action and referential update action | Partial | Mandatory | Referencing periods, MATCH, and triggered actions not supported. Foreign keys not enforced.
E141-06 | CHECK constraints | Yes | Mandatory
E141-07 | Column defaults | Yes | Mandatory
E141-08 | NOT NULL inferred on PRIMARY KEY | No | Mandatory | No need to declare NOT NULL with PRIMARY KEY or UNIQUE, but non-nullness not enforced
E141-10 | Names in a foreign key can be specified in any order | No | Mandatory
E151 | Transaction support | No | Mandatory
E151-01 | COMMIT statement | No | Mandatory
E151-02 | ROLLBACK statement | No | Mandatory
E152 | Basic SET TRANSACTION statement | No | Mandatory
E152-01 | SET TRANSACTION statement: ISOLATION LEVEL SERIALIZABLE clause | No | Mandatory
E152-02 | SET TRANSACTION statement: READ ONLY and READ WRITE clauses | No | Mandatory
E153 | Updatable queries with subqueries | No | Mandatory
E161 | SQL comments using leading double minus | Yes | Mandatory
E171 | SQLSTATE support | No | Mandatory
F031 | Basic schema manipulation | Yes | Mandatory
F031-01 | CREATE TABLE statement to create persistent base tables | Yes | Mandatory
F031-02 | CREATE VIEW statement | Yes | Mandatory
F031-03 | GRANT statement | Yes | Mandatory
F031-04 | ALTER TABLE statement: ADD COLUMN clause | Yes | Mandatory
F031-13 | DROP TABLE statement: RESTRICT clause | Yes | Mandatory
F031-16 | DROP VIEW statement: RESTRICT clause | Yes | Mandatory
F031-19 | REVOKE statement: RESTRICT clause | No | Mandatory
F032 | CASCADE drop behavior | Yes | Optional
F034 | Extended REVOKE statement | Yes | Optional
F034-01 | REVOKE statement performed by other than the owner of a schema object | Yes | Optional
F034-02 | REVOKE statement: GRANT OPTION FOR clause | Yes | Optional
F034-03 | REVOKE statement to revoke a privilege that the grantee has WITH GRANT OPTION | Yes | Optional
F041 | Basic joined table | Yes | Mandatory
F041-01 | Inner join (but not necessarily the INNER keyword) | Yes | Mandatory
F041-02 | INNER keyword | Yes | Mandatory
F041-03 | LEFT OUTER JOIN | Yes | Mandatory
F041-04 | RIGHT OUTER JOIN | Yes | Mandatory
F041-05 | Outer joins can be nested | Yes | Mandatory
F041-07 | The inner table in a left or right outer join can also be used in an inner join | Yes | Mandatory
F041-08 | All comparison operators are supported (rather than just =) | Yes | Mandatory
F051 | Basic date and time | Partial | Mandatory | No support for WITH/WITHOUT TIME ZONE. No support for precision in TIMESTAMP. No support for TIME type.
F051-01 | DATE data type (including support of DATE literal) | Partial | Mandatory | Intervals don't match spec syntax
F051-02 | TIME data type (including support of TIME literal) with fractional seconds precision of at least 0 | No | Mandatory
F051-03 | TIMESTAMP data type (including support of TIMESTAMP literal) with fractional seconds precision of at least 0 and 6 | Partial | Mandatory | No support for WITH/WITHOUT TIME ZONE. No support for precision. Intervals don't match spec syntax.
F051-04 | Comparison predicate on DATE, TIME, and TIMESTAMP data types | Partial | Mandatory | No support for TIME
F051-05 | Explicit CAST between date-time types and character string types | Partial | Mandatory | No support for TIME
F051-06 | CURRENT_DATE | Yes | Mandatory
F051-07 | LOCALTIME | No | Mandatory
F051-08 | LOCALTIMESTAMP | Partial | Mandatory | CURRENT_TIMESTAMP supported but doesn't take a precision argument; LOCALTIMESTAMP not supported
F052 | Intervals and datetime arithmetic | Partial | Optional | Interval not supported as a column type, only as an expression type in queries. Interval syntax differs from standard.
F054 | TIMESTAMP in DATE type precedence list | Yes | Optional
F081 | UNION and EXCEPT in views | Yes | Mandatory
F131 | Grouped operations | Yes | Mandatory
F131-01 | WHERE, GROUP BY, and HAVING clauses supported in queries with grouped views | Yes | Mandatory
F131-02 | Multiple tables supported in queries with grouped views | Yes | Mandatory
F131-03 | Set functions supported in queries with grouped views | Yes | Mandatory
F131-04 | Subqueries with GROUP BY and HAVING clauses and grouped views | Yes | Mandatory
F131-05 | Single row SELECT with GROUP BY and HAVING clauses and grouped views | Yes | Mandatory
F171 | Multiple schemas per user | Yes | Optional
F200 | TRUNCATE TABLE statement | Yes | Optional
F181 | Multiple module support | No | Mandatory
F201 | CAST function | Yes | Mandatory
F221 | Explicit defaults | Yes | Mandatory
F261 | CASE expression | Yes | Mandatory
F261-01 | Simple CASE | Yes | Mandatory
F261-02 | Searched CASE | Yes | Mandatory
F261-03 | NULLIF | Yes | Mandatory
F261-04 | COALESCE | Yes | Mandatory
F302 | INTERSECT table operator | Yes | Optional
F302-01 | INTERSECT DISTINCT table operator | Yes | Optional
F302-02 | INTERSECT ALL table operator | Yes | Optional
F304 | EXCEPT ALL table operator | Yes | Optional
F311 | Schema definition statement | Yes | Mandatory
F311-01 | CREATE SCHEMA | Yes | Mandatory
F311-02 | CREATE TABLE for persistent base tables | Partial | Mandatory | Schema elements cannot be created as part of schema creation; must be done in a separate statement
F311-03 | CREATE VIEW | Partial | Mandatory | Schema elements cannot be created as part of schema creation; must be done in a separate statement
F311-04 | CREATE VIEW: WITH CHECK OPTION | No | Mandatory
F311-05 | GRANT statement | Partial | Mandatory | Schema elements cannot be created as part of schema creation; must be done in a separate statement
F312 | MERGE statement | Yes | Optional
F313 | Enhanced MERGE statement | Yes | Optional
F314 | MERGE statement with DELETE branch | Yes | Optional
F321 | User authorization | Partial | Optional | Support for CURRENT_USER function, none of the rest
F381 | Extended schema manipulation | Partial | Optional | No support for scope. No support for ALTER routine.
F381-01 | ALTER TABLE statement: ALTER COLUMN clause | Partial | Optional | Syntax non-standard. No support for scope, identities, or column generation.
F381-02 | ALTER TABLE statement: ADD CONSTRAINT clause | Partial | Optional | Same limitations as creating constraints above
F381-03 | ALTER TABLE statement: DROP CONSTRAINT clause | Partial | Optional | Same limitations as creating constraints above
F382 | Alter column data type | Partial | Optional | Syntax non-standard
F383 | Set column not null clause | Partial | Optional | Syntax non-standard
F391 | Long identifiers | Yes | Optional
F401 | Extended joined table | Partial | Optional | NATURAL joins not supported
F401-02 | FULL OUTER JOIN | Yes | Optional
F401-04 | CROSS JOIN | Yes | Optional
F471 | Scalar subquery values | Yes | Mandatory
F481 | Expanded NULL predicate | Yes | Mandatory
F531 | Temporary tables | Partial | Optional | GLOBAL/LOCAL scope not supported. DECLARE TEMPORARY TABLE not supported.
F555 | Enhanced seconds precision | Yes | Optional
F763 | CURRENT_SCHEMA | Partial | Optional | CURRENT_DATABASE provided, which is equivalent
F812 | Basic flagging | No | Mandatory
F841 | LIKE_REGEX predicate | Partial | Optional | Use RLIKE instead
F847 | Nonconstant regular expressions | Yes | Optional
F850 | Top-level <order by clause> in <query expression> | Yes | Optional
F851 | <order by clause> in subqueries | Yes | Optional
F852 | Top-level <order by clause> in views | Yes | Optional
F855 | Nested <order by clause> in <query expression> | Yes | Optional
F856 | Nested <fetch first clause> in <query expression> | Yes | Optional
F857 | Top-level <fetch first clause> in <query expression> | Yes | Optional
F858 | <fetch first clause> in subqueries | Yes | Optional
F859 | Top-level <fetch first clause> in views | Yes | Optional
S011 | Distinct data types | No | Mandatory
S091 | Basic array support | Partial | Optional | Syntax non-standard. No option to declare max cardinality. SIZE instead of CARDINALITY.
S091-01 | Arrays of built-in data types | Partial | Optional | Syntax non-standard
S091-03 | Array expressions | Partial | Optional | Array element reference and cardinality supported (though syntax non-standard). No support for array concatenation, trimming, or max-cardinality.
T021 | BINARY and VARBINARY types | Partial | Optional | BINARY only, though it acts like VARBINARY; no length parameter accepted. No support for overlay, trim, position, or LIKE.
T031 | BOOLEAN data type | Yes | Optional
T041 | Basic LOB data type support | Partial | Optional | BINARY acts as BLOB (no size restrictions). STRING acts as CLOB. Non-standard syntax.
T041-01 | BLOB data type | Partial | Optional | BINARY acts as BLOB, non-standard syntax
T041-02 | CLOB data type | Partial | Optional | STRING acts as CLOB, non-standard syntax
T041-03 | POSITION, LENGTH, LOWER, TRIM, UPPER, SUBSTRING for LOB data types | Partial | Optional | No POSITION. LOWER and UPPER only applicable to STRING.
T041-04 | Concatenation of LOB types | Yes | Optional
T042 | Extended LOB data type support | Partial | Optional | CAST for BINARY and STRING supported. LIKE for STRING supported. All other advanced options not supported.
T051 | Row types | Partial | Optional | Called STRUCT rather than ROW
T071 | BIGINT data type | Yes | Optional
T121 | WITH (excluding RECURSIVE) in query expression | Yes | Optional
T321 | Basic SQL-invoked routines | No | Mandatory
T321-01 | User-defined functions with no overloading | No | Mandatory
T321-02 | User-defined stored procedures with no overloading | No | Mandatory
T321-03 | Function invocation | No | Mandatory
T321-04 | CALL statement | No | Mandatory
T321-05 | RETURN statement | No | Mandatory
T331 | Basic roles | Yes | Optional
T351 | Bracketed comments | Yes | Optional
T431 | Extended grouping capabilities | Yes | Optional
T433 | Multiargument GROUPING function | Yes | Optional
T441 | ABS and MOD functions | Yes | Optional
T501 | Enhanced EXISTS predicate | Yes | Optional
T581 | Regular expression substring function | Yes | Optional
T591 | UNIQUE constraints of possibly null columns | Yes | Optional
T611 | Elementary OLAP operations | Yes | Optional
T612 | Advanced OLAP operations | Partial | Optional
T613 | Sampling | Yes | Optional
T614 | NTILE function | Yes | Optional
T615 | LEAD and LAG functions | Yes | Optional
T617 | FIRST_VALUE and LAST_VALUE functions | Yes | Optional
T621 | Enhanced numeric functions | Yes | Optional
T622 | Trigonometric functions | Partial | Optional | No sinh, cosh, tanh
T623 | General logarithm functions | Yes | Optional
T624 | Common logarithm functions | Yes | Optional
T631 | IN predicate with one list element | Yes | Mandatory

Apache Hive : Supported Features: Apache Hive 2.1

Identifier | Description | Hive 2.1 | Comment
E011 | Numeric data types | Yes
E011-01 | INTEGER and SMALLINT data types (including all spellings) | Yes | Int instead of Integer
E011-02 | REAL, DOUBLE PRECISION, and FLOAT data types | Yes | Double instead of Double Precision
E011-03 | DECIMAL and NUMERIC data types | Yes
E011-04 | Arithmetic operators | Yes
E011-05 | Numeric comparison | Yes
E011-06 | Implicit casting among the numeric data types | Yes
E021 | Character data types | Yes
E021-01 | CHARACTER data type | Yes | Char instead of Character
E021-02 | CHARACTER VARYING data type | Yes | Varchar instead of Character Varying
E021-03 | Character literals | Yes
E021-04 | CHARACTER_LENGTH function | Partial | length UDF provided
E021-06 | SUBSTRING function | Yes
E021-07 | Character concatenation | Yes | concat UDF instead of standard
E021-08 | UPPER and LOWER functions | Yes
E021-09 | TRIM function | Partial | leading / trailing / both ... FROM not supported
E021-10 | Implicit casting among the fixed-length and variable-length character string types | Yes
E021-12 | Character comparison | Yes
E031 | Identifiers | Yes
E031-01 | Delimited identifiers | Partial | Backtick (`) used instead of ("). Semicolon (;) cannot be used in an identifier. Table and column names have additional restrictions.
E031-03 | Trailing underscore | Yes
E051 | Basic query specification | Yes
E051-01 | SELECT DISTINCT | Yes
E051-02 | GROUP BY clause | Partial | Empty grouping sets not supported
E051-04 | GROUP BY can contain columns not in <select list> | Yes
E051-05 | Select list items can be renamed | Yes
E051-06 | HAVING clause | Yes
E051-07 | Qualified * in select list | Yes
E051-08 | Correlation names in the FROM clause | Yes
E061 | Basic predicates and search conditions | Yes
E061-01 | Comparison predicate | Yes
E061-02 | BETWEEN predicate | Yes
E061-03 | IN predicate with list of values | Yes
E061-04 | LIKE predicate | Yes
E061-06 | NULL predicate | Yes
E061-08 | EXISTS predicate | Yes
E061-11 | Subqueries in IN predicate | Yes
E061-13 | Correlated subqueries | Partial | Only correlated subqueries that can be decorrelated with rewrite rules supported
E071 | Basic query expressions | Yes
E071-01 | UNION DISTINCT table operator | Partial | Corresponding BY syntax not supported
E071-02 | UNION ALL table operator | Partial | Corresponding BY syntax not supported
E071-05 | Columns combined via table operators need not have exactly the same data type | Yes
E071-06 | Table operators in subqueries | Yes
E081 | Basic privileges | Yes
E081-01 | SELECT privilege | Yes
E081-03 | INSERT privilege at the table level | Yes
E081-04 | UPDATE privilege at the table level | Yes
E081-08 | WITH GRANT OPTION | Yes
E091 | Set functions | Yes
E091-01 | AVG | Yes
E091-02 | COUNT | Yes
E091-03 | MAX | Yes
E091-04 | MIN | Yes
E091-05 | SUM | Yes
E091-07 | DISTINCT quantifier | Yes
E101 | Basic data manipulation | Yes
E101-01 | INSERT statement | Yes
E101-03 | Searched UPDATE statement | Yes
E101-04 | Searched DELETE statement | Yes
E131 | Null value support (nulls in lieu of values) | Partial | Null specification is supported
E141 | Basic integrity constraints | Yes
E141-03 | PRIMARY KEY constraints | Partial | Non-validated
E141-04 | Basic FOREIGN KEY constraint with the NO ACTION default for both referential delete action and referential update action | Partial | Non-validated
E141-10 | Names in a foreign key can be specified in any order | Yes
E151 | Transaction support | Partial | Autocommit transaction for INSERT/UPDATE/DELETE
E161 | SQL comments using leading double minus | Yes
F031 | Basic schema manipulation | Yes
F031-01 | CREATE TABLE statement to create persistent base tables | Yes
F031-02 | CREATE VIEW statement | Yes
F031-03 | GRANT statement | Yes
F031-04 | ALTER TABLE statement: ADD COLUMN clause | Yes
F031-13 | DROP TABLE statement: RESTRICT clause | Yes
F031-16 | DROP VIEW statement: RESTRICT clause | Yes
F041 | Basic joined table | Yes
F041-01 | Inner join (but not necessarily the INNER keyword) | Yes | Named columns join not supported
F041-02 | INNER keyword | Yes
F041-03 | LEFT OUTER JOIN | Yes
F041-04 | RIGHT OUTER JOIN | Yes
F041-05 | Outer joins can be nested | Yes
F041-07 | The inner table in a left or right outer join can also be used in an inner join | Yes
F051 | Basic date and time | Yes
F051-01 | DATE data type (including support of DATE literal) | Yes
F051-03 | TIMESTAMP data type (including support of TIMESTAMP literal) with fractional seconds precision of at least 0 and 6 | Yes
F051-04 | Comparison predicate on DATE, TIME, and TIMESTAMP data types | Yes
F051-05 | Explicit CAST between date-time types and character string types | Yes
F051-06 | CURRENT_DATE | Yes
F052 | Intervals and datetime arithmetic | Yes
F081 | UNION and EXCEPT in views | Partial | UNION only
F131 | Grouped operations | Yes
F131-01 | WHERE, GROUP BY, and HAVING clauses supported in queries with grouped views | Yes
F131-02 | Multiple tables supported in queries with grouped views | Yes
F131-03 | Set functions supported in queries with grouped views | Yes
F131-04 | Subqueries with GROUP BY and HAVING clauses and grouped views | Yes
F171 | Multiple schemas per user | Yes
F200 | TRUNCATE TABLE statement | Yes
F201 | CAST function | Yes
F261 | CASE expression | Yes
F261-01 | Simple CASE | Yes
F261-02 | Searched CASE | Yes
F261-04 | COALESCE | Yes
F311-01 | CREATE SCHEMA | Yes
F311-02 | CREATE TABLE for persistent base tables | Yes
F311-03 | CREATE VIEW | Yes
F311-05 | GRANT statement | Yes
F382 | Alter column data type | Yes | Uses nonstandard syntax
F391 | Long identifiers | Yes
F401 | Extended joined table | Yes
F401-01 | NATURAL JOIN | Yes
F401-02 | FULL OUTER JOIN | Yes
F401-04 | CROSS JOIN | Yes
F403 | Partitioned join tables | Yes
F531 | Temporary tables | Yes
F555 | Enhanced seconds precision | Yes
F561 | Full value expressions | Yes
F591 | Derived tables | Yes
F641 | Row and table constructors | Yes
F651 | Catalog name qualifiers | Yes
F846 | Octet support in regular expression operators | Yes
F847 | Nonconstant regular expressions | Yes
F850 | Top-level <order by clause> in <query expression> | Yes
F851 | <order by clause> in subqueries | Yes
F852 | Top-level <order by clause> in views | Yes
F855 | Nested <order by clause> in <query expression> | Yes
S023 | Basic structured types | Yes
S091 | Basic array support | Yes
S091-01 | Arrays of built-in data types | Yes
S091-02 | Arrays of distinct types | Yes
S098 | ARRAY_AGG | Partial | collect_list does the same
S201-01 | Array parameters | Yes
S281 | Nested collection types | Yes
T021 | BINARY and VARBINARY data types | Partial | BINARY only
T031 | BOOLEAN data type | Yes
T051 | Row types | Yes
T071 | BIGINT data type | Yes
T121 | WITH (excluding RECURSIVE) in query expression | Yes
T122 | WITH (excluding RECURSIVE) in subquery | Yes
T172 | AS subquery clause in table definition | Yes
T326 | Table functions | Yes
T331 | Basic roles | Yes
T431 | Extended grouping capabilities | Partial | Grouping sets need to be extracted manually from a bitmask
T433 | Multiargument GROUPING function | Yes
T441 | ABS and MOD functions | Partial | ABS provided; MOD provided via % operator
T501 | Enhanced EXISTS predicate | Yes
T581 | Regular expression substring function | Yes
T611 | Elementary OLAP operations | Yes
T612 | Advanced OLAP operations | Partial | PERCENT_RANK, CUME_DIST and ROW_NUMBER supported
T613 | Sampling | Yes | Nonstandard syntax via TABLESAMPLE
T614 | NTILE function | Yes
T615 | LEAD and LAG functions | Yes
T616 | Null treatment option for LEAD and LAG functions | Yes
T617 | FIRST_VALUE and LAST_VALUE functions | Yes
T621 | Enhanced numeric functions | Yes
T631 | IN predicate with one list element | Yes

Apache Hive : Supported Features: Apache Hive 2.3

Identifier | Description | Hive 2.3 | Comment
E011 | Numeric data types | Yes
E011-01 | INTEGER and SMALLINT data types (including all spellings) | Yes
E011-02 | REAL, DOUBLE PRECISION, and FLOAT data types | Yes
E011-03 | DECIMAL and NUMERIC data types | Yes
E011-04 | Arithmetic operators | Yes
E011-05 | Numeric comparison | Yes
E011-06 | Implicit casting among the numeric data types | Yes
E021 | Character data types | Yes
E021-01 | CHARACTER data type | Yes | Char instead of Character
E021-02 | CHARACTER VARYING data type | Yes | Varchar instead of Character Varying
E021-03 | Character literals | Yes
E021-04 | CHARACTER_LENGTH function | Yes
E021-05 | OCTET_LENGTH function | Yes
E021-06 | SUBSTRING function | Yes
E021-07 | Character concatenation | Yes
E021-08 | UPPER and LOWER functions | Yes
E021-09 | TRIM function | Partial | leading / trailing / both ... FROM not supported
E021-10 | Implicit casting among the fixed-length and variable-length character string types | Yes
E021-12 | Character comparison | Yes
E031 | Identifiers | Yes
E031-01 | Delimited identifiers | Yes
E031-03 | Trailing underscore | Yes
E051 | Basic query specification | Yes
E051-01 | SELECT DISTINCT | Yes
E051-02 | GROUP BY clause | Partial | Empty grouping sets not supported
E051-04 | GROUP BY can contain columns not in <select list> | Yes
E051-05 | Select list items can be renamed | Yes
E051-06 | HAVING clause | Yes
E051-07 | Qualified * in select list | Yes
E051-08 | Correlation names in the FROM clause | Yes
E061 | Basic predicates and search conditions | Yes
E061-01 | Comparison predicate | Yes
E061-02 | BETWEEN predicate | Yes
E061-03 | IN predicate with list of values | Yes
E061-04 | LIKE predicate | Yes
E061-06 | NULL predicate | Yes
E061-08 | EXISTS predicate | Yes
E061-09 | Subqueries in comparison predicate | Yes
E061-11 | Subqueries in IN predicate | Yes
E061-13 | Correlated subqueries | Yes
E071 | Basic query expressions | Yes
E071-01 | UNION DISTINCT table operator | Yes
E071-02 | UNION ALL table operator | Yes
E071-03 | EXCEPT DISTINCT table operator | Yes
E071-05 | Columns combined via table operators need not have exactly the same data type | Yes
E071-06 | Table operators in subqueries | Yes
E081 | Basic privileges | Yes
E081-01 | SELECT privilege | Yes
E081-03 | INSERT privilege at the table level | Yes
E081-04 | UPDATE privilege at the table level | Yes
E081-08 | WITH GRANT OPTION | Yes
E091 | Set functions | Yes
E091-01 | AVG | Yes
E091-02 | COUNT | Yes
E091-03 | MAX | Yes
E091-04 | MIN | Yes
E091-05 | SUM | Yes
E091-06 | ALL quantifier | Yes
E091-07 | DISTINCT quantifier | Yes
E101 | Basic data manipulation | Yes
E101-01 | INSERT statement | Yes
E101-03 | Searched UPDATE statement | Yes
E101-04 | Searched DELETE statement | Yes
E131 | Null value support (nulls in lieu of values) | Yes
E141 | Basic integrity constraints | Yes
E141-03 | PRIMARY KEY constraints | Partial | Non-enforced
E141-04 | Basic FOREIGN KEY constraint with the NO ACTION default for both referential delete action and referential update action | Partial | Non-enforced
E141-08 | NOT NULL inferred on PRIMARY KEY | Partial | Inferred on read
E141-10 | Names in a foreign key can be specified in any order | Yes
E151 | Transaction support | Partial | Autocommit transaction for INSERT/UPDATE/DELETE/MERGE
E161 | SQL comments using leading double minus | Yes
F031 | Basic schema manipulation | Yes
F031-01 | CREATE TABLE statement to create persistent base tables | Yes
F031-02 | CREATE VIEW statement | Yes
F031-03 | GRANT statement | Yes
F031-04 | ALTER TABLE statement: ADD COLUMN clause | Yes
F031-13 | DROP TABLE statement: RESTRICT clause | Yes
F031-16 | DROP VIEW statement: RESTRICT clause | Yes
F041 | Basic joined table | Yes
F041-01 | Inner join (but not necessarily the INNER keyword) | Yes
F041-02 | INNER keyword | Yes
F041-03 | LEFT OUTER JOIN | Yes
F041-04 | RIGHT OUTER JOIN | Yes
F041-05 | Outer joins can be nested | Yes
F041-07 | The inner table in a left or right outer join can also be used in an inner join | Yes
F041-08 | All comparison operators are supported (rather than just =) | Yes
F051 | Basic date and time | Yes
F051-01 | DATE data type (including support of DATE literal) | Yes
F051-03 | TIMESTAMP data type (including support of TIMESTAMP literal) with fractional seconds precision of at least 0 and 6 | Yes
F051-04 | Comparison predicate on DATE, TIME, and TIMESTAMP data types | Yes
F051-05 | Explicit CAST between date-time types and character string types | Yes
F051-06 | CURRENT_DATE | Yes
F052 | Intervals and datetime arithmetic | Yes
F054 | TIMESTAMP in DATE type precedence list | Yes
F081 | UNION and EXCEPT in views | Yes
F131 | Grouped operations | Yes
F131-01 | WHERE, GROUP BY, and HAVING clauses supported in queries with grouped views | Yes
F131-02 | Multiple tables supported in queries with grouped views | Yes
F131-03 | Set functions supported in queries with grouped views | Yes
F131-04 | Subqueries with GROUP BY and HAVING clauses and grouped views | Yes
F171 | Multiple schemas per user | Yes
F200 | TRUNCATE TABLE statement | Yes
F201 | CAST function | Yes
F261 | CASE expression | Yes
F261-01 | Simple CASE | Yes
F261-02 | Searched CASE | Yes
F261-03 | NULLIF | Yes
F261-04 | COALESCE | Yes
F271 | Compound character literals | Yes
F281 | LIKE enhancements | Partial | Escape characters not supported
F302 | INTERSECT table operator | Yes
F302-01 | INTERSECT DISTINCT table operator | Yes
F302-02 | INTERSECT ALL table operator | Yes
F304 | EXCEPT ALL table operator | Yes
F311-01 | CREATE SCHEMA | Partial | Database is used as the equivalent
F311-02 | CREATE TABLE for persistent base tables | Yes
F311-03 | CREATE VIEW | Yes
F311-05 | GRANT statement | Yes
F312 | MERGE statement | Yes
F314 | MERGE statement with DELETE branch | Yes
F382 | Alter column data type | Yes | Uses nonstandard syntax
F391 | Long identifiers | Yes
F401 | Extended joined table | Yes
F401-01 | NATURAL JOIN | Yes
F401-02 | FULL OUTER JOIN | Yes
F401-04 | CROSS JOIN | Yes
F403 | Partitioned join tables | Yes
F531 | Temporary tables | Yes
F555 | Enhanced seconds precision | Yes
F561 | Full value expressions | Yes
F591 | Derived tables | Yes
F641 | Row and table constructors | Yes
F651 | Catalog name qualifiers | Yes
F846 | Octet support in regular expression operators | Yes
F847 | Nonconstant regular expressions | Yes
F850 | Top-level <order by clause> in <query expression> | Yes
F851 | <order by clause> in subqueries | Yes
F852 | Top-level <order by clause> in views | Yes
F855 | Nested <order by clause> in <query expression> | Yes
S023 | Basic structured types | Yes
S091 | Basic array support | Yes
S091-01 | Arrays of built-in data types | Yes
S091-02 | Arrays of distinct types | Yes
S098 | ARRAY_AGG | Partial | collect_list provides similar functionality
S201-01 | Array parameters | Yes
S281 | Nested collection types | Yes
S301 | Enhanced UNNEST | Partial | LATERAL VIEW provides similar functionality
T021 | BINARY and VARBINARY data types | Partial | BINARY only
T031 | BOOLEAN data type | Yes
T051 | Row types | Yes
T071 | BIGINT data type | Yes
T121 | WITH (excluding RECURSIVE) in query expression | Yes
T122 | WITH (excluding RECURSIVE) in subquery | Yes
T171 | LIKE clause in table definition | Yes
T172 | AS subquery clause in table definition | Yes
T281 | SELECT privilege with column granularity | Partial | Provided by ecosystem projects like Apache Ranger and Apache Sentry
T326 | Table functions | Yes
T331 | Basic roles | Yes
T351 | Bracketed comments | Yes
T431 | Extended grouping capabilities | Partial | Concatenated grouping sets unsupported
T433 | Multiargument GROUPING function | Yes
T441 | ABS and MOD functions | Yes
T501 | Enhanced EXISTS predicate | Yes
T551 | Optional key words for default syntax | Yes
T581 | Regular expression substring function | Yes
T611 | Elementary OLAP operations | Yes
T612 | Advanced OLAP operations | Partial | PERCENT_RANK, CUME_DIST and ROW_NUMBER supported
T613 | Sampling | Yes | Nonstandard syntax via TABLESAMPLE
T614 | NTILE function | Yes
T615 | LEAD and LAG functions | Yes
T616 | Null treatment option for LEAD and LAG functions | Yes
T617 | FIRST_VALUE and LAST_VALUE functions | Yes
T621 | Enhanced numeric functions | Yes
T631 | IN predicate with one list element | Yes