Administration Manual
Apache Hive : Iceberg REST Catalog API backed by Hive Metastore
Introduction

Hive Metastore offers Iceberg REST API endpoints for clients native to Apache Iceberg. Consequently, Iceberg users can access Iceberg tables via either the Hive Metastore Thrift API (using HiveCatalog) or the Iceberg REST Catalog API.
Basic configurations
You must configure the following parameters.
| Key | Required? | Default | Description |
|---|---|---|---|
| metastore.catalog.servlet.port | Yes | -1 | The port number on which the Iceberg REST API listens |
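Once the port is configured, a quick way to confirm the endpoint is reachable is to query the standard Iceberg REST configuration route. This is a minimal sketch; the port (9001) and the /iceberg context path are illustrative and depend on your servlet configuration:
curl http://localhost:9001/iceberg/v1/config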
Authentication
Hive Metastore’s Iceberg REST API supports four authentication methods.
Apache Hive : Setting up Metastore backed by MariaDB
Note
Starting from mysql-connector-java 8.0.12, the Metastore cannot start against MariaDB when using the default MySQL driver.
Introduction
Starting with mysql-connector-java 8.0.12, the MySQL driver issues a getSQLKeywords call to retrieve the database's reserved keywords; this call is triggered by DataNucleus's MySQLAdapter during Metastore initialization.
However, the backing KEYWORDS table in MariaDB diverged from the one in MySQL, which causes the Metastore to fail to start with an exception like:
Apache Hive : Hive Schema Tool
About
The schema tool helps initialize and upgrade the metastore database and the Hive sys schema.
Metastore Schema Verification
Introduced in Hive 0.12.0. See HIVE-3764.
Hive records the schema version in the metastore database and verifies that the metastore schema version is compatible with Hive binaries that are going to access the metastore. Note that the Hive properties to implicitly create or alter the existing schema are disabled by default. Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against an old schema, it will fail to access the metastore:
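Because implicit schema changes are disabled, the schema must be created or upgraded explicitly with the schema tool. A minimal sketch, assuming a MySQL-backed metastore (the dbType value is illustrative):
# Initialize a brand-new metastore schema
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
# Upgrade an existing schema to the version expected by the Hive binaries
$HIVE_HOME/bin/schematool -dbType mysql -upgradeSchema
# Report the schema version currently recorded in the metastore
$HIVE_HOME/bin/schematool -dbType mysql -info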
Apache Hive : Setting Up OAuth 2
Hive can protect selected resources and extract authenticated usernames using OAuth 2.
Supported Features
- Iceberg REST Catalog API
Examples: Integration with external Authorization Servers
Apache Hive : AdminManual
Hive Administrator’s Manual
Apache Hive : AdminManual Configuration
Configuring Hive
A number of configuration variables in Hive can be used by the administrator to change the behavior for their installations and user sessions. These variables can be configured in any of the following ways, shown in the order of preference:
- Using the set command in the CLI or Beeline to set session-level values for a configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to /tmp/mydir for all subsequent statements:
set hive.exec.scratchdir=/tmp/mydir;
- Using the --hiveconf option of the hive command (in the CLI) or the beeline command for the entire session. For example:
bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir
- In hive-site.xml. This is used for setting values for the entire Hive configuration (see hive-site.xml and hive-default.xml.template below). For example:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
<description>Scratch space for Hive jobs</description>
</property>
- In server-specific configuration files (supported starting in Hive 0.14). You can set metastore-specific configuration values in hivemetastore-site.xml, and HiveServer2-specific configuration values in hiveserver2-site.xml.
The server-specific configuration file is useful in two situations:
- You want a different configuration for one type of server (for example – enabling authorization only in HiveServer2 and not CLI).
- You want to set a configuration value only in a server-specific configuration file (for example – setting the metastore database password only in the metastore server configuration file).
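For example, to keep the metastore database password out of the shared hive-site.xml, the property can be set only in hivemetastore-site.xml. A minimal sketch (the password value is a placeholder):
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>metastore_db_password</value>
<description>Set only in hivemetastore-site.xml so that clients never read it</description>
</property>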
The Hive Metastore server reads hive-site.xml as well as hivemetastore-site.xml from $HIVE_CONF_DIR or the classpath. If the metastore is being used in embedded mode (i.e., hive.metastore.uris is not set or empty) in the hive command line or HiveServer2, hivemetastore-site.xml gets loaded by the parent process as well.
The value of hive.metastore.uris is examined to determine this, so it should be set appropriately in hive-site.xml.
Certain metastore configuration parameters, such as hive.metastore.sasl.enabled, hive.metastore.kerberos.principal, hive.metastore.execute.setugi, and hive.metastore.thrift.framed.transport.enabled, are used by both the metastore client and the server. For such common parameters it is better to set the values in hive-site.xml, which helps keep them consistent.
HiveServer2 reads hive-site.xml as well as hiveserver2-site.xml that are available in the $HIVE_CONF_DIR or in the classpath.
If HiveServer2 is using the metastore in embedded mode, hivemetastore-site.xml also is loaded.
Apache Hive : AdminManual Installation
Installing Hive
You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (release 0.13 and later) or Ant (release 0.12 and earlier).
Hive installation has these requirements:
- Java 1.7 (preferred).
Note: Hive versions 1.2 onward require Java 1.7 or newer. Hive versions 0.14 to 1.1 work with Java 1.6, but 1.7 is preferred. Users are strongly advised to start moving to Java 1.8 (see HIVE-8607).
- Hadoop 2.x (preferred), 1.x (not supported by Hive 2.0.0 onward). Hive versions up to 0.13 also supported Hadoop 0.20.x and 0.23.x.
- Hive is commonly used in production Linux and Windows environments. Mac is a commonly used development environment. The instructions in this document are applicable to Linux and Mac; using Hive on Windows would require slightly different steps.
Installing from a Tarball
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases).
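Unpacking and wiring up the environment follows the usual pattern; a minimal sketch, where the version number is a placeholder for whichever release you downloaded:
tar -xzvf apache-hive-x.y.z-bin.tar.gz
cd apache-hive-x.y.z-bin
export HIVE_HOME=$PWD
export PATH=$HIVE_HOME/bin:$PATH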
Apache Hive : AdminManual Metastore 3.0 Administration
Version Note
This document applies only to the Metastore in Hive 3.0 and later releases. For Hive 0.x, 1.x, and 2.x releases, please see the Metastore Administration document.
Introduction
The definition of Hive objects such as databases, tables, and functions are stored in the Metastore. Depending on how the system is configured, statistics and authorization records may also be stored there. Hive, and other execution engines, use this data at runtime to determine how to parse, authorize, and efficiently execute user queries.
Apache Hive : AdminManual Metastore Administration
This page only documents the Metastore in Hive 2.x and earlier. For 3.x and later releases, please see AdminManual Metastore 3.0 Administration.
Introduction
All the metadata for Hive tables and partitions is accessed through the Hive Metastore. Metadata is persisted using the JPOX ORM solution (DataNucleus), so any database supported by it can be used by Hive. Most commercial relational databases and many open source databases are supported. See the list of supported databases in the section below.
Apache Hive : AdminManual SettingUpHiveServer
Setting Up Hive Server
Apache Hive : Hive MetaTool
Introduced in Hive 0.10.0. See HIVE-3056 and HIVE-3443.
The Hive MetaTool enables administrators to do bulk updates on the location fields in database, table, and partition records in the metastore. It provides the following functionality:
- Ability to search and replace the HDFS NN (NameNode) location in metastore records that reference the NN. One use is to transition a Hive deployment to HDFS HA NN (HDFS High Availability NameNode).
- A command line tool to execute JDOQL against the metastore. The ability to execute JDOQL against the metastore can be a useful debugging tool for both users and Hive developers.
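For instance, typical invocations look like the following sketch; the host names and the JDOQL query are illustrative:
# List the HDFS root location currently recorded in the metastore
hive --service metatool -listFSRoot
# Rewrite NameNode references, e.g. when moving to an HA nameservice
hive --service metatool -updateLocation hdfs://nameservice1 hdfs://oldnn.example.com:8020
# Run a JDOQL query against the metastore for debugging
hive --service metatool -executeJDOQL "select name from org.apache.hadoop.hive.metastore.model.MDatabase"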
The metatool Command
The metatool command invokes the Hive MetaTool with these options:
Apache Hive : Hive on Spark: Getting Started
Hive on Spark provides Hive with the ability to utilize Apache Spark as its execution engine.
set hive.execution.engine=spark;
Hive on Spark was added in HIVE-7292.
Version Compatibility
Hive on Spark is only tested with a specific version of Spark, so a given version of Hive is only guaranteed to work with a specific version of Spark. Other versions of Spark may work with a given version of Hive, but that is not guaranteed. Below is a list of Hive versions and their corresponding compatible Spark versions.
Apache Hive : HiveAmazonElasticMapReduce
Amazon Elastic MapReduce and Hive
Amazon Elastic MapReduce is a web service that makes it easy to launch managed, resizable Hadoop clusters on the web-scale infrastructure of Amazon Web Services (AWS). Elastic MapReduce makes it easy for you to launch a Hive and Hadoop cluster, provides you with the flexibility to choose different cluster sizes, and allows you to tear the cluster down automatically when processing has completed. You pay only for the resources you use, with no minimums or long-term commitments.
Apache Hive : HiveAws
Hive and Amazon Web Services
Background
This document explores the different ways of leveraging Hive on Amazon Web Services, namely S3, EC2, and Elastic MapReduce.
Hadoop already has a long tradition of being run on EC2 and S3. These are well documented in the links below, which are a must-read:
The second document also has pointers on how to get started using EC2 and S3. For people who are new to S3, there are a few helpful notes in the S3 for n00bs section below. The rest of the documentation below assumes that the reader can launch a Hadoop cluster in EC2, copy files into and out of S3, and run some simple Hadoop jobs.
Apache Hive : HiveDerbyServerMode
Using Derby in Server Mode
Hive in embedded mode has a limitation of one active user at a time. You may want to run Derby as a Network Server; this way multiple users can access it simultaneously from different systems.
See Metadata Store and Embedded Metastore for more information.
Download Derby
It is suggested you download the version of Derby that ships with Hive. If you have already run Hive in embedded mode, the first line of derby.log contains the version.
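Once Derby is downloaded, the network server is started with the script that ships with it. A minimal sketch, assuming DERBY_HOME points at the unpacked distribution (the host and port are illustrative; 1527 is Derby's default):
# Listen on all interfaces so remote Hive clients can connect
$DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 -p 1527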
Apache Hive : HiveJDBCInterface
JDBC Driver
The current JDBC interface for Hive only supports running queries and fetching results. Only a small subset of the metadata calls are supported.
To see how the JDBC interface can be used, see sample code.
Integration with Pentaho
- Download pentaho report designer from the pentaho website.
- Overwrite report-designer.sh with the code provided below.
#!/bin/sh
# Build the classpath from the Hadoop core jar and all Hive jars
HADOOP_CORE=$(ls $HADOOP_HOME/hadoop-*-core.jar)
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
CLASSPATH=$CLASSPATH:launcher.jar
# Print the launch command, then start the Pentaho report designer
echo java -XX:MaxPermSize=512m -cp $CLASSPATH org.pentaho.commons.launcher.Launcher
java -XX:MaxPermSize=512m -cp $CLASSPATH org.pentaho.commons.launcher.Launcher
- Build and start the hive server with instructions from HiveServer.
- Compile and run the Hive JDBC client code to load some data (I haven’t figured out how to do this in report designer yet). See sample code for loading the data.
- Run the report designer (note step 2).
$ sh report-designer.sh
- Select ‘Report Design Wizard’.
- Select a template, say ‘fall template’, and click next.
- Create a new data source - JDBC (custom), Generic database.
- Provide Hive JDBC parameters. Give the connection a name ‘hive’.
URL: jdbc:hive://localhost:10000/default
Driver name: org.apache.hadoop.hive.jdbc.HiveDriver
Username and password are empty
- Click on ‘Test’. The test should succeed.
- Edit the query: select ‘Sample Query’, click edit query, click on the connection ‘hive’. Create a new query. Write a query on the table testHiveDriverTable, for example, select * from testHiveDriverTable. Click next.
- Layout Step: Add PageOfPages to Group Items By. Add key and value as Selected Items. Click next, and Finish.
- Change the Report header to ‘hive-pentaho-report’. Change the type of the header to ‘html’.
- Run the report and generate pdf. You should get something like the report attached here.
Integration with SQuirrel SQL Client
Download, install and start the SQuirrel SQL Client from the SQuirrel SQL website.
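The connection parameters are the same as in the Pentaho steps above; when registering the Hive driver in SQuirrel, the values would look like:
URL: jdbc:hive://localhost:10000/default
Driver name: org.apache.hadoop.hive.jdbc.HiveDriver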
Apache Hive : HiveODBC
ODBC Driver
These instructions are for the Hive ODBC driver available in Hive for HiveServer1.
There is no ODBC driver available for HiveServer2 as part of Apache Hive. There are third party ODBC drivers available from different vendors, and most of them seem to be free.
Introduction
The Hive ODBC Driver is a software library that implements the Open Database Connectivity (ODBC) API standard for the Hive database management system, enabling ODBC compliant applications to interact seamlessly (ideally) with Hive through a standard interface. This driver will NOT be built as a part of the typical Hive build process and will need to be compiled and built separately according to the instructions below.
Apache Hive : HiveServer
Thrift Hive Server
HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache Thrift (http://thrift.apache.org/), therefore it is sometimes called the Thrift server although this can lead to confusion because a newer service named HiveServer2 is also built on Thrift. Since the introduction of HiveServer2, HiveServer has also been called HiveServer1.
Apache Hive : Manual Installation
Installing, configuring and running Hive
You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (version 3.6.3 or later).
Prerequisites
- Java 8.
- Maven 3.6.3
- Protobuf 2.5
- Hadoop 3.3.6 (as preparation, configure it as a single-node cluster in pseudo-distributed mode)
- Tez. The default is MapReduce but we will change the execution engine to Tez.
- Hive is commonly used in production Linux environment. Mac is a commonly used development environment. The instructions in this document are applicable to Linux and Mac.
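Before moving on, you can verify the toolchain against the list above with a quick sanity check; expected outputs are abbreviated in the comments:
java -version # expect 1.8.x
mvn -version # expect Apache Maven 3.6.3
protoc --version # expect libprotoc 2.5.0
hadoop version # expect Hadoop 3.3.6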
Install the prerequisites
Java 8
Building Hive requires JDK 8 to be installed. Some notes follow in case you have an ARM chipset (Apple M1 or later).
Apache Hive : Replication
Overview
Hive Replication builds on the metastore event and ExIm features to provide a framework for replicating Hive metadata and data changes between clusters. There is no requirement for the source cluster and replica to run the same Hadoop distribution, Hive version, or metastore RDBMS. The replication system has a fairly ’light touch’, exhibiting a low degree of coupling and using the Hive-metastore Thrift service as an integration point. However, the current implementation is not an ‘out of the box’ solution. In particular it is necessary to provide some kind of orchestration service that is responsible for requesting replication tasks and executing them.
Apache Hive : Setting Up Hive with Docker
Introduction
Run Apache Hive inside a Docker container in pseudo-distributed mode.
STEP 1: Pull the image
- Pull the 4.0.0 image from Hive DockerHub
docker pull apache/hive:4.0.0
STEP 2: Export the Hive version
export HIVE_VERSION=4.0.0
STEP 3: Launch HiveServer2 with an embedded Metastore.
This is lightweight and suited to a quick setup; it uses Derby as the metastore database.
docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
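Optionally, confirm the container came up cleanly before connecting:
# The container was named hive4 in the previous step
docker ps --filter name=hive4
docker logs hive4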
STEP 4: Connect to beeline
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'
Note: Launch Standalone Metastore. To use a standalone Metastore with Derby,
Apache Hive : Setting Up HiveServer2
HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results (a more detailed intro here). The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC.
- The Thrift interface definition language (IDL) for HiveServer2 is available at https://github.com/apache/hive/blob/trunk/service/if/TCLIService.thrift.
- Thrift documentation is available at http://thrift.apache.org/docs/.
This document describes how to set up the server. How to use a client with this server is described in the HiveServer2 Clients document.
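A minimal sketch for starting the server once configuration is in place, assuming $HIVE_HOME/bin is on the PATH (either command works):
hiveserver2
# or, equivalently
hive --service hiveserver2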
Apache Hive : User and Group Filter Support with LDAP Atn Provider in HiveServer2
User and Group Filter Support with LDAP
Starting in Hive 1.3.0, HIVE-7193 adds support in HiveServer2 for
- LDAP Group filters
- LDAP User filters
- Custom LDAP Query support.
Filters greatly enhance the functionality of the LDAP Authentication provider. They allow Hive to restrict the set of users allowed to connect to HiveServer2.
See Authentication/Security Configuration for general information about configuring authentication for HiveServer2. Also see Hive Configuration Properties – HiveServer2 for individual configuration parameters discussed below.
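As a hedged illustration of these parameters, a group filter restricting connections to members of specific LDAP groups could be added to hive-site.xml as follows; the group names are placeholders:
<property>
<name>hive.server2.authentication.ldap.groupFilter</name>
<value>HiveAdmins,HiveUsers</value>
<description>Only members of these LDAP groups may connect to HiveServer2</description>
</property>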