Administration Manual
Apache Hive : Iceberg REST Catalog API backed by Hive Metastore
Introduction

Hive Metastore offers Iceberg REST API endpoints for clients native to Apache Iceberg. Consequently, Iceberg users can access Iceberg tables via either the Hive Metastore Thrift API (using HiveCatalog) or the Iceberg REST Catalog API.
Basic configurations
You must configure the following parameters.
| Key | Required? | Default | Description |
|---|---|---|---|
| metastore.catalog.servlet.port | Yes | -1 | The port number on which the Iceberg REST API listens |
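Once the port is configured, a quick way to confirm the endpoint is reachable is to query the standard Iceberg REST configuration route. This is a minimal sketch; the port (9001) and the /iceberg context path are illustrative and depend on your servlet configuration:
curl http://localhost:9001/iceberg/v1/config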
Authentication
Hive Metastore’s Iceberg REST API supports four authentication methods.
Apache Hive : Setting up Metastore backed by MariaDB
Note
Starting from mysql-connector-java 8.0.12, the Metastore cannot start against MariaDB when using the default MySQL driver.
Introduction
Starting with mysql-connector-java 8.0.12, the MySQL driver issues a getSQLKeywords call to retrieve the database's reserved keywords; this call is triggered by DataNucleus's MySQLAdapter during Metastore initialization.
However, the backing KEYWORDS table in MariaDB diverged from the one in MySQL, which causes the Metastore to fail to start with an exception like:
Apache Hive : Hive Schema Tool
About
The schema tool helps initialize and upgrade the metastore database and the Hive sys schema.
Metastore Schema Verification
Introduced in Hive 0.12.0. See HIVE-3764.
Hive records the schema version in the metastore database and verifies that the metastore schema version is compatible with Hive binaries that are going to access the metastore. Note that the Hive properties to implicitly create or alter the existing schema are disabled by default. Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against an old schema, it will fail to access the metastore:
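Because implicit schema changes are disabled, the schema must be created or upgraded explicitly with the schema tool. A minimal sketch, assuming a MySQL-backed metastore (the dbType value is illustrative):
# Initialize a brand-new metastore schema
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
# Upgrade an existing schema to the version expected by the Hive binaries
$HIVE_HOME/bin/schematool -dbType mysql -upgradeSchema
# Report the schema version currently recorded in the metastore
$HIVE_HOME/bin/schematool -dbType mysql -info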
Apache Hive : Setting Up OAuth 2
Hive can protect selected resources and extract authenticated usernames using OAuth 2.
Supported Features
- Iceberg REST Catalog API
Examples: Integration with external Authorization Servers
Apache Hive : AdminManual
Hive Administrator’s Manual
Apache Hive : AdminManual Configuration
Configuring Hive
A number of configuration variables in Hive can be used by the administrator to change the behavior for their installations and user sessions. These variables can be configured in any of the following ways, shown in the order of preference:
- Using the set command in the CLI or Beeline to set session-level values for a configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to /tmp/mydir for all subsequent statements:
set hive.exec.scratchdir=/tmp/mydir;
- Using the --hiveconf option of the hive command (in the CLI) or the beeline command for the entire session. For example:
bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir
- In hive-site.xml. This is used for setting values for the entire Hive configuration (see hive-site.xml and hive-default.xml.template below). For example:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
<description>Scratch space for Hive jobs</description>
</property>
- In server-specific configuration files (supported starting in Hive 0.14). You can set metastore-specific configuration values in hivemetastore-site.xml, and HiveServer2-specific configuration values in hiveserver2-site.xml.
The server-specific configuration file is useful in two situations:
- You want a different configuration for one type of server (for example – enabling authorization only in HiveServer2 and not CLI).
- You want to set a configuration value only in a server-specific configuration file (for example – setting the metastore database password only in the metastore server configuration file).
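For example, to keep the metastore database password out of the shared hive-site.xml, the property can be set only in hivemetastore-site.xml. A minimal sketch (the password value is a placeholder):
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>metastore_db_password</value>
<description>Set only in hivemetastore-site.xml so that clients never read it</description>
</property>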
The Hive Metastore server reads hive-site.xml as well as hivemetastore-site.xml from $HIVE_CONF_DIR or the classpath. If the metastore is being used in embedded mode (i.e., hive.metastore.uris is not set or empty) in the hive command line or HiveServer2, hivemetastore-site.xml gets loaded by the parent process as well.
The value of hive.metastore.uris is examined to determine this, so it should be set appropriately in hive-site.xml.
Certain metastore configuration parameters, such as hive.metastore.sasl.enabled, hive.metastore.kerberos.principal, hive.metastore.execute.setugi, and hive.metastore.thrift.framed.transport.enabled, are used by both the metastore client and the server. For such common parameters it is better to set the values in hive-site.xml, which helps keep them consistent.
HiveServer2 reads hive-site.xml as well as hiveserver2-site.xml that are available in the $HIVE_CONF_DIR or in the classpath.
If HiveServer2 is using the metastore in embedded mode, hivemetastore-site.xml also is loaded.
Apache Hive : AdminManual Installation
Installing Hive
You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (release 0.13 and later) or Ant (release 0.12 and earlier).
Hive installation has these requirements:
- Java 1.7 (preferred).
Note: Hive versions 1.2 onward require Java 1.7 or newer. Hive versions 0.14 to 1.1 work with Java 1.6, but 1.7 is preferred. Users are strongly advised to start moving to Java 1.8 (see HIVE-8607).
- Hadoop 2.x (preferred), 1.x (not supported by Hive 2.0.0 onward). Hive versions up to 0.13 also supported Hadoop 0.20.x and 0.23.x.
- Hive is commonly used in production Linux and Windows environments. Mac is a commonly used development environment. The instructions in this document are applicable to Linux and Mac; using Hive on Windows would require slightly different steps.
Installing from a Tarball
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases).
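Unpacking and wiring up the environment follows the usual pattern; a minimal sketch, where the version number is a placeholder for whichever release you downloaded:
tar -xzvf apache-hive-x.y.z-bin.tar.gz
cd apache-hive-x.y.z-bin
export HIVE_HOME=$PWD
export PATH=$HIVE_HOME/bin:$PATH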
Apache Hive : AdminManual Metastore 3.0 Administration
Version Note
This document applies only to the Metastore in Hive 3.0 and later releases. For Hive 0.x, 1.x, and 2.x releases, please see the Metastore Administration document.
Introduction
The definition of Hive objects such as databases, tables, and functions are stored in the Metastore. Depending on how the system is configured, statistics and authorization records may also be stored there. Hive, and other execution engines, use this data at runtime to determine how to parse, authorize, and efficiently execute user queries.
Apache Hive : AdminManual Metastore Administration
This page only documents the Metastore in Hive 2.x and earlier. For 3.x and later releases, please see AdminManual Metastore 3.0 Administration.
Introduction
All the metadata for Hive tables and partitions is accessed through the Hive Metastore. Metadata is persisted using the JPOX ORM solution (DataNucleus), so any database supported by it can be used by Hive. Most commercial relational databases and many open source databases are supported. See the list of supported databases in the section below.
Apache Hive : AdminManual SettingUpHiveServer
Setting Up Hive Server
Apache Hive : Hive MetaTool
Introduced in Hive 0.10.0. See HIVE-3056 and HIVE-3443.
The Hive MetaTool enables administrators to do bulk updates on the location fields in database, table, and partition records in the metastore. It provides the following functionality:
- Ability to search and replace the HDFS NN (NameNode) location in metastore records that reference the NN. One use is to transition a Hive deployment to HDFS HA NN (HDFS High Availability NameNode).
- A command line tool to execute JDOQL against the metastore. The ability to execute JDOQL against the metastore can be a useful debugging tool for both users and Hive developers.
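For instance, typical invocations look like the following sketch; the host names and the JDOQL query are illustrative:
# List the HDFS root location currently recorded in the metastore
hive --service metatool -listFSRoot
# Rewrite NameNode references, e.g. when moving to an HA nameservice
hive --service metatool -updateLocation hdfs://nameservice1 hdfs://oldnn.example.com:8020
# Run a JDOQL query against the metastore for debugging
hive --service metatool -executeJDOQL "select name from org.apache.hadoop.hive.metastore.model.MDatabase"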
The metatool Command
The metatool command invokes the Hive MetaTool with these options:
Apache Hive : Hive on Spark: Getting Started
Hive on Spark provides Hive with the ability to utilize Apache Spark as its execution engine.
set hive.execution.engine=spark;
Hive on Spark was added in HIVE-7292.
Version Compatibility
Hive on Spark is only tested with a specific version of Spark, so a given version of Hive is only guaranteed to work with a specific version of Spark. Other versions of Spark may work with a given version of Hive, but that is not guaranteed. Below is a list of Hive versions and their corresponding compatible Spark versions.
Apache Hive : HiveAmazonElasticMapReduce
Amazon Elastic MapReduce and Hive
Amazon Elastic MapReduce is a web service that makes it easy to launch managed, resizable Hadoop clusters on the web-scale infrastructure of Amazon Web Services (AWS). Elastic MapReduce makes it easy for you to launch a Hive and Hadoop cluster, provides you with the flexibility to choose different cluster sizes, and allows you to tear the cluster down automatically when processing has completed. You pay only for the resources you use, with no minimums or long-term commitments.
Apache Hive : HiveAws
Hive and Amazon Web Services
Background
This document explores the different ways of leveraging Hive on Amazon Web Services, namely S3, EC2, and Elastic MapReduce.
Hadoop already has a long tradition of being run on EC2 and S3. These are well documented in the links below, which are a must-read:
The second document also has pointers on how to get started using EC2 and S3. For people who are new to S3, there are a few helpful notes in the S3 for n00bs section below. The rest of the documentation below assumes that the reader can launch a Hadoop cluster in EC2, copy files into and out of S3, and run some simple Hadoop jobs.
Apache Hive : HiveDerbyServerMode
Using Derby in Server Mode
Hive in embedded mode has a limitation of one active user at a time. You may want to run Derby as a Network Server; this way multiple users can access it simultaneously from different systems.
See Metadata Store and Embedded Metastore for more information.
Download Derby
It is suggested you download the version of Derby that ships with Hive. If you have already run Hive in embedded mode, the first line of derby.log contains the version.
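Once Derby is downloaded, the network server is started with the script that ships with it. A minimal sketch, assuming DERBY_HOME points at the unpacked distribution (the host and port are illustrative; 1527 is Derby's default):
# Listen on all interfaces so remote Hive clients can connect
$DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 -p 1527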
Apache Hive : HiveJDBCInterface
JDBC Driver
The current JDBC interface for Hive only supports running queries and fetching results. Only a small subset of the metadata calls are supported.
To see how the JDBC interface can be used, see sample code.
Integration with Pentaho
- Download pentaho report designer from the pentaho website.
- Overwrite report-designer.sh with the code provided below.
#!/bin/sh
# Build the classpath from the Hadoop core jar and all Hive jars
HADOOP_CORE=$(ls $HADOOP_HOME/hadoop-*-core.jar)
CLASSPATH=.:$HADOOP_CORE:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
CLASSPATH=$CLASSPATH:$i
done
CLASSPATH=$CLASSPATH:launcher.jar
# Print the launch command, then start the Pentaho report designer
echo java -XX:MaxPermSize=512m -cp $CLASSPATH org.pentaho.commons.launcher.Launcher
java -XX:MaxPermSize=512m -cp $CLASSPATH org.pentaho.commons.launcher.Launcher
- Build and start the hive server with instructions from HiveServer.
- Compile and run the Hive JDBC client code to load some data (I haven’t figured out how to do this in report designer yet). See sample code for loading the data.
- Run the report designer (note step 2).
$ sh report-designer.sh
- Select ‘Report Design Wizard’.
- Select a template, say ‘fall template’, and click next.
- Create a new data source - JDBC (custom), Generic database.
- Provide Hive JDBC parameters. Give the connection a name ‘hive’.
URL: jdbc:hive://localhost:10000/default
Driver name: org.apache.hadoop.hive.jdbc.HiveDriver
Username and password are empty
- Click on ‘Test’. The test should succeed.
- Edit the query: select ‘Sample Query’, click edit query, click on the connection ‘hive’. Create a new query. Write a query on the table testHiveDriverTable, for example, select * from testHiveDriverTable. Click next.
- Layout Step: Add PageOfPages to Group Items By. Add key and value as Selected Items. Click next, and Finish.
- Change the Report header to ‘hive-pentaho-report’. Change the type of the header to ‘html’.
- Run the report and generate pdf. You should get something like the report attached here.
Integration with SQuirrel SQL Client
Download, install and start the SQuirrel SQL Client from the SQuirrel SQL website.
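The connection parameters are the same as in the Pentaho steps above; when registering the Hive driver in SQuirrel, the values would look like:
URL: jdbc:hive://localhost:10000/default
Driver name: org.apache.hadoop.hive.jdbc.HiveDriver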
Apache Hive : HiveODBC
ODBC Driver
These instructions are for the Hive ODBC driver available in Hive for HiveServer1.
There is no ODBC driver available for HiveServer2 as part of Apache Hive. There are third party ODBC drivers available from different vendors, and most of them seem to be free.
Introduction
The Hive ODBC Driver is a software library that implements the Open Database Connectivity (ODBC) API standard for the Hive database management system, enabling ODBC compliant applications to interact seamlessly (ideally) with Hive through a standard interface. This driver will NOT be built as a part of the typical Hive build process and will need to be compiled and built separately according to the instructions below.
Apache Hive : HiveServer
Thrift Hive Server
HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache Thrift (http://thrift.apache.org/), therefore it is sometimes called the Thrift server although this can lead to confusion because a newer service named HiveServer2 is also built on Thrift. Since the introduction of HiveServer2, HiveServer has also been called HiveServer1.
Apache Hive : Manual Installation
Installing, configuring and running Hive
You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (version 3.6.3 or later).
Prerequisites
- Java 8.
- Maven 3.6.3
- Protobuf 2.5
- Hadoop 3.3.6 (as preparation, configure it as a single-node cluster in pseudo-distributed mode)
- Tez. The default is MapReduce but we will change the execution engine to Tez.
- Hive is commonly used in production Linux environment. Mac is a commonly used development environment. The instructions in this document are applicable to Linux and Mac.
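Before moving on, you can verify the toolchain against the list above with a quick sanity check; expected outputs are abbreviated in the comments:
java -version # expect 1.8.x
mvn -version # expect Apache Maven 3.6.3
protoc --version # expect libprotoc 2.5.0
hadoop version # expect Hadoop 3.3.6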
Install the prerequisites
Java 8
Building Hive requires JDK 8 to be installed. Some notes follow in case you have an ARM chipset (Apple M1 or later).
Apache Hive : Replication
Overview
Hive Replication builds on the metastore event and ExIm features to provide a framework for replicating Hive metadata and data changes between clusters. There is no requirement for the source cluster and replica to run the same Hadoop distribution, Hive version, or metastore RDBMS. The replication system has a fairly ’light touch’, exhibiting a low degree of coupling and using the Hive-metastore Thrift service as an integration point. However, the current implementation is not an ‘out of the box’ solution. In particular it is necessary to provide some kind of orchestration service that is responsible for requesting replication tasks and executing them.
Apache Hive : Setting Up Hive with Docker
Introduction
Run Apache Hive inside a Docker container in pseudo-distributed mode.
STEP 1: Pull the image
- Pull the 4.0.0 image from Hive DockerHub
docker pull apache/hive:4.0.0
STEP 2: Export the Hive version
export HIVE_VERSION=4.0.0
STEP 3: Launch HiveServer2 with an embedded Metastore.
This is lightweight and suited to a quick setup; it uses Derby as the metastore database.
docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
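Optionally, confirm the container came up cleanly before connecting:
# The container was named hive4 in the previous step
docker ps --filter name=hive4
docker logs hive4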
STEP 4: Connect to beeline
docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'
Note: Launch Standalone Metastore. To use a standalone Metastore with Derby,
Apache Hive : Setting Up HiveServer2
HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results (a more detailed intro here). The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC.
- The Thrift interface definition language (IDL) for HiveServer2 is available at https://github.com/apache/hive/blob/trunk/service/if/TCLIService.thrift.
- Thrift documentation is available at http://thrift.apache.org/docs/.
This document describes how to set up the server. How to use a client with this server is described in the HiveServer2 Clients document.
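A minimal sketch for starting the server once configuration is in place, assuming $HIVE_HOME/bin is on the PATH (either command works):
hiveserver2
# or, equivalently
hive --service hiveserver2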
Apache Hive : User and Group Filter Support with LDAP Atn Provider in HiveServer2
User and Group Filter Support with LDAP
Starting in Hive 1.3.0, HIVE-7193 adds support in HiveServer2 for
- LDAP Group filters
- LDAP User filters
- Custom LDAP Query support.
Filters greatly enhance the functionality of the LDAP Authentication provider. They allow Hive to restrict the set of users allowed to connect to HiveServer2.
See Authentication/Security Configuration for general information about configuring authentication for HiveServer2. Also see Hive Configuration Properties – HiveServer2 for individual configuration parameters discussed below.
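As a hedged illustration of these parameters, a group filter restricting connections to members of specific LDAP groups could be added to hive-site.xml as follows; the group names are placeholders:
<property>
<name>hive.server2.authentication.ldap.groupFilter</name>
<value>HiveAdmins,HiveUsers</value>
<description>Only members of these LDAP groups may connect to HiveServer2</description>
</property>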