Apache Hive : Hadoop-compatible Input-Output Format for Hive

Overview

This is a proposal for adding an API to Hive that allows reading from and writing to Hive tables through a Hadoop-compatible API. Specifically, the classes implement Hadoop's InputFormat and OutputFormat interfaces and will be named HiveApiInputFormat and HiveApiOutputFormat.

See HIVE-3752 for discussion of this proposal.

InputFormat (reading from Hive)

Usage:

  1. Create a HiveInputDescription object.
  2. Fill it with information about the table to read from (with database, partition, columns).
  3. Initialize HiveApiInputFormat with the information.
  4. Go to town using HiveApiInputFormat with your Hadoop-compatible reading system.
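
The four steps above can be sketched in plain Java. This is only an illustration of the describe-then-initialize pattern, not the proposed API: InputDescriptionSketch and InputFormatSketch are hypothetical stand-ins for HiveInputDescription and HiveApiInputFormat, the "sketch.hive.input." configuration keys are invented, a plain Map stands in for the Hadoop job configuration, and the table, partition, and column names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Entry point comes first so the file can be launched directly as a single source file.
class InputUsageSketch {
    // Steps 1-3 of the usage list, written against the hypothetical stand-ins below.
    static Map<String, String> buildConf() {
        InputDescriptionSketch desc = new InputDescriptionSketch(); // step 1: create the description
        desc.database = "default";                                  // step 2: fill it in
        desc.table = "web_logs";
        desc.partitionFilter = "ds='2013-01-01'";
        desc.columns.add("user_id");
        desc.columns.add("url");

        Map<String, String> conf = new LinkedHashMap<>();           // stand-in for a job configuration
        InputFormatSketch.initialize(conf, desc);                   // step 3: initialize the format
        return conf;
    }

    public static void main(String[] args) {
        // Step 4 would hand the configured format to the reading system;
        // here we only print the settings that system would receive.
        buildConf().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}

// Hypothetical stand-in for HiveInputDescription: names what to read.
class InputDescriptionSketch {
    String database = "default";
    String table;
    String partitionFilter = "";                     // empty means "all partitions" in this sketch
    final List<String> columns = new ArrayList<>();  // empty means "all columns" in this sketch
}

// Hypothetical stand-in for initializing HiveApiInputFormat:
// copies the description into key/value settings.
class InputFormatSketch {
    static final String PREFIX = "sketch.hive.input."; // invented key prefix

    static void initialize(Map<String, String> conf, InputDescriptionSketch d) {
        if (d.table == null || d.table.isEmpty()) {
            throw new IllegalArgumentException("table name is required");
        }
        conf.put(PREFIX + "database", d.database);
        conf.put(PREFIX + "table", d.table);
        conf.put(PREFIX + "partitionFilter", d.partitionFilter);
        conf.put(PREFIX + "columns", String.join(",", d.columns));
    }
}
```

Keeping the description as a separate object means the caller states what to read once, and the format class alone decides how that is carried in the job configuration.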

More detailed information:

Future plans:

OutputFormat (writing to Hive)

Usage:

  1. Create a HiveOutputDescription object.
  2. Fill it with information about the table to write to (with database and partition).
  3. Initialize HiveApiOutputFormat with the information.
  4. Go to town using HiveApiOutputFormat with your Hadoop-compatible writing system.
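
The write-side steps can be sketched the same way. Again, this is an assumption-laden illustration rather than the proposed API: OutputDescriptionSketch and OutputFormatSketch are hypothetical stand-ins for HiveOutputDescription and HiveApiOutputFormat, the "sketch.hive.output." keys are invented, and the table and partition values are made up. The sketch encodes the target partition as key=value pairs joined by "/", mirroring the directory layout Hive uses for partitions; whether the real classes store it that way is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Entry point comes first so the file can be launched directly as a single source file.
class OutputUsageSketch {
    // Steps 1-3 of the usage list, written against the hypothetical stand-ins below.
    static Map<String, String> buildConf() {
        OutputDescriptionSketch desc = new OutputDescriptionSketch(); // step 1: create the description
        desc.database = "default";                                    // step 2: fill it in
        desc.table = "daily_counts";
        desc.partitionValues.put("ds", "2013-01-01");
        desc.partitionValues.put("country", "US");

        Map<String, String> conf = new LinkedHashMap<>();             // stand-in for a job configuration
        OutputFormatSketch.initialize(conf, desc);                    // step 3: initialize the format
        return conf;
    }

    public static void main(String[] args) {
        // Step 4 would hand the configured format to the writing system;
        // here we only print the settings that system would receive.
        buildConf().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}

// Hypothetical stand-in for HiveOutputDescription: names where to write.
class OutputDescriptionSketch {
    String database = "default";
    String table;
    // Insertion-ordered so partition keys keep the order in which they were given.
    final Map<String, String> partitionValues = new LinkedHashMap<>();
}

// Hypothetical stand-in for initializing HiveApiOutputFormat.
class OutputFormatSketch {
    static final String PREFIX = "sketch.hive.output."; // invented key prefix

    static void initialize(Map<String, String> conf, OutputDescriptionSketch d) {
        if (d.table == null || d.table.isEmpty()) {
            throw new IllegalArgumentException("table name is required");
        }
        conf.put(PREFIX + "database", d.database);
        conf.put(PREFIX + "table", d.table);
        conf.put(PREFIX + "partition", encodePartition(d.partitionValues));
    }

    // Joins the partition values as key=value segments separated by '/'.
    static String encodePartition(Map<String, String> values) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

An empty partition map yields an empty partition string, which this sketch treats as a write to an unpartitioned table.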