Apache Hive : LanguageManual DML

Hive Data Manipulation Language

There are multiple ways to modify data in Hive:

EXPORT and IMPORT commands are also available (as of Hive 0.8).

Loading files into tables

Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.

Syntax
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
Synopsis

Load operations prior to Hive 3.0 are pure copy/move operations that move datafiles into locations corresponding to Hive tables.

Additional load operations are supported by Hive 3.0 onwards as Hive internally rewrites the load into an INSERT AS SELECT.

Example of such a schema:

CREATE TABLE tab1 (col1 int, col2 int) PARTITIONED BY (col3 int) STORED AS ORC;
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE tab1;

Here, partition information is missing which would otherwise give an error, however, if the file(s) located at filepath conform to the table schema such that each row ends with partition column(s) then the load will rewrite into an INSERT AS SELECT job.

The uncompressed data should look like this:

(1,2,3), (2,3,4), (4,5,3) etc.

Notes

Inserting data into Hive Tables from queries

Query Results can be inserted into tables by using the insert clause.

Syntax
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select\_statement1 FROM from\_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select\_statement1 FROM from\_statement;

Hive extension (multiple inserts):
FROM from\_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select\_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select\_statement2] 
[INSERT INTO TABLE tablename2 [PARTITION ...] select\_statement2] ...;
FROM from\_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select\_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select\_statement2] 
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select\_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select\_statement FROM from\_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select\_statement FROM from\_statement;
Synopsis
Notes
Dynamic Partition Inserts

Version information

This information reflects the situation in Hive 0.12; dynamic partition inserts were added in Hive 0.6.

In the dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the PARTITION clause. The column values are optional. If a partition column value is given, we call this a static partition, otherwise it is a dynamic partition. Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column. The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause. As of Hive 3.0.0 (HIVE-19083) there is no need to specify dynamic partition columns. Hive will automatically generate partition specification if it is not specified.

Dynamic partition inserts are disabled by default prior to Hive 0.9.0 and enabled by default in Hive 0.9.0 and later. These are the relevant configuration properties for dynamic partition inserts:

Configuration property Default Note
hive.exec.dynamic.partition true Needs to be set to true to enable dynamic partition inserts
hive.exec.dynamic.partition.mode strict In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict mode all partitions are allowed to be dynamic
hive.exec.max.dynamic.partitions.pernode 100 Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
hive.exec.max.dynamic.partitions 1000 Maximum number of dynamic partitions allowed to be created in total
hive.exec.max.created.files 100000 Maximum number of HDFS files created by all mappers/reducers in a MapReduce job
hive.error.on.empty.partition false Whether to throw an exception if dynamic partition insert generates empty results
Example
FROM page\_view\_stg pvs
INSERT OVERWRITE TABLE page\_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page\_url, pvs.referrer\_url, null, null, pvs.ip, pvs.cnt

Here the country partition will be dynamically created by the last column from the SELECT clause (i.e. pvs.cnt). Note that the name is not used. In nonstrict mode the dt partition could also be dynamically created.

Additional Documentation

Writing data into the filesystem from queries

Query results can be inserted into filesystem directories by using a slight variation of the syntax above:

Syntax
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row\_format] [STORED AS file\_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...

Hive extension (multiple inserts):
FROM from\_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select\_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select\_statement2] ...

 
row\_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
Synopsis
Notes

Inserting values into tables from SQL

The INSERT…VALUES statement can be used to insert data into tables directly from SQL.

Version Information

INSERT…VALUES is available starting in Hive 0.14.

Syntax
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values\_row [, values\_row ...]
 
Where values\_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
Synopsis
Examples
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came\_from STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
 
INSERT INTO TABLE pageviews
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');

Update

Version Information

UPDATE is available starting in Hive 0.14.

Updates can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
Synopsis
Notes

Delete

Version Information

DELETE is available starting in Hive 0.14.

Deletes can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Standard Syntax:
DELETE FROM tablename [WHERE expression]
Synopsis
Notes

Merge

Version Information

MERGE is available starting in Hive 2.2.

Merge can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Standard Syntax:
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>
Synopsis
Performance Note

SQL Standard requires that an error is raised if the ON clause is such that more than 1 row in source matches a row in target.  This check is computationally expensive and may affect the overall runtime of a MERGE statement significantly.  hive.merge.cardinality.check=false may be used to disable the check at your own risk.  If the check is disabled, but the statement has such a cross join effect, it may lead to data corruption.

Notes
Examples