Apache Hive : UserGuide

NOTE: This page is deprecated and merged into GettingStarted.

User Guide

The query language specification is available at LanguageManual. Also see, GettingStarted for setup instructions.

Supported Features

Usage Examples

Creating tables

MovieLens User Ratings

CREATE TABLE u\_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Apache Access Log Tables

add jar ../build/contrib/hive\_contrib.jar;

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;

Control Separated Tables

CREATE TABLE mylog (
name STRING, language STRING, groups ARRAY<STRING>, entities MAP<INT, STRING>)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE;

Loading tables

MovieLens User Ratings

Download and extract the data:

wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
tar xvzf ml-data.tar+0.gz

Load it in:

LOAD DATA LOCAL INPATH 'ml-data/u.data'
OVERWRITE INTO TABLE u\_data;

Running queries

MovieLens User Ratings

SELECT COUNT(1) FROM u\_data;

Running custom map/reduce jobs

MovieLens User Ratings

Create weekday_mapper.py:

import sys
import datetime

for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print '\t'.join([userid, movieid, rating, str(weekday)])

Use the mapper script:

CREATE TABLE u\_data\_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

INSERT OVERWRITE TABLE u\_data\_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday\_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u\_data;

SELECT weekday, COUNT(1)
FROM u\_data\_new
GROUP BY weekday;

Note: due to a bug in the parser, you must run the “INSERT OVERWRITE” query on a single line

Using sampling

Known Issues/Bugs