public final class GenMRSkewJoinProcessor
- extends Object
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static void processSkewJoin(JoinOperator joinOp,
Task<? extends Serializable> currTask,
- Creates tasks for processing skew joins. The idea (HIVE-964) is to use
separate jobs and map-joins to handle skewed join keys.
For each table, we launch one map-join job that takes as input the directory
containing the big keys of that table and the corresponding directories of
the other tables. (Actually, one job per row of the directory layout listed below.)
The number of MR jobs needed to handle skew keys is the number of tables minus 1
(we can stream the last table, so big keys in the last table will not be a
problem).
At runtime, the join writes the big keys of one table into one corresponding
directory, and the rows with the same keys from the other tables into separate
directories (one per table). The directories look like this:
dir-T1-bigkeys (big keys in T1), dir-T2-keys (keys that are big in T1), dir-T3-keys (keys that are big in T1), ...
dir-T1-keys (keys that are big in T2), dir-T2-bigkeys (big keys in T2), dir-T3-keys (keys that are big in T2), ...
dir-T1-keys (keys that are big in T3), dir-T2-keys (keys that are big in T3), dir-T3-bigkeys (big keys in T3), ...
...
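The layout above can be sketched as a small standalone Java program. This is only an illustration of the naming pattern and the "number of tables minus 1" job count described here, not the code Hive uses: the class SkewJoinDirLayout, its methods, and the literal "dir-Tn-..." strings are hypothetical, and the sketch assumes that no row is produced for the last (streamed) table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hive.
class SkewJoinDirLayout {

    // Label for the directory of table `table` in the row where `bigTable`
    // holds the big keys, mirroring the "dir-Tn-bigkeys" / "dir-Tn-keys" labels.
    static String dirName(int table, int bigTable) {
        return "dir-T" + table + (table == bigTable ? "-bigkeys" : "-keys");
    }

    // One row per follow-up map-join job. Row i lists the input directories
    // of the job that handles keys which are big in table i.
    // Assumption: the last table is streamed, so it gets no row of its own.
    static List<List<String>> layout(int numTables) {
        List<List<String>> rows = new ArrayList<>();
        for (int big = 1; big <= numTables - 1; big++) {
            List<String> row = new ArrayList<>();
            for (int t = 1; t <= numTables; t++) {
                row.add(dirName(t, big));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // For a 4-table join this prints 3 rows, one per map-join job.
        for (List<String> row : layout(4)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Each printed row corresponds to the inputs of one map-join job: the "-bigkeys" directory of one table plus the matching "-keys" directories of all the others.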
For more discussions, please check
public static boolean skewJoinEnabled(HiveConf conf,
Copyright © 2012 The Apache Software Foundation