Hive Configuration Properties
1. Query and DDL Execution
Property name | Default value | Added in | Description |
---|---|---|---|
mapred.reduce.tasks | -1 | Hive 0.1.0 | The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. When this property is set to -1, Hive automatically determines the number of reducers. |
hive.exec.reducers.bytes.per.reducer | 1,000,000,000 prior to Hive 0.14.0; 256 MB (256,000,000) in Hive 0.14.0 and later | Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917) | Size per reducer. The default before Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used. |
hive.exec.reducers.max | 999 prior to Hive 0.14.0; 1009 in Hive 0.14.0 and later | Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917) | Maximum number of reducers that will be used. If the value specified in the configuration property mapred.reduce.tasks is negative, Hive uses this as the maximum number of reducers when automatically determining the number of reducers. |
hive.jar.path | (empty) | Hive 0.2.0 or earlier | The location of hive_cli.jar that is used when submitting jobs in a separate JVM. |
hive.aux.jars.path | (empty) | Hive 0.2.0 or earlier | The location of the plugin jars that contain implementations of user defined functions (UDFs) and SerDes. |
hive.reloadable.aux.jars.path | (empty) | Hive 0.14.0 with HIVE-7553 | The locations of the plugin jars, which can be comma-separated folders or jars. They can be renewed (added, removed, or updated) by executing the Beeline reload command without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path for creating UDFs or SerDes. |
hive.exec.scratchdir | | Hive 0.2.0; default changed | This directory is used by Hive to store the plans for the different map/reduce stages of the query and to store the intermediate outputs of these stages. |
hive.scratch.dir.permission | 700 | Hive 0.12.0 with HIVE-4487 | The permission for the user-specific scratch directories that get created in the root scratch directory. (See hive.exec.scratchdir.) |
hive.exec.local.scratchdir | /tmp/${user.name} | Hive 0.10.0 with HIVE-1577 | Scratch space for Hive jobs when Hive runs in local mode. Also see hive.exec.scratchdir. |
hive.hadoop.supports.splittable.combineinputformat | false | Hive 0.6.0 with HIVE-1280( | Whether to combine small input files so that fewer mappers are spawned. |
hive.map.aggr | true in Hive 0.3 and later; false in Hive 0.2 | Hive 0.2.0 | Whether to use map-side aggregation in Hive Group By queries. |
hive.groupby.skewindata | false | Hive 0.3.0 | Whether there is skew in the data, so that group by queries can be optimized for it. |
hive.groupby.mapaggr.checkinterval | 100000 | Hive 0.3.0 | Number of rows after which the size check of the grouping keys/aggregation classes is performed. |
hive.new.job.grouping.set.cardinality | 30 | Hive 0.11.0 with HIVE-3552 | Whether a new map-reduce job should be launched for grouping sets/rollups/cubes. |
hive.mapred.local.mem | 0 | Hive 0.3.0 | For local mode, the memory of the mappers/reducers. |
hive.map.aggr.hash.force.flush.memory.threshold | 0.9 | Hive 0.7.0 with HIVE-1830 | The maximum memory to be used by the map-side group aggregation hash table. If the memory usage is higher than this number, a data flush is forced. |
hive.map.aggr.hash.percentmemory | 0.5 | Hive 0.2.0 | Portion of total memory to be used by the map-side group aggregation hash table. |
hive.map.aggr.hash.min.reduction | 0.5 | Hive 0.4.0 | Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off. |
hive.optimize.groupby | true | Hive 0.5.0 | Whether to enable the bucketed group by from bucketed partitions/tables. |
hive.optimize.countdistinct | true | Hive 3.0.0 with HIVE-16654 | Whether to rewrite count distinct into 2 stages, i.e., the first stage uses multiple reducers with the count distinct key and the second stage uses a single reducer without a key. |
hive.optimize.remove.sq_count_check | false | Hive 3.0.0 with HIVE-16793 | Whether to remove an extra join with the sq_count_check UDF for scalar subqueries with constant group by keys. |
hive.multigroupby.singlereducer | true | Hive 0.9.0 with HIVE-2621 | Whether to optimize a multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job. |
hive.optimize.index.filter | false | Hive 0.8.0 with HIVE-1644 | Whether to enable automatic use of indexes. Note: See Indexing for more configuration properties related to Hive indexes. |
hive.optimize.ppd | true | Hive 0.4.0 with HIVE-279, default changed to true in Hive 0.4.0 with HIVE-626 | Whether to enable predicate pushdown (PPD). Note: Turn on hive.optimize.index.filter as well to use file format specific indexes with PPD. |
hive.optimize.ppd.storage | true | Hive 0.7.0 | Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false. |
hive.ppd.remove.duplicatefilters | true | Hive 0.8.0 | During query optimization, filters may be pushed down in the operator tree. If this config is true, only pushed down filters remain in the operator tree, and the original filter is removed. If this config is false, the original filter is also left in the operator tree at the original place. |
hive.ppd.recognizetransivity | true | Hive 0.8.1 | Whether to transitively replicate predicate filters over equijoin conditions. |
hive.join.emit.interval | 1000 | Hive 0.2.0 | How many rows in the right-most join operand Hive should buffer before emitting the join result. |
hive.join.cache.size | 25000 | Hive 0.5.0 | How many rows in the joining tables (except the streaming table) should be cached in memory. |
hive.mapjoin.bucket.cache.size | 100 | Hive 0.5.0 (replaced by hive.smbjoin.cache.rows in Hive 0.12.0) | How many values in each key in the map-joined table should be cached in memory. |
hive.mapjoin.followby.map.aggr.hash.percentmemory | 0.3 | Hive 0.7.0 with HIVE-1830 | Portion of total memory to be used by the map-side group aggregation hash table when this group by is followed by a map join. |
hive.smalltable.filesize or hive.mapjoin.smalltable.filesize | 25000000 | Hive 0.7.0 with HIVE-1642: hive.smalltable.filesize (replaced by hive.mapjoin.smalltable.filesize in Hive 0.8.1) | The threshold (in bytes) for the input file size of the small tables; if the file size is smaller than this threshold, Hive will try to convert the common join into a map join. |
hive.mapjoin.check.memory.rows | 100000 | Hive 0.7.0 with HIVE-1808 and HIVE-1642 | The number of processed rows after which memory usage is checked. |
hive.ignore.mapjoin.hint | true | Hive 0.11.0 with HIVE-4042 | Whether Hive ignores the mapjoin hint. |
hive.smbjoin.cache.rows | 10000 | Hive 0.12.0 with HIVE-4440 (replaces | How many rows with the same key value should be cached in memory per sort-merge-bucket joined table. |
hive.mapjoin.optimized.hashtable | true | Hive 0.14.0 with HIVE-6430 | Whether Hive should use a memory-optimized hash table for MapJoin. Only works on Tez and Spark, because the memory-optimized hash table cannot be serialized. (Spark is supported starting from Hive 1.3.0, with HIVE-11180.) |
hive.hashtable.initialCapacity | 100000 | Hive 0.7.0 with HIVE-1642 | Initial capacity of the mapjoin hashtable if statistics are absent, or if hive.hashtable.key.count.adjustment is set to 0. |
hive.hashtable.loadfactor | 0.75 | Hive 0.7.0 with HIVE-1642 | In the process of Mapjoin, the key/value will be held in the hashtable. This value is the load factor for the in-memory hashtable. |
hive.hashtable.key.count.adjustment | 1.0 | Hive 0.14.0 with HIVE-7616 | Adjustment to the mapjoin hashtable size derived from table and column statistics; the estimate of the number of keys is divided by this value. If the value is 0, statistics are not used and hive.hashtable.initialCapacity is used instead. |
hive.debug.localtask | false | Hive 0.7.0 with HIVE-1642 | |
hive.optimize.skewjoin | false | Hive 0.6.0 | Whether to enable skew join optimization. (Also see hive.optimize.skewjoin.compiletime.) |
hive.skewjoin.key | 100000 | Hive 0.6.0 | Determines whether a key in a join is a skew key. If the join operator sees more than the specified number of rows with the same key, the key is treated as a skew join key. |
hive.skewjoin.mapjoin.map.tasks | 10000 | Hive 0.6.0 | Determines the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split to perform fine-grained control. |
hive.skewjoin.mapjoin.min.split | 33554432 | Hive 0.6.0 | Determines the maximum number of map tasks used in the follow-up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks to perform fine-grained control. |
hive.optimize.skewjoin.compiletime | false | Hive 0.10.0 | Whether to create a separate plan for skewed keys for the tables in the join. This is based on the skewed keys stored in the metadata. At compile time, the plan is broken into different joins: one for the skewed keys, and the other for the remaining keys. A union is then performed for the two joins generated above. So unless the same skewed key is present in both the joined tables, the join for the skewed key will be performed as a map-side join. |
hive.optimize.union.remove | false | Hive 0.10.0 with HIVE-3276 | Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted. |
hive.mapred.supports.subdirectories | false | Hive 0.10.0 with HIVE-3276 | Whether the version of Hadoop which is running supports sub-directories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports sub-directories for tables/partitions. This support was added by MAPREDUCE-1501. |
hive.mapred.mode | Hive0.x: Hive2.x: | Hive 0.3.0 | The mode in which the Hive operations are being performed. In strict mode, some risky queries are not allowed to run. For example, full table scans are prevented (see HIVE-10454) and ORDER BY requires a LIMIT clause. |
hive.exec.script.maxerrsize | 100000 | Hive 0.2.0 | Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity. |
hive.script.auto.progress | false | Hive 0.4.0 | Whether the Hive Transform/Map/Reduce clause should automatically send progress information to TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because this may prevent scripts stuck in infinite loops from being killed by TaskTracker. |
hive.exec.script.allow.partial.consumption | false | Hive 0.5.0 | When enabled, this option allows a user script to exit successfully without consuming all the data from the standard input. |
hive.script.operator.id.env.var | HIVE_SCRIPT_OPERATOR_ID | Hive 0.5.0 | Name of the environment variable that holds the unique script operator ID in the user's transform function (the custom mapper/reducer that the user has specified in the query). |
hive.script.operator.env.blacklist | hive.txn.valid.txns, hive.script.operator.env.blacklist | Hive 0.14.0 with HIVE-8341 | By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment. However, some values can grow large or are not amenable to translation to environment variables. This value gives a comma-separated list of configuration values that will not be set in the environment when calling a script operator. By default the valid transaction list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable. |
hive.exec.compress.output | false | Hive 0.2.0 | This controls whether the final outputs of a query (to a local/hdfs file or a Hive table) are compressed. The compression codec and other options are determined from the Hadoop configuration variables mapred.output.compress*. |
hive.exec.compress.intermediate | false | Hive 0.2.0 | This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from the Hadoop configuration variables mapred.output.compress*. |
hive.exec.parallel | false | Hive 0.5.0 | Whether to execute jobs in parallel. Applies to MapReduce jobs that can run in parallel, for example jobs processing different source tables before a join. As of Hive 0.14, also applies to move tasks that can run in parallel, for example moving files to insert targets during multi-insert. |
hive.exec.parallel.thread.number | 8 | Hive 0.6.0 | How many jobs at most can be executed in parallel. |
hive.exec.rowoffset | false | Hive 0.8.0 | Whether to provide the row offset virtual column. |
hive.counters.group.name | HIVE | Hive 0.13.0 with HIVE-4518 | Counter group name for the counters used during query execution. The counter group is used for internal Hive variables (CREATED_FILE, FATAL_ERROR, and so on). |
hive.exec.pre.hooks | (empty) | Hive 0.4.0 | Comma-separated list of pre-execution hooks to be invoked for each statement. A pre-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface. |
hive.exec.post.hooks | (empty) | Hive 0.5.0 | Comma-separated list of post-execution hooks to be invoked for each statement. A post-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface. |
hive.exec.failure.hooks | (empty) | Hive 0.8.0 | Comma-separated list of on-failure hooks to be invoked for each statement. An on-failure hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface. |
hive.merge.mapfiles | true | Hive 0.4.0 | Merge small files at the end of a map-only job. |
hive.merge.mapredfiles | false | Hive 0.4.0 | Merge small files at the end of a map-reduce job. |
hive.merge.size.per.task | 256000000 | Hive 0.4.0 | Size of merged files at the end of the job. |
hive.merge.smallfiles.avgsize | 16000000 | Hive 0.5.0 | When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true. |
hive.heartbeat.interval | 1000 | Hive 0.4.0 | Send a heartbeat after this interval. Used by mapjoin and filter operators. |
hive.auto.convert.join | true in 0.11.0 and later (HIVE-3297) | 0.7.0 with HIVE-1642 | Whether Hive enables the optimization that converts a common join into a mapjoin based on the input file size. (Note that hive-default.xml.template incorrectly gives the default as false in Hive 0.11.0 through 0.13.1.) |
hive.auto.convert.join.noconditionaltask | true | 0.11.0 with HIVE-3784 (default changed to true with HIVE-4146) | Whether Hive enables the optimization that converts a common join into a mapjoin based on the input file size. If this parameter is on, and the sum of the sizes of n-1 of the tables/partitions for an n-way join is smaller than the size specified by hive.auto.convert.join.noconditionaltask.size, the join is directly converted to a mapjoin (there is no conditional task). |
hive.auto.convert.join.noconditionaltask.size | 10000000 | 0.11.0 with HIVE-3784 | If hive.auto.convert.join.noconditionaltask is off, this parameter does not take effect. However, if it is on, and the sum of the sizes of n-1 of the tables/partitions for an n-way join is smaller than this size, the join is directly converted to a mapjoin (there is no conditional task). The default is 10 MB. |
hive.auto.convert.join.use.nonstaged | false | changed to with HIVE-6749 also in 0.13.0 | For conditional joins, if the input stream from a small alias can be directly applied to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently, this does not work with vectorization or the Tez execution engine. |
hive.merge.nway.joins | true | 2.2.0 with HIVE-15655 | For multiple joins on the same condition, merge the joins together into a single join operator. This is useful in the case of large shuffle joins to avoid a reshuffle phase. Disabling this in Tez will often provide a faster join algorithm in case of left outer joins or a general Snowflake schema. |
hive.udtf.auto.progress | false | Hive 0.5.0 | Whether Hive should automatically send progress information to TaskTracker when using UDTFs to prevent the task getting killed because of inactivity. Users should be cautious because this may prevent TaskTracker from killing tasks with infinite loops. |
hive.exec.counters.pull.interval | 1000 | Hive 0.6.0 | The interval at which to poll the JobTracker for the counters of the running job. The smaller the value, the more load there will be on the JobTracker; the larger the value, the less granular the captured counter data will be. |
hive.optimize.bucketingsorting | true | Hive 0.11.0 with HIVE-4240 | If hive.enforce.bucketing or hive.enforce.sorting is true, don't create a reducer for enforcing bucketing/sorting for queries of the form: |
hive.optimize.reducededuplication | true | Hive 0.6.0 | Remove extra map-reduce jobs if the data is already clustered by the same key which needs to be used again. This should always be set to true. Since it is a new feature, it has been made configurable. |
hive.optimize.correlation | false | Hive 0.12.0 with HIVE-2206 | Exploit intra-query correlations. For details see the Correlation Optimizer design document. |
hive.optimize.limittranspose | false | Hive 2.0.0 with HIVE-11684, modified by HIVE-11775 | Whether to push a limit through left/right outer join or union. If the value is true and the size of the outer input is reduced enough (as specified in hive.optimize.limittranspose.reductionpercentage and hive.optimize.limittranspose.reductiontuples), the limit is pushed to the outer input or union; to remain semantically correct, the limit is kept on top of the join or the union too. |
hive.optimize.limittranspose.reductionpercentage | 1.0 | Hive 2.0.0 with HIVE-11684, modified by HIVE-11775 | When hive.optimize.limittranspose is true, this variable specifies the minimal percentage (fractional) reduction of the size of the outer input of the join or input of the union that the optimizer should get in order to apply the rule. |
hive.optimize.limittranspose.reductiontuples | 0 | Hive 2.0.0 with HIVE-11684, modified by HIVE-11775 | When hive.optimize.limittranspose is true, this variable specifies the minimal reduction in the number of tuples of the outer input of the join or input of the union that the optimizer should get in order to apply the rule. |
hive.optimize.filter.stats.reduction | false | Hive 2.1.0 with HIVE-13269 | Whether to simplify comparison expressions in filter operators using column stats. |
hive.optimize.sort.dynamic.partition | and later (HIVE-8151) | Hive 0.13.0 with HIVE-6455 | When enabled, the dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer, thereby reducing the memory pressure on reducers. |
hive.cbo.enable | true in Hive 1.1.0 and later (HIVE-8395) | Hive 0.14.0 with HIVE-5775 and HIVE-7946 | When true, the cost based optimizer, which uses the Calcite framework, will be enabled. |
hive.cbo.returnpath.hiveop | false | Hive 1.2.0 with HIVE-9581 and HIVE-9795 | When true, this optimization to the CBO logical plan will add a rule to introduce not null filtering on join keys. Controls Calcite plan to Hive operator conversion. Overrides hive.optimize.remove.identity.project when set to false. |
hive.cbo.cnf.maxnodes | -1 | Hive 2.1.1 with HIVE-14021 | When converting to conjunctive normal form (CNF), fail if the expression exceeds the specified threshold; the threshold is expressed in terms of the number of nodes (leaves and interior nodes). The default, -1, does not set up a threshold. |
hive.optimize.null.scan | true | Hive 0.14.0 with HIVE-7385 | When true, this optimization will try to not scan any rows from tables which can be determined at query compile time to not generate any rows (e.g., where 1 = 2, where false, limit 0 etc.). |
hive.exec.dynamic.partition | true in Hive 0.9.0 and later | Hive 0.6.0 | Whether or not to allow dynamic partitions in DML/DDL. |
hive.exec.dynamic.partition.mode | strict | Hive 0.6.0 | In strict mode, the user must specify at least one static partition, in case the user accidentally overwrites all partitions. In nonstrict mode, all partitions can be dynamic. |
hive.exec.max.dynamic.partitions | 1000 | Hive 0.6.0 | Maximum number of dynamic partitions allowed to be created in total. |
hive.exec.max.dynamic.partitions.pernode | 100 | Hive 0.6.0 | Maximum number of dynamic partitions allowed to be created in each mapper/reducer node. |
hive.exec.max.created.files | 100000 | Hive 0.7.0 | Maximum number of HDFS files created by all mappers/reducers in a MapReduce job. |
hive.exec.default.partition.name | __HIVE_DEFAULT_PARTITION__ | Hive 0.6.0 | The default partition name in case the dynamic partition column value is null/empty string or any other value that cannot be escaped. This value must not contain any special character used in an HDFS URI (e.g., ':', '%', '/' etc). The user has to be aware that the dynamic partition value should not contain this value, to avoid confusion. |
hive.fetch.output.serde | org.apache.hadoop.hive.serde2.DelimitedJSONSerDe | Hive 0.7.0 | The SerDe used by FetchTask to serialize the fetch output. |
hive.exec.mode.local.auto | false | Hive 0.7.0 with HIVE-1408 | Lets Hive determine whether to run in local mode automatically. |
hive.exec.mode.local.auto.inputbytes.max | 134217728 | Hive 0.7.0 with HIVE-1408 | When hive.exec.mode.local.auto is true, input bytes should be less than this for local mode. |
hive.exec.mode.local.auto.input.files.max | 4 | Hive 0.9.0 with HIVE-2651 | When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode. |
hive.exec.drop.ignorenonexistent | true | Hive 0.7.0 with HIVE-1856 and HIVE-1858 | Do not report an error if DROP TABLE/VIEW/PARTITION/INDEX/TEMPORARY FUNCTION specifies a non-existent table/view. Also applies to permanent functions as of Hive 0.13.0. |
hive.exec.show.job.failure.debug.info | true | Hive 0.7.0 | If a job fails, whether to provide a link in the CLI to the task with the most failures, along with debugging hints if applicable. |
hive.auto.progress.timeout | 0 | Hive 0.7.0 | How long to run the autoprogressor for the script/UDTF operators (in seconds). Set to 0 for forever. |
hive.table.parameters.default | (empty) | Hive 0.7.0 | Default property values for newly created tables. |
hive.variable.substitute | true | Hive 0.7.0 | This enables substitution using syntax like ${var}, ${system:var}, and ${env:var}. |
hive.error.on.empty.partition | false | Hive 0.7.0 | Whether to throw an exception if a dynamic partition insert generates empty results. |
hive.exim.uri.scheme.whitelist | hdfs,pfile,file in Hive 2.2.0 and later | default changed in Hive 2.2.0 with HIVE-15151 | A comma-separated list of acceptable URI schemes for import and export. |
hive.limit.row.max.size | 100000 | Hive 0.8.0 | When trying a smaller subset of data for simple LIMIT, the minimum size that each row is guaranteed to have. |
hive.limit.optimize.limit.file | 10 | Hive 0.8.0 | When trying a smaller subset of data for simple LIMIT, the maximum number of files that can be sampled. |
hive.limit.optimize.enable | false | Hive 0.8.0 | Whether to enable the optimization that first tries a smaller subset of data for simple LIMIT. |
hive.limit.optimize.fetch.max | 50000 | Hive 0.8.0 | Maximum number of rows allowed for a smaller subset of data for simple LIMIT, if it is a fetch query. Insert queries are not restricted by this limit. |
hive.rework.mapredwork | false | Hive 0.8.0 | Whether to rework the mapred work. This was first introduced by SymlinkTextInputFormat to replace symlink files with real paths at compile time. |
hive.sample.seednumber | 0 | Hive 0.8.0 | A number used for percentage sampling. By changing this number, the user changes the subsets of data sampled. |
hive.autogen.columnalias.prefix.label | _c | Hive 0.8.0 | String used as a prefix when auto generating column aliases. By default the prefix label will be appended with a column position number to form the column alias. Auto generation happens if an aggregate function is used in a select clause without an explicit alias. |
hive.autogen.columnalias.prefix.includefuncname | false | Hive 0.8.0 | Whether to include the function name in the column alias auto generated by Hive. |
hive.exec.perf.logger | org.apache.hadoop.hive.ql.log.PerfLogger | Hive 0.8.0 | The class responsible for logging client-side performance metrics. Must be a subclass of org.apache.hadoop.hive.ql.log.PerfLogger. |
hive.start.cleanup.scratchdir |
false | Hive 1.3.0 with HIVE-10415 |
Whether to clean up the Hive scratch directory while starting the Hive server (or HiveServer2). This is not an option for a multi-user environment, since it will accidentally remove a scratch directory that is in use. |
hive.scratchdir.lock |
false | Hive 1.3.0 and 2.1.0 (but not 2.0.x) with HIVE-13429 |
When true, holds a lock file in the scratch directory. If a Hive process dies and accidentally leaves a dangling scratchdir behind, the cleardanglingscratchdir tool will remove it. When false, does not create a lock file, and therefore the cleardanglingscratchdir tool cannot remove any dangling scratch directories. |
hive.output.file.extension |
(empty) | Hive 0.8.1 |
String used as a file extension for output files. If not set, defaults to the codec extension for text files (e.g. ".gz"), or no extension otherwise. |
hive.insert.into.multilevel.dirs |
false | Hive 0.8.1 |
Whether to insert into multilevel nested directories like "insert directory '/HIVEFT25686/chinna/' from table". |
hive.conf.validation |
true | Hive 0.10.0 with HIVE-2848 |
Enables type checking for registered Hive configurations. |
hive.fetch.task.conversion |
more in Hive 0.14.0 and later |
Hive 0.14.0 with HIVE-7397 |
Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query must be single-sourced, with no subquery, and must not have any aggregations or distincts (which incur RS, the ReduceSinkOperator, and so require a MapReduce task), lateral views, or joins. Supported values include "none" and "more" (the default in Hive 0.14.0 and later). |
hive.map.groupby.sorted |
Hive 2.0 and later: true (HIVE-12325) |
Hive 0.10.0 with HIVE-3432 |
If the bucketing/sorting properties of the table exactly match the grouping key, whether to perform the group by in the mapper by using BucketizedHiveInputFormat. The only downside is that this limits the number of mappers to the number of files. |
hive.groupby.orderby.position.alias |
false | Hive 0.11.0 with HIVE-581 |
Whether to enable using Column Position Alias in GROUP BY and ORDER BY clauses of queries (deprecated as of Hive 2.2.0; use hive.groupby.position.alias and hive.orderby.position.alias instead). |
hive.groupby.position.alias |
false | Hive 2.2.0 with HIVE-15797 |
Whether to enable using Column Position Alias in GROUP BY. |
hive.orderby.position.alias |
true | Hive 2.2.0 with HIVE-15797 |
Whether to enable using Column Position Alias in ORDER BY. |
hive.fetch.task.aggr |
false | Hive 0.12.0 with HIVE-4002 |
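Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. If this parameter is set to true, Hive delegates the final aggregation stage to a fetch task, possibly decreasing the query time. |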
hive.fetch.task.conversion.threshold |
in Hive 0.13.0 and 0.13.1, 1073741824 (1 GB) in Hive 0.14.0 and later |
Hive 0.13.0 |
Input threshold (in bytes) for applying hive.fetch.task.conversion. If the target table is native, the input length is calculated by summing the file lengths. If it is not native, the storage handler for the table can optionally implement the org.apache.hadoop.hive.ql.metadata.InputEstimator interface. A negative threshold means hive.fetch.task.conversion is applied without any input length threshold. |
hive.limit.pushdown.memory.usage |
-1 |
Hive 0.12.0 with HIVE-3562 |
The maximum memory to be used for hash in RS operator for top K selection. The default value "-1" means no limit. |
hive.cache.expr.evaluation |
true | Hive 0.12.0 with HIVE-4209 |
If true, the evaluation result of a deterministic expression referenced twice or more will be cached. For example, in a filter condition like "... where key + 10 > 10 or key + 10 = 0" the expression "key + 10" will be evaluated/cached once and reused for the following expression ("key + 10 = 0"). Currently, this is applied only to expressions in select or filter operators. |
hive.resultset.use.unique.column.names |
true |
Hive 0.13.0 with HIVE-6687 |
Make column names unique in the result set by qualifying column names with table alias if needed. Table alias will be added to column names for queries of type "select *" or if query explicitly uses table alias "select r1.x..". |
hive.support.quoted.identifiers |
column | Hive 0.13.0 with HIVE-6013 |
Whether to use quoted identifiers. Value can be "none" or "column". |
hive.plan.serialization.format |
kryo | Hive 0.13.0 with HIVE-1511 |
Query plan format serialization between client and task nodes. The two supported values are kryo and javaXML. Kryo is the default. |
hive.exec.check.crossproducts |
true | Hive 0.13.0 with HIVE-6643 |
Check if a query plan contains a cross product. If there is one, output a warning to the session's console. |
hive.display.partition.cols.separately |
true | Hive 0.13.0 with HIVE-6689 |
In older Hive versions (0.10 and earlier) no distinction was made between partition columns and non-partition columns while displaying columns in DESCRIBE TABLE. From version 0.12 onwards, they are displayed separately. This flag will let you get the old behavior, if desired. See test-case in patch for HIVE-6689. |
hive.optimize.sampling.orderby |
false | Hive 0.12.0 with HIVE-1402 |
Uses sampling on order-by clause for parallel execution. |
hive.optimize.sampling.orderby.number |
1000 | Hive 0.12.0 with HIVE-1402 |
With hive.optimize.sampling.orderby=true, total number of samples to be obtained to calculate partition keys. |
hive.optimize.sampling.orderby.percent |
0.1 | Hive 0.12.0 with HIVE-1402 |
With hive.optimize.sampling.orderby=true, probability with which a row will be chosen. |
hive.compat |
0.12 | Hive 0.13.0 with HIVE-6012 |
Enable (configurable) deprecated behaviors of arithmetic operations by setting the desired level of backward compatibility. The default value gives backward-compatible return types for numeric operations. Other supported release numbers give newer behavior for numeric operations, for example 0.13 gives the more SQL compliant return types introduced in HIVE-5356. |
hive.optimize.constant.propagation |
true | Hive 0.14.0 with HIVE-5771 |
Whether to enable the constant propagation optimizer. |
hive.entity.capture.transform |
false | Hive 1.1.0 with HIVE-8938 |
Enable capturing compiler read entity of transform URI which can be introspected in the semantic and exec hooks. |
hive.support.sql11.reserved.keywords |
true | Hive 1.2.0 with HIVE-6617 |
Whether to enable support for SQL2011 reserved keywords. When enabled, will support (part of) SQL2011 reserved keywords. |
hive.log.explain.output |
false | Hive 1.1.0 with HIVE-8600 |
When enabled, will log EXPLAIN EXTENDED output for the query at log4j INFO level and in WebUI / Drilldown / Query Plan. |
hive.explain.user |
false | Hive 1.2.0 with HIVE-9780 |
Whether to show explain result at user level. When enabled, will log EXPLAIN output for the query at user level. (Tez only. For Spark, see hive.spark.explain.user.) |
hive.typecheck.on.insert |
true | Hive 0.12.0 with HIVE-5297 for insert partition |
Whether to check, convert, and normalize partition value specified in partition specification to conform to the partition column type. |
hive.exec.temporary.table.storage |
default | Hive 1.1.0 with HIVE-7313 |
Define the storage policy for temporary tables. Choices are memory, ssd, and default. See HDFS Storage Types and Storage Policies. |
hive.optimize.distinct.rewrite |
true | Hive 1.2.0 with HIVE-10568 |
When applicable, this optimization rewrites distinct aggregates from a single-stage to multi-stage aggregation. This may not be optimal in all cases. Ideally, whether to trigger it or not should be a cost-based decision. Until Hive formalizes the cost model for this, this is config driven. |
hive.optimize.point.lookup |
true | Hive 2.0.0 with HIVE-11461 |
Whether to transform OR clauses in Filter operators into IN clauses. |
hive.optimize.point.lookup.min |
31 | Hive 2.0.0 with HIVE-11573 |
Minimum number of OR clauses needed to transform into IN clauses. |
hive.allow.udf.load.on.demand |
false | Hive 2.1.0 with HIVE-13596 |
Whether to enable loading UDFs from the metastore on demand; this is mostly relevant for HS2 and was the default behavior before Hive 1.2. |
hive.async.log.enabled |
true | Hive 2.1.0 with HIVE-13027 |
Whether to enable Log4j2's asynchronous logging. Asynchronous logging can give significant performance improvement as logging will be handled in a separate thread that uses the LMAX disruptor queue for buffering log messages. |
hive.msck.repair.batch.size |
0 | Hive 2.2.0 with HIVE-12077 |
Runs the MSCK REPAIR TABLE command batch-wise. If there is a large number of untracked partitions, configuring a value for this property makes the command execute in batches internally. The default value is zero, which means all partitions are processed at once. |
hive.exec.copyfile.maxnumfiles |
1 | Hive 2.3.0 with HIVE-14864 |
Maximum number of files Hive uses to do sequential HDFS copies between directories. Distributed copies (distcp) will be used instead for larger numbers of files so that copies can be done faster. |
hive.exec.copyfile.maxsize |
32 megabytes | Hive 1.1.0 with HIVE-8750 |
Maximum file size (in bytes) that Hive uses to do single HDFS copies between directories. Distributed copies (distcp) will be used instead for bigger files so that copies can be done faster. |
hive.exec.stagingdir |
hive-staging | Hive 1.1.0 with HIVE-8750 |
Directory name that will be created inside table locations in order to support HDFS encryption. This replaces hive.exec.scratchdir for query results, with the exception of read-only tables. In all cases hive.exec.scratchdir is still used for other temporary files, such as job plans. |
hive.query.lifetime.hooks |
(empty) | Hive 2.3.0 with HIVE-14340 |
A comma separated list of hooks which implement QueryLifeTimeHook. These will be triggered before/after query compilation and before/after query execution, in the order specified. As of Hive 3.0.0 (HIVE-16363), this config can be used to specify implementations of QueryLifeTimeHookWithParseHooks. If they are specified then they will be invoked in the same places as QueryLifeTimeHooks and will be invoked during pre and post query parsing. |
hive.remove.orderby.in.subquery |
true | Hive 3.0.0 with HIVE-6348 |
If set to true, order/sort by without limit in subqueries and views will be removed. |
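
As an illustration of the batching described for hive.msck.repair.batch.size, the following is a minimal Python sketch, not Hive's actual implementation. The function name partition_batches and the treatment of negative values as zero are assumptions made for this example.

```python
from typing import Iterator, List, Sequence


def partition_batches(partitions: Sequence[str], batch_size: int = 0) -> Iterator[List[str]]:
    """Yield groups of untracked partitions, one group per internal pass.

    batch_size mirrors hive.msck.repair.batch.size: zero (the default)
    means every partition is handled in a single pass. Treating negative
    values like zero is an assumption of this sketch.
    """
    if not partitions:
        return  # nothing to repair, so no passes at all
    if batch_size <= 0:
        yield list(partitions)
        return
    # Walk the partition list in fixed-size slices; the last slice may be shorter.
    for start in range(0, len(partitions), batch_size):
        yield list(partitions[start:start + batch_size])


if __name__ == "__main__":
    untracked = ["dt=2018-01-01", "dt=2018-01-02", "dt=2018-01-03"]
    for batch in partition_batches(untracked, batch_size=2):
        print(batch)
```

With a batch size of 2, the three hypothetical partitions above are handled in two passes; with the default of 0 they are handled in one.
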
2. SerDes and I/O
2.1 SerDes
Property | Default | Added In | Description |
---|---|---|---|
hive.script.serde |
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
Hive 0.4.0 |
The default SerDe for transmitting input data to and reading output data from the user scripts. |
hive.script.recordreader |
org.apache.hadoop.hive.ql.exec.TextRecordReader |
Hive 0.4.0 |
The default record reader for reading data from the user scripts. |
hive.script.recordwriter |
org.apache.hadoop.hive.ql.exec.TextRecordWriter |
Hive 0.5.0 |
The default record writer for writing data to the user scripts. |
hive.default.serde |
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
Hive 0.14 with HIVE-5976 |
The default SerDe Hive will use for storage formats that do not specify a SerDe. Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'. See Registration of Native SerDes for more information on storage formats and SerDes. |
hive.lazysimple.extended_boolean_literal |
false | Hive 0.14 with HIVE-3635 | LazySimpleSerDe uses this property to determine if it treats 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals, in addition to 'TRUE' and 'FALSE'. The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals. |
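
The literal handling described for hive.lazysimple.extended_boolean_literal can be sketched as follows. This is a minimal Python illustration, not LazySimpleSerDe's code; the function name parse_boolean, the case-insensitive matching of TRUE and FALSE, and the use of None for an illegal literal are assumptions.

```python
from typing import Optional

# Literals that are legal regardless of the setting.
BASE_LITERALS = {"TRUE": True, "FALSE": False}

# Extra literals that are legal only when the property is true.
# Upper-casing the input covers both 'T'/'t' and 'F'/'f'.
EXTENDED_LITERALS = {"T": True, "F": False, "1": True, "0": False}


def parse_boolean(text: str, extended: bool = False) -> Optional[bool]:
    """Map a text field to a boolean, or None if it is not a legal literal.

    `extended` stands in for hive.lazysimple.extended_boolean_literal.
    Case-insensitive matching of TRUE/FALSE is an assumption of this sketch.
    """
    token = text.upper()
    if token in BASE_LITERALS:
        return BASE_LITERALS[token]
    if extended and token in EXTENDED_LITERALS:
        return EXTENDED_LITERALS[token]
    return None


if __name__ == "__main__":
    for raw in ["TRUE", "t", "0", "yes"]:
        print(raw, parse_boolean(raw), parse_boolean(raw, extended=True))
```

In this sketch, a field such as 't' is rejected under the default setting and accepted only once the extended literals are switched on.
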
2.2 I/O
Property | Default | Added In | Description |
---|---|---|---|