[Hive] - Hive Configuration Parameters Explained
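Hive parameters fall into three categories. The first is system variables, which hold system-level settings. The second is env variables, which hold the current user's environment variables. The third is Hive parameters, which are defined in hive-site.xml and by the current Hive session. This third category in turn consists of Hadoop HDFS parameters (taken directly from Hadoop), MapReduce parameters, metastore storage parameters, metastore connection parameters, and Hive runtime parameters.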
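In a Hive session these categories are usually told apart by a prefix on the variable name: `system:`, `env:`, and `hiveconf:` (or no prefix at all) for Hive parameters. The following Python sketch only illustrates that naming convention. The helper `classify` and the constant `KNOWN_NAMESPACES` are hypothetical names introduced here, not part of Hive, and other prefixes Hive may accept are deliberately not handled.

```python
from typing import Tuple

# Hypothetical constant: the three categories described above.
KNOWN_NAMESPACES = ("system", "env", "hiveconf")


def classify(key: str) -> Tuple[str, str]:
    """Split a variable reference into (category, name).

    Assumption: a key without a recognised prefix is an ordinary
    Hive parameter, i.e. it belongs to the third category.
    """
    prefix, sep, rest = key.partition(":")
    if sep and prefix in KNOWN_NAMESPACES:
        return prefix, rest
    return "hiveconf", key


if __name__ == "__main__":
    for key in ("system:user.name", "env:HOME", "hive.exec.parallel"):
        print(key, "->", classify(key))
```

With this convention, `hive.exec.parallel` and `hiveconf:hive.exec.parallel` both resolve to the same Hive parameter.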
Parameter | Default | Meaning (purpose) |
datanucleus.autoCreateSchema | true | Creates the necessary schema on startup if one doesn't exist. Set this to false after the schema has been created once. |
datanucleus.autoStartMechanismMode | checked | Throws an exception if the metadata tables are incorrect. Possible values: checked, unchecked. |
datanucleus.cache.level2 | false | Whether to use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server. |
datanucleus.cache.level2.type | SOFT | Type of level 2 cache: SOFT = soft-reference-based cache, WEAK = weak-reference-based cache, none = no cache. |
datanucleus.connectionPoolingType | BoneCP | Connection pool used for the metastore database. |
datanucleus.fixedDatastore | false | |
datanucleus.identifierFactory | datanucleus1 | Name of the identifier factory to use when generating table/column names etc. |
datanucleus.plugin.pluginRegistryBundleCheck | LOG | Defines what happens when plugin bundles are found and are duplicated. Possible values: EXCEPTION, LOG, NONE. |
datanucleus.rdbms.useLegacyNativeValueStrategy | true | |
datanucleus.storeManagerType | rdbms | Type of store used for metadata. |
datanucleus.transactionIsolation | read-committed | Default transaction isolation level for identity generation. |
datanucleus.validateColumns | false | Validates the existing schema against code. Turn this on if you want to verify the columns of existing tables. |
datanucleus.validateConstraints | false | Whether to validate the constraints of existing tables. |
datanucleus.validateTables | false | Whether to validate existing tables. |
dfs.block.access.key.update.interval | 600 | |
hive.archive.enabled | false | Whether archiving operations are permitted. |
hive.auto.convert.join | true | Whether Hive enables the optimization that converts a common join into a mapjoin based on the input file size. |
hive.auto.convert.join.noconditionaltask | true | Whether Hive enables the optimization that converts a common join into a mapjoin based on the input file size. If this parameter is on, and the sum of the sizes of n-1 of the tables/partitions in an n-way join is smaller than the specified size, the join is converted directly to a mapjoin (there is no conditional task). |
hive.auto.convert.join.noconditionaltask.size | 10000000 | If hive.auto.convert.join.noconditionaltask is off, this parameter has no effect. If it is on, and the sum of the sizes of n-1 of the tables/partitions in an n-way join is smaller than this size, the join is converted directly to a mapjoin (there is no conditional task). The default is 10 MB. |
hive.auto.convert.join.use.nonstaged | false | For conditional joins, if the input stream from a small alias can be applied directly to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently this does not work with vectorization or the Tez execution engine. |
hive.auto.convert.sortmerge.join | false | Whether the join is automatically converted to a sort-merge join if the joined tables pass the criteria for a sort-merge join. |
hive.auto.convert.sortmerge.join.bigtable.selection.policy | org.apache.hadoop.hive.ql.optimizer.AvgPartitionSizeBasedBigTableSelectorForAutoSMJ | |
hive.auto.convert.sortmerge.join.to.mapjoin | false | Whether to convert a sort-merge join to a map join. |
hive.auto.progress.timeout | 0 | How long to run the autoprogressor for script/UDTF operators (in seconds). Set to 0 for forever. |
hive.autogen.columnalias.prefix.includefuncname | false | Whether the column aliases that Hive generates automatically include the function name. Off by default. |
hive.autogen.columnalias.prefix.label | _c | Prefix of the column aliases that Hive generates automatically. |
hive.binary.record.max.length | 1000 | Maximum length of a binary record in Hive. |
hive.cache.expr.evaluation | true | If true, the evaluation result of a deterministic expression referenced twice or more is cached. For example, in a filter condition like ".. where key + 10 > 10 or key + 10 = 0", "key + 10" is evaluated/cached once and reused for the following expression ("key + 10 = 0"). Currently this applies only to expressions in select or filter operators. |
hive.cli.errors.ignore | false | |
hive.cli.pretty.output.num.cols | -1 | |
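hive.cli.print.current.db | false | Whether to show the name of the current database. Off by default. |
hive.cli.print.header | false | Whether to print header information, such as column names, with query results. Off by default. |
hive.cli.prompt | hive | Prompt prefix of the Hive CLI. The client must be restarted after changing it. |
hive.cluster.delegation.token.store.class | org.apache.hadoop.hive.thrift.MemoryTokenStore | Class used to store Hive cluster delegation tokens. |
hive.cluster.delegation.token.store.zookeeper.znode | /hive/cluster/delegation | ZooKeeper znode under which Hive stores delegation token data. |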
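hive.compactor.abortedtxn.threshold | 1000 | Number of aborted transactions on a table or partition that triggers compaction. |
hive.compactor.check.interval | 300 | Interval between compaction checks, in seconds. |
hive.compactor.delta.num.threshold | 10 | Number of delta directories in a table or partition that triggers compaction. |
hive.compactor.delta.pct.threshold | 0.1 | Size of the delta files relative to the base, as a fraction, that triggers compaction. |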
hive.compactor.initiator.on | false | |
hive.compactor.worker.threads | 0 | |
hive.compactor.worker.timeout | 86400 | In seconds. |
hive.compat | 0.12 | Compatibility version. |
hive.compute.query.using.stats | false | |
hive.compute.splits.in.am | true | |
hive.conf.restricted.list | hive.security.authenticator.manager,hive.security.authorization.manager | |
hive.conf.validation | true | |
hive.convert.join.bucket.mapjoin.tez | false | |
hive.counters.group.name | HIVE | |
hive.debug.localtask | false | |
hive.decode.partition.name | false | |
hive.default.fileformat | TextFile | Default file format. The default is TextFile. |
hive.default.rcfile.serde | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe | SerDe class used for RCFile. |
hive.default.serde | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | Default SerDe class. |
hive.display.partition.cols.separately | true | Whether partition columns are displayed separately. |
hive.downloaded.resources.dir | /tmp/${hive.session.id}_resources | Directory where Hive stores downloaded resources. |
hive.enforce.bucketing | false | Whether bucketing is enforced. |
hive.enforce.bucketmapjoin | false | Whether bucket map join is enforced. |
hive.enforce.sorting | false | Whether sorting is enforced on insert. |
hive.enforce.sortmergebucketmapjoin | false | |
hive.entity.capture.transform | false | |
hive.entity.separator | @ | Separator used to construct names of tables and partitions. For example, dbname@tablename@partitionname |
hive.error.on.empty.partition | false | Whether to throw an exception if a dynamic partition insert generates empty results. |
hive.exec.check.crossproducts | true | Whether to check for cross products (Cartesian products). |
hive.exec.compress.intermediate | false | Whether intermediate results are compressed. The compression settings come from Hadoop's mapred.output.compress* configuration. |
hive.exec.compress.output | false | Whether the final output is compressed. |
hive.exec.concatenate.check.index | true | |
hive.exec.copyfile.maxsize | 33554432 | |
hive.exec.counters.pull.interval | 1000 | |
hive.exec.default.partition.name | __HIVE_DEFAULT_PARTITION__ | |
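hive.exec.drop.ignorenonexistent | true | Whether a drop operation ignores the error raised when the target does not exist. Ignored by default; if set to false, dropping a non-existent object reports an error. |
hive.exec.dynamic.partition | true | Whether dynamic partitioning is allowed. If it is, partition values need not be specified when modifying data. |
hive.exec.dynamic.partition.mode | strict | Dynamic partition mode. strict requires at least one static partition value; nonstrict allows all partitions to be dynamic. |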
hive.exec.infer.bucket.sort | false | |
hive.exec.infer.bucket.sort.num.buckets.power.two | false | |
hive.exec.job.debug.capture.stacktraces | true | |
hive.exec.job.debug.timeout | 30000 | |
hive.exec.local.scratchdir | /tmp/hadoop | |
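hive.exec.max.created.files | 100000 | Maximum number of HDFS files created in an MR job. |
hive.exec.max.dynamic.partitions | 1000 | Maximum total number of dynamic partitions. |
hive.exec.max.dynamic.partitions.pernode | 100 | Maximum number of dynamic partitions created per MR node. |
hive.exec.mode.local.auto | false | Whether Hive is allowed to run in local mode. |
hive.exec.mode.local.auto.input.files.max | 4 | Maximum number of input files for Hive local mode. |
hive.exec.mode.local.auto.inputbytes.max | 134217728 | Maximum input size in bytes for Hive local mode. |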
hive.exec.orc.default.block.padding | true | |
hive.exec.orc.default.buffer.size | 262144 | |
hive.exec.orc.default.compress | ZLIB | |
hive.exec.orc.default.row.index.stride | 10000 | |
hive.exec.orc.default.stripe.size | 268435456 | |
hive.exec.orc.dictionary.key.size.threshold | 0.8 | |
hive.exec.orc.memory.pool | 0.5 | |
hive.exec.orc.skip.corrupt.data | false | |
hive.exec.orc.zerocopy | false | |
hive.exec.parallel | false | Whether to allow parallel execution. Off by default. |
hive.exec.parallel.thread.number | 8 | Number of threads for parallel execution. The default is 8. |
hive.exec.perf.logger | org.apache.hadoop.hive.ql.log.PerfLogger | |
hive.exec.rcfile.use.explicit.header | true | |
hive.exec.rcfile.use.sync.cache | true | |
hive.exec.reducers.bytes.per.reducer | 1000000000 | Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers. |
hive.exec.reducers.max | 999 | Maximum number of reducers allowed. This parameter takes effect when mapred.reduce.tasks is set to a negative value. |
hive.exec.rowoffset | false | |
hive.exec.scratchdir | /etc/hive-hadoop | |
hive.exec.script.allow.partial.consumption | false | |
hive.exec.script.maxerrsize | 100000 | |
hive.exec.script.trust | false | |
hive.exec.show.job.failure.debug.info | true | |
hive.exec.stagingdir | .hive-staging | |
hive.exec.submitviachild | false | |
hive.exec.tasklog.debug.timeou | 20000 | |
hive.execution.engine | mr | Execution engine: mr or tez (Hadoop 2). |
hive.exim.uri.scheme.whitelist | hdfs,pfile | |
hive.explain.dependency.append.tasktype | false | |
hive.fetch.output.serde | org.apache.hadoop.hive.serde2.DelimitedJSONSerDe | |
hive.fetch.task.aggr | false | |
hive.fetch.task.conversion | minimal | |
hive.fetch.task.conversion.threshold | -1 | |
hive.file.max.footer | 100 | |
hive.fileformat.check | true | |
hive.groupby.mapaggr.checkinterval | 100000 | |
hive.groupby.orderby.position.alias | false | |
hive.groupby.skewindata | false | |
hive.hadoop.supports.splittable.combineinputformat | false | |
hive.hashtable.initialCapacity | 100000 | |
hive.hashtable.loadfactor | 0.75 | |
hive.hbase.generatehfiles | false | |
hive.hbase.snapshot.restoredir | /tmp | |
hive.hbase.wal.enabled | true | |
hive.heartbeat.interval | 1000 | |
hive.hmshandler.force.reload.conf | false | |
hive.hmshandler.retry.attempts | 1 | |
hive.hmshandler.retry.interval | 1000 | |
hive.hwi.listen.host | 0.0.0.0 | |
hive.hwi.listen.port | 9999 | |
hive.hwi.war.file | lib/hive-hwi-${version}.war | |
hive.ignore.mapjoin.hint | true | |
hive.in.test | false | |
hive.index.compact.binary.search | true | |
hive.index.compact.file.ignore.hdfs | false | |
hive.index.compact.query.max.entries | 10000000 | |
hive.index.compact.query.max.size | 10737418240 | |
hive.input.format | org.apache.hadoop.hive.ql.io.CombineHiveInputFormat | |
hive.insert.into.external.tables | true | |
hive.insert.into.multilevel.dirs | false | |
hive.jobname.length | 50 | |
hive.join.cache.size | 25000 | |
hive.join.emit.interval | 1000 | |
hive.lazysimple.extended_boolean_literal | false | |
hive.limit.optimize.enable | false | |
hive.limit.optimize.fetch.max | 50000 | |
hive.limit.optimize.limit.file | 10 | |
hive.limit.pushdown.memory.usage | -1.0 | |
hive.limit.query.max.table.partition | -1 | |
hive.limit.row.max.size | 100000 | |
hive.localize.resource.num.wait.attempts | 5 | |
hive.localize.resource.wait.interval | 5000 | |
hive.lock.manager | org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager | |
hive.mapred.partitioner | org.apache.hadoop.hive.ql.io.DefaultHivePartitioner | |
hive.mapred.reduce.tasks.speculative.execution | true | |
hive.mapred.supports.subdirectories | false | |
hive.metastore.uris | thrift://hh:9083 | |
hive.metastore.warehouse.dir | /user/hive/warehouse | |
hive.multi.insert.move.tasks.share.dependencies | false | |
hive.multigroupby.singlereducer | true | |
hive.zookeeper.clean.extra.nodes | false | Whether to clean up extra node data when the session ends. |
hive.zookeeper.client.port | 2181 | Client port. |
hive.zookeeper.quorum | IP addresses of the ZooKeeper servers | |
hive.zookeeper.session.timeout | 600000 | Session timeout of the ZooKeeper client. |
hive.zookeeper.namespace | hive_zookeeper_namespace | |
javax.jdo.PersistenceManagerFactoryClass | org.datanucleus.api.jdo.JDOPersistenceManagerFactory | |
javax.jdo.option.ConnectionDriverName | change to: com.mysql.jdbc.Driver | |
javax.jdo.option.ConnectionPassword | change to: hive | |
javax.jdo.option.ConnectionURL | xxx | |
javax.jdo.option.ConnectionUserName | xxx | |
javax.jdo.option.DetachAllOnCommit | true | |
javax.jdo.option.Multithreaded | true | |
javax.jdo.option.NonTransactionalRead | true |
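
Two rows in the table describe size-based rules that are easier to follow as arithmetic: the reducer count derived from hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max, and the direct mapjoin conversion controlled by hive.auto.convert.join.noconditionaltask.size. The Python sketch below restates those two descriptions under stated assumptions. It is not Hive's implementation: the function names are hypothetical, the reducer count is assumed to be rounded up and to be at least 1, and the mapjoin check assumes the largest input is the one left out of the n-1 sum.

```python
from typing import Sequence


def estimate_reducers(input_bytes: int,
                      bytes_per_reducer: int = 1_000_000_000,
                      max_reducers: int = 999) -> int:
    """Reducer count from input size, capped at max_reducers.

    Assumption: round up, and never return fewer than 1.
    """
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(wanted, max_reducers))


def can_convert_to_mapjoin(table_sizes: Sequence[int],
                           threshold: int = 10_000_000) -> bool:
    """True if n-1 of the n inputs together are below the threshold.

    Assumption: the largest input is the one excluded from the sum,
    which gives the smallest possible n-1 total.
    """
    if len(table_sizes) < 2:
        return False
    small_total = sum(table_sizes) - max(table_sizes)
    return small_total < threshold


if __name__ == "__main__":
    print(estimate_reducers(10 * 10**9))   # 10 GB of input, default settings
    print(estimate_reducers(5 * 10**12))   # large input, limited by the cap
    print(can_convert_to_mapjoin([50 * 10**9, 4_000_000, 5_000_000]))
```

With the defaults from the table, 10 GB of input yields 10 reducers, matching the example in the hive.exec.reducers.bytes.per.reducer row, and a three-way join whose two smaller inputs total 9,000,000 bytes falls under the 10000000-byte threshold.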