03 2019 档案
摘要:用户提交任务到yarn时有可能遇到下面的错误: 1) Requested user anything is not whitelisted and has id 980,which is below the minimum allowed 1000 这是因为yarn中配置min.user.id=10
阅读全文
摘要:命令行 $ oozie help 1 导出环境变量 $ export OOZIE_URL=http://oozie_server:11000/oozie 否则都需要增加 -oozie 参数,比如 $ oozie jobs -oozie http://oozie_server:11000/oozie
阅读全文
摘要:kibana添加index pattern卡住,通过浏览器查看请求返回状态为403 Forbidden,返回消息为: {"message":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];: [cluster_bloc
阅读全文
摘要:Index Settings 重要索引配置 Index level settings can be set per-index. Settings may be: 1 static 静态索引配置 They can only be set at index creation time or on a
阅读全文
摘要:yarn常用rest api 1 metrics # curl http://localhost:8088/ws/v1/cluster/metrics The cluster metrics resource provides some overall metrics about the clust
阅读全文
摘要:Docker images have a tag named latest which doesn’t work as you expect.Latest is just a tag with a special name.“Latest” simply means “the last build/
阅读全文
摘要:1 准备analyzer 内置analyzer 参考:https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html 中文分词 smartcn 参考:https://www.elastic
阅读全文
摘要:当hdfs空间不足时,除了删除临时数据或垃圾数据之外,还可以适当调整部分大目录的副本数量,多管齐下; 1 查看 $ hdfs dfs -ls /user/hive/warehouse/temp.db/test_ext_o-rwxr-xr-x 3 hadoop supergroup 44324200
阅读全文
摘要:当kudu有tserver下线或者迁移或者修改hostname之后,旧的tserver会一直以dead状态出现,并且tserver日志中会有大量的连接重试日志,一天的错误日志会有几个G, W0322 22:13:59.202749 16927 tablet_service.cc:290] Inval
阅读全文
摘要:logstash6.6.0-6.6.2版本使用jdbc input plugin时如果设置了jdbc_default_timezone,会报错: { 2012 rufus-scheduler intercepted an error: 2012 job: 2012 Rufus::Scheduler:
阅读全文
摘要:kudu中的flume sink代码路径: https://github.com/apache/kudu/tree/master/java/kudu-flume-sink kudu-flume-sink默认使用的producer是 org.apache.kudu.flume.sink.SimpleK
阅读全文
摘要:flume sink核心类结构 1 核心接口Sink org.apache.flume.Sink /** * <p>Requests the sink to attempt to consume data from attached channel</p> * <p><strong>Note</st
阅读全文
摘要:hive2.3.4 presto0.215 使用hive2.3.4的beeline连接presto报错 $ beeline -d com.facebook.presto.jdbc.PrestoDriver -u "jdbc:presto://localhost:8080/hive" Error: U
阅读全文
摘要:ruby2.6.2 官方:https://www.ruby-lang.org/en/ 一 简介 A dynamic, open source programming language with a focus on simplicity and productivity. It has an ele
阅读全文
摘要:一个logstash很容易通过http打断成两个logstash实现跨服务器或者跨平台间数据同步,比如原来的流程是 logstash: nginx log -> kafka 打断成两个是 logstash1: nginx log -> http out logstash2: http in ->ka
阅读全文
摘要:从nginx日志中进行url解析 /v1/test?param2=v2¶m3=v3&time=2019-03-18%2017%3A34%3A14->{'param1':'v1','param2':'v2','param3':'v3','time':'2019-03-18 17:34:14'}
阅读全文
摘要:gz文件不需要解压即可进行相关操作 $ zcat test.log.gz $ zmore test.log.gz $ zless test.log.gz $ zgrep '1.2.3.4' test.log.gz $ egrep 'regex' test.log.gz
阅读全文
摘要:Logstash 6.6.2 官方:https://www.elastic.co/products/logstash 一 简介 Centralize, Transform & Stash Your Data Logstash is an open source, server-side data p
阅读全文
摘要:有时需要修改很多jar(假设这些jar都位于lib目录)中其中一个jar中的某一个类,而且又没有原始代码或ide,这时最简单的方式是: 1 进入lib目录 # cd lib # ls test.jar dependency1.jar dependency2.jar 2 查看待修改jar包内类结构 $
阅读全文
摘要:应用一:kafka数据同步到kudu 1 准备kafka topic # bin/kafka-topics.sh --zookeeper $zk:2181/kafka -create --topic test_sync --partitions 2 --replication-factor 2 WA
阅读全文
摘要:云主机cpu使用率突然很高 查看服务器发现异常 1 crontab # crontab -l* * * * * /tmp/.dns/y2kupdate >/dev/null 2>&1 2 iptables # iptables -nLChain INPUT (policy DROP)target p
阅读全文
摘要:hadoop.security.authentication: Kerberos -> Simple hadoop.security.authorization: true -> false dfs.datanode.address: -> from 1004 (for Kerberos) to 5
阅读全文
摘要:hdfs开启kerberos之后,namenode报错,连不上journalnode 2019-03-15 18:54:46,504 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as
阅读全文
摘要:mysql启动失败,一直crash,报错如下: 2019-03-14T11:15:12.937923Z 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 1118110825, calculated checksums for
阅读全文
摘要:有时服务器环境受限,比如在内网环境不能暴露端口从外网访问,用curl看html代码比较累,这时可以使用命令行浏览器来查看相关页面 links 官方:http://links.twibright.com/ Links is an open source text and graphic web bro
阅读全文
摘要:presto 0.217 官方:http://prestodb.github.io/ 一 简介 Presto is an open source distributed SQL query engine for running interactive analytic queries against
阅读全文
摘要:应用一:mysql数据增量同步到kafka 1 准备mysql测试表 mysql> create table test_sync(id int not null auto_increment, name varchar(32), description varchar(64), create_tim
阅读全文
摘要:问题:spark中如果有两个DataFrame(或者DataSet),DataFrameA依赖DataFrameB,并且两个DataFrame都进行了cache,将DataFrameB unpersist之后,DataFrameA的cache也会失效,官方解释如下: When invalidatin
阅读全文
摘要:查看 1 vi $ vi $file:set fileencoding 2 file $ file $file 修改 1 vi $ vi $file:set fileencoding=utf-8 2 iconv # iconv -l # iconv -f gbk -t utf8 /path/to/f
阅读全文
摘要:1 compress & mr hive默认的execution engine是mr hive> set hive.execution.engine;hive.execution.engine=mr 所以针对mr的优化就是hive的优化,比如压缩和临时目录 mapred-site.xml <prop
阅读全文
摘要:1 测试集群 内存:256GCPU:32Core (Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz)Disk(系统盘):300GDisk(数据盘):1.5T*1 2 测试数据 tpcds parquet 10g tpcds orc 10g 3 测试对象 hive-
阅读全文
摘要:The general result is that Docker is nearly identical to Native performance and faster than KVM in every category. 1 CPU 2 Memory 3 Network Docker’s u
阅读全文
摘要:http_load-09Mar2016官方:https://acme.com/software/http_load/ 一 简介 http_load - multiprocessing http test client http_load runs multiple http fetches in p
阅读全文
摘要:使用docker部署 1 下载 # wget https://github.com/doujiang24/lua-resty-kafka/archive/v0.06.tar.gz# tar xvf v0.06.tar.gz 2 准备配置文件testkafka.conf # vi testkafka.
阅读全文
摘要:beeline连接hiveserver2报错 Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session: java.lang.Runti
阅读全文
摘要:openresty 1.15.8.1 官方:https://openresty.org/en/ 一 简介 OpenResty® is a dynamic web platform based on NGINX and LuaJIT. openresty是一个基于nginx和luajit的动态web平
阅读全文
摘要:tpc 官方:http://www.tpc.org/ 一 简介 The TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminat
阅读全文
摘要:装机 装机之后执行 sudo zypper ar -fc https://mirrors.aliyun.com/opensuse/distribution/leap/15.0/repo/oss openSUSE-Aliyun-OSSsudo zypper ar -fc https://mirrors
阅读全文
摘要:1 下载iso opensuse 下载: http://download.opensuse.org/distribution/openSUSE-stable/iso/openSUSE-Leap-15.0-DVD-x86_64.iso 以c盘为例,将iso拷贝到c盘根目录,同时将iso中的两个文件放到
阅读全文
摘要:hive 2.3.4 on spark 2.4.0 Hive on Spark provides Hive with the ability to utilize Apache Spark as its execution engine. set hive.execution.engine=spar
阅读全文
摘要:CentOS6服务用chkconfig控制,CentOS7改为systemd控制 1 systemd systemd is a suite of basic building blocks for a Linux system. It provides a system and service ma
阅读全文
摘要:1 hive # kadmin.local -q 'ktadd -k /tmp/hive3.keytab -norandkey hive/server03@TEST.COM'# kinit -kt /tmp/hive3.keytab hive/server03@TEST.COM# klist # b
阅读全文
摘要:部署方式:docker+airflow+mysql+LocalExecutor 使用airflow的docker镜像 https://hub.docker.com/r/puckel/docker-airflow 使用默认的sqlite+SequentialExecutor启动: $ docker r
阅读全文