Apache Atlas 2.1.0 Build and Deployment Manual

Environment Preparation

Component Versions

Component    Deployed Version    Source Version
os           CentOS 7.6.1810     --
java         1.8.0_252           --
zookeeper    3.4.14              3.4.6
kafka        2.11-2.0.0          2.11-2.0.0
hadoop       3.1.1               3.1.1
hbase        2.0.2               2.0.2
solr         7.5.0               7.5.0
hive         3.1.0               3.1.0
atlas        2.1.0               2.1.0

Role Assignment

Component              n1 (192.168.222.11)   n2 (192.168.222.12)   n3 (192.168.222.13)
JDK                    √                     √                     √
zookeeper              √                     √                     √
kafka                  √                     √                     √
NameNode               √                     --                    --
SecondaryNameNode      --                    --                    √
MR JobHistory Server   --                    --                    √
DataNode               √                     √                     √
ResourceManager        --                    √                     --
NodeManager            √                     √                     √
hbase                  √                     √                     √(Master)
solr                   √                     √                     √
hive                   √                     --                    --
MySQL                  √                     --                    --
atlas                  √                     --                    --

Configure Hostname Resolution

Add the following entries to the /etc/hosts file on every node:

192.168.222.11 n1
192.168.222.12 n2
192.168.222.13 n3


Configure Maven

Edit the conf/settings.xml configuration file as follows:

<!-- Change the local repository path for downloaded artifacts -->
<localRepository>/home/atlas/maven_packages</localRepository>

<!-- Configure mirrors -->
<!--
<mirror>
    <id>mirrorId</id>
    <mirrorOf>repositoryId</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>http://my.repository.com/repo/path</url>
</mirror>
-->
<mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>https://maven.aliyun.com/repository/public</url>
    <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
    <id>Central</id>
    <mirrorOf>central</mirrorOf>
    <name>Central Maven</name>
    <url>https://repo1.maven.org/maven2</url>
</mirror>

Environment Variables

export MAVEN_OPTS="-Xms4g -Xmx4g"
export MAVEN_HOME=/home/atlas/maven-3.6.3
export PATH=$MAVEN_HOME/bin:$PATH
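
A quick sanity check (a minimal sketch) that the new Maven installation and the customized local repository path are picked up:

    mvn -v
    grep localRepository $MAVEN_HOME/conf/settings.xml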

Configure Passwordless SSH

  1. On each node, run ssh-keygen -t rsa and press Enter three times to accept the defaults
  2. Copy /root/.ssh/id_rsa.pub from n2 and n3 to n1, renaming each file to its node name
    scp n2:/root/.ssh/id_rsa.pub /root/n2
    scp n3:/root/.ssh/id_rsa.pub /root/n3
    
  3. On n1, write the id_rsa.pub content of every node into n1's /root/.ssh/authorized_keys file
    cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
    cat /root/n2 >> /root/.ssh/authorized_keys
    cat /root/n3 >> /root/.ssh/authorized_keys
    
  4. From n1, ssh into every node (including n1 itself) once to populate the known_hosts file
  5. Copy authorized_keys and known_hosts from n1 into /root/.ssh/ on the other nodes
    scp /root/.ssh/authorized_keys n2:/root/.ssh
    scp /root/.ssh/authorized_keys n3:/root/.ssh
    scp /root/.ssh/known_hosts n2:/root/.ssh
    scp /root/.ssh/known_hosts n3:/root/.ssh
    
    Verify on every node that passwordless login works (a loop sketch follows)
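
A minimal verification sketch, run from n1, that logs in to every node without a password prompt (BatchMode makes ssh fail instead of asking for a password):

    for host in n1 n2 n3; do
        ssh -o BatchMode=yes "$host" hostname \
            && echo "passwordless login to $host OK" \
            || echo "passwordless login to $host FAILED"
    done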

Configure Time Synchronization

  1. Run rpm -qa | grep chrony to check whether chrony is installed; if not, install it with yum -y install chrony
  2. Edit the configuration with vim /etc/chrony.conf (a configuration sketch follows this list)
  3. Sync the chrony.conf configuration to all nodes
  4. Enable the chrony service at boot and start it
    systemctl enable chronyd.service
    systemctl start chronyd.service
    systemctl status chronyd.service
    
  5. Check synchronization with timedatectl (look for "NTP synchronized: yes")
    timedatectl
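
The chrony.conf content is not reproduced above; the following is only a minimal sketch, assuming n1 serves time to the other nodes and that ntp.aliyun.com is a reachable upstream server (both are assumptions, substitute your own sources).

    # /etc/chrony.conf on n1 (assumed internal time source)
    server ntp.aliyun.com iburst      # example upstream NTP server (assumption)
    allow 192.168.222.0/24            # let the cluster subnet sync from n1
    local stratum 10                  # keep serving time even if upstream is unreachable
    driftfile /var/lib/chrony/drift
    makestep 1.0 3
    rtcsync
    logdir /var/log/chrony

    # /etc/chrony.conf on n2 and n3 (sync from n1)
    server n1 iburst
    driftfile /var/lib/chrony/drift
    makestep 1.0 3
    rtcsync
    logdir /var/log/chrony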

Java Environment Variables

export JAVA_HOME=/home/atlas/jdk8
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/.local/bin:$HOME/bin
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH

Configure a Local yum Repository

  1. On n1, create the file /etc/yum.repos.d/base.repo and add the following repository to it
    [Local_ISO]
    name=Local ISO
    baseurl=file:///mnt
    gpgcheck=0
    enabled=1
    
  2. On n1, run mount /dev/sr0 /mnt to mount the OS installation disc at /mnt
  3. Upload createrepo-0.9.9-28.el7.noarch.rpm to /root/files/ on n1 and run yum -y localinstall /root/files/createrepo-0.9.9-28.el7.noarch.rpm; its two dependency packages can be found on the OS disc
  4. On n1, create the /root/rpms directory and upload all required rpm packages into it
  5. Append the following to /etc/yum.repos.d/base.repo
    [Local_RPM]
    name=Local RPM
    baseurl=http://n1:10040/rpms
    gpgcheck=0
    enabled=1
    
  6. In /root on n1, run python -m SimpleHTTPServer 10040 (a verification sketch follows this list)
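
A verification sketch for the local repositories, assuming the paths and port used above: build the rpm metadata, keep the HTTP server running in the background, then confirm both repos resolve.

    createrepo -d /root/rpms/
    cd /root && nohup python -m SimpleHTTPServer 10040 > /root/httpserver.log 2>&1 &
    yum clean all && yum repolist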


Build and Package Atlas

Build Atlas

mvn clean -DskipTests install -e

npm-6.13.7.tgz fails to download

Download npm-6.13.7.tgz manually, place it in the /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/ directory, and rename it to npm-6.13.7.tar.gz.

Log message: Downloading http://registry.npmjs.org/npm/-/npm-6.13.7.tgz to /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/npm-6.13.7.tar.gz


node-sass fails to install

Create a .npmrc file in the user's home directory and point it at domestic (China) mirror registries:

registry=https://registry.npm.taobao.org/
sass_binary_site=https://npm.taobao.org/mirrors/node-sass
chromedriver_cdnurl=https://npm.taobao.org/mirrors/chromedriver
phantomjs_cdnurl=https://npm.taobao.org/mirrors/phantomjs
electron_mirror=https://npm.taobao.org/mirrors/electron

For more on the underlying cause, see here.


Package Atlas

# Without the embedded HBase and Solr
mvn clean -DskipTests package -Pdist
# With the embedded HBase and Solr
mvn clean -DskipTests package -Pdist,embedded-hbase-solr

The following files are produced when packaging completes (screenshot: packaged output files).


Upload the Built Artifacts

Upload the apache-atlas-2.1.0-server.tar.gz file, then:

tar -zxf apache-atlas-2.1.0-server.tar.gz
mv apache-atlas-2.1.0/ atlas-2.1.0/
cd atlas-2.1.0/


Install Required Components

Install ZooKeeper 3.4.14

  1. Upload zookeeper-3.4.14.tar.gz and extract it
  2. Create the zookeeper-3.4.14/zkData directory
  3. Create a myid file in the zookeeper-3.4.14/zkData directory
  4. Distribute the zookeeper-3.4.14 directory to all nodes
  5. On each node, set the integer in zookeeper-3.4.14/zkData/myid to that node's id; it must be unique per node
  6. Enter zookeeper-3.4.14/conf and rename zoo_sample.cfg to zoo.cfg
  7. Modify the following parameters in zoo.cfg
    dataDir=/root/zookeeper-3.4.14/zkData
    server.1=n1:2888:3888
    server.2=n2:2888:3888
    server.3=n3:2888:3888
    
    server.A=B:C:D
    • A: an integer identifying the server. In cluster mode each server has a myid file in dataDir containing this value; on startup ZooKeeper reads it and compares it with the entries in zoo.cfg to determine which server it is.
    • B: the server's IP address or hostname
    • C: the port used to exchange data with the cluster Leader
    • D: the port used for leader election
  8. Sync the zookeeper-3.4.14/conf/zoo.cfg configuration to all nodes
  9. Start/stop/status, run on each node (a sketch for driving all nodes from n1 follows this list)
    • Start: zookeeper-3.4.14/bin/zkServer.sh start
    • Stop: zookeeper-3.4.14/bin/zkServer.sh stop
    • Status: zookeeper-3.4.14/bin/zkServer.sh status
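
A minimal sketch for driving the whole ensemble from n1 over SSH (assumes ZooKeeper was extracted to /root/zookeeper-3.4.14 on every node); one node should report Mode: leader and the others Mode: follower.

    for host in n1 n2 n3; do
        ssh "$host" "/root/zookeeper-3.4.14/bin/zkServer.sh start"
    done
    for host in n1 n2 n3; do
        echo "== $host =="
        ssh "$host" "/root/zookeeper-3.4.14/bin/zkServer.sh status"
    done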

Install kafka_2.11-2.0.0

  1. Add the following variables to .bash_profile
    export KAFKA_HOME=/root/kafka_2.11-2.0.0
    export PATH=$PATH:${KAFKA_HOME}/bin
    
  2. Create the kafka_2.11-2.0.0/kfData directory for Kafka data
  3. Open config/server.properties; the main parameters to change are
    broker.id=1
    delete.topic.enable=true
    listeners=PLAINTEXT://:9092
    log.dirs=/root/kafka_2.11-2.0.0/kfData
    zookeeper.connect=n1:2181,n2:2181,n3:2181
    
    • broker.id: a unique integer per broker
    • advertised.listeners: if Kafka is only used internally, configuring listeners is enough; set this parameter when internal and external access must be controlled separately
    • delete.topic.enable: allow topics to be deleted
    • log.dirs: Kafka data directory
  4. Distribute config/server.properties to every broker and change the broker.id value on each
  5. On each node, run ./bin/kafka-server-start.sh -daemon ./config/server.properties to start Kafka (a smoke-test sketch follows this list).
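
A quick smoke test (a minimal sketch; the topic name smoke_test is just an example) that creates a topic, round-trips one message, and removes the topic:

    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create \
        --topic smoke_test --partitions 3 --replication-factor 3
    echo "hello kafka" | kafka-console-producer.sh \
        --broker-list n1:9092,n2:9092,n3:9092 --topic smoke_test
    kafka-console-consumer.sh --bootstrap-server n1:9092,n2:9092,n3:9092 \
        --topic smoke_test --from-beginning --max-messages 1
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --delete --topic smoke_test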

Install hadoop-3.1.1

  1. Configure system environment variables
    Add the following to .bash_profile

    export HADOOP_HOME=/root/hadoop-3.1.1
    export PATH=$PATH:${HADOOP_HOME}/bin
    
  2. Core configuration file
    Modify the following settings in hadoop-3.1.1/etc/hadoop/core-site.xml

    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://n1:9000</value>
    </property>
    <!-- Base directory for files generated at Hadoop runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop-3.1.1/data/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    
  3. HDFS configuration files
    Modify the following in hadoop-3.1.1/etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/root/jdk8
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    

    Modify the following in hadoop-3.1.1/etc/hadoop/hdfs-site.xml

    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Hadoop SecondaryNameNode address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>n3:50090</value>
    </property>
    <!-- Local path where the NameNode stores the namespace and transaction logs -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/hadoop-3.1.1/data/namenode</value>
    </property>
    <!-- 32MB -->
    <property>
        <name>dfs.blocksize</name>
        <value>33554432</value>
    </property>
    <!-- Local storage path for the DataNode -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/hadoop-3.1.1/data/datanode</value>
    </property>
    
  4. YARN configuration file
    Modify the following in hadoop-3.1.1/etc/hadoop/yarn-site.xml

    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- YARN ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>n2</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    
  5. MapReduce configuration file
    Modify the following in hadoop-3.1.1/etc/hadoop/mapred-site.xml

    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>n3:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>n3:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    <!-- Where completed MapReduce job history is stored -->
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/job/history/done</value>
    </property>
    <!-- Where in-flight job history is stored -->
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/job/history/done_intermediate</value>
    </property>
    <!-- Maximum number of counters per job -->
    <property>
        <name>mapreduce.job.counters.limit</name>
        <value>500</value>
    </property>
    <!-- Memory limit per map task -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>2048</value>
    </property>
    <!-- Map task JVM heap size; recommended ~80% of mapreduce.map.memory.mb -->
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1638m</value>
    </property>
    <!-- Memory limit per reduce task -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2048</value>
    </property>
    <!-- Reduce task JVM heap size; recommended ~80% of mapreduce.reduce.memory.mb -->
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx1638m</value>
    </property>
    
  6. workers file
    Add the data nodes to hadoop-3.1.1/etc/hadoop/workers. Entries must not have trailing whitespace and the file must not contain blank lines.

    n1
    n2
    n3
    
  7. Distribute Hadoop to all nodes

  8. Format the NameNode before starting the cluster for the first time

    hadoop-3.1.1/bin/hdfs namenode -format
    
  9. On n1, run /root/hadoop-3.1.1/sbin/start-dfs.sh to start HDFS

  10. On n2, run /root/hadoop-3.1.1/sbin/start-yarn.sh to start YARN

  11. On n3, run /root/hadoop-3.1.1/bin/mapred --daemon start historyserver to start the MR JobHistory Server

  12. Run the following commands to test HDFS and MapReduce (an output-check sketch follows this list)

    hadoop fs -mkdir -p /tmp/input
    hadoop fs -put $HADOOP_HOME/README.txt /tmp/input
    export hadoop_version=`hadoop version | head -n 1 | awk '{print $2}'`
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$hadoop_version.jar wordcount /tmp/input /tmp/output
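
To confirm the job produced output (a minimal sketch; the reducer writes part-r-00000 by default):

    hadoop fs -ls /tmp/output
    hadoop fs -cat /tmp/output/part-r-00000 | head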
    

Install hbase-2.0.2

  1. Configure system variables
    Add the following environment variables to .bash_profile
    export HBASE_HOME=/root/hbase-2.0.2
    export PATH=$PATH:${HBASE_HOME}/bin
    
  2. Changes to hbase-env.sh
    export JAVA_HOME=/root/jdk8
    # Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
    # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
    # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
    export HBASE_MANAGES_ZK=false
    
  3. Changes to hbase-site.xml
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://n1:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- New since 0.98: earlier versions had no .port property and the default port was 60000 -->
    <!-- 16000 is the default, so this is optional; the web UI port is 16010 -->
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>n1,n2,n3</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/root/zookeeper-3.4.14/zkData</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    
  4. Changes to regionservers
    n1
    n2
    n3
    
  5. Distribute HBase to all nodes
  6. On each node, symlink the Hadoop configuration files into HBase (a start/verify sketch follows this list)
    ln -s /root/hadoop-3.1.1/etc/hadoop/core-site.xml /root/hbase-2.0.2/conf/core-site.xml
    ln -s /root/hadoop-3.1.1/etc/hadoop/hdfs-site.xml /root/hbase-2.0.2/conf/hdfs-site.xml
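
A start-and-verify sketch, assuming start-hbase.sh is run on the node that should host the HMaster (n3 in the startup-order table later in this manual):

    /root/hbase-2.0.2/bin/start-hbase.sh
    # "status" should report 1 active master and 3 region servers
    echo "status" | /root/hbase-2.0.2/bin/hbase shell -n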
    

Install Solr 7.5.0

  1. Run tar -zxf solr-7.5.0.tgz
  2. Enter the Solr directory and set the following parameters in bin/solr.in.sh
    ZK_HOST="n1:2181,n2:2181,n3:2181"
    # Set a different SOLR_HOST on each node
    SOLR_HOST="n1"
    
  3. Distribute the /opt/solr directory to the other nodes and adjust the SOLR_HOST value
  4. On each node, add the following to /etc/security/limits.conf (takes effect after a reboot)
    root    hard    nofile  65000
    root    soft    nofile  65000
    root    hard    nproc   65000
    root    soft    nproc   65000
    
  5. On each node, run bin/solr start to start Solr (a status-check sketch follows this list)
    /opt/solr/bin/solr start
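
A status-check sketch (the CLUSTERSTATUS call can be issued against any of the Solr nodes):

    /opt/solr/bin/solr status
    curl "http://n1:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"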
    

Install MySQL 5.7

  1. Run rpm -qa | grep mariadb to check whether mariadb is installed; if it is, remove it with rpm -e --nodeps xxx
  2. Upload mysql-5.7.26-1.el7.x86_64.rpm-bundle.tar to the /root/rpms directory on n1 and extract it
  3. Run createrepo -d /root/rpms/ && yum clean all
  4. Run yum -y install mysql-community-server mysql-community-client
  5. Edit /etc/my.cnf
    [mysqld]
    # Remove leading # and set to the amount of RAM for the most important data
    # cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
    # innodb_buffer_pool_size = 128M
    #
    # Remove leading # to turn on a very important data integrity option: logging
    # changes to the binary log between backups.
    log_bin=/var/lib/mysql/mysql_binary_log
    #
    # Remove leading # to set options mainly useful for reporting servers.
    # The server defaults are faster for transactions and fast SELECTs.
    # Adjust sizes as needed, experiment to find the optimal values.
    # join_buffer_size = 128M
    # sort_buffer_size = 2M
    # read_rnd_buffer_size = 2M
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    transaction-isolation = READ-COMMITTED
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    
    #In later versions of MySQL, if you enable the binary log and do not set
    ##a server_id, MySQL will not start. The server_id must be unique within
    ##the replicating group.
    server_id=1
    
    key_buffer_size = 32M
    max_allowed_packet = 32M
    thread_stack = 256K
    thread_cache_size = 64
    query_cache_limit = 8M
    query_cache_size = 64M
    query_cache_type = 1
    
    max_connections = 250
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    character-set-server=utf8
    
    binlog_format = mixed
    read_buffer_size = 2M
    read_rnd_buffer_size = 16M
    sort_buffer_size = 8M
    join_buffer_size = 8M
    
    # InnoDB settings
    innodb_file_per_table = 1
    innodb_flush_log_at_trx_commit  = 2
    innodb_log_buffer_size = 64M
    innodb_buffer_pool_size = 4G
    innodb_thread_concurrency = 8
    innodb_flush_method = O_DIRECT
    innodb_log_file_size = 512M
    
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    sql_mode=STRICT_ALL_TABLES
    
    [client]
    default-character-set=utf8
    
  6. Enable MySQL at boot and start it
    systemctl enable mysqld.service
    systemctl start mysqld.service
    systemctl status mysqld.service
    
  7. Run grep password /var/log/mysqld.log to obtain the initial password
  8. Run mysql_secure_installation to perform the basic MySQL hardening
    Securing the MySQL server deployment.
    
    Enter password for user root: (enter the initial password)
    
    The existing password for the user account root has expired. Please set a new password.
    
    New password: (enter the new password, Root123!)
    
    Re-enter new password: Root123!
    The 'validate_password' plugin is installed on the server.
    The subsequent steps will run with the existing configuration
    of the plugin.
    Using existing password for root.
    
    Estimated strength of the password: 100
    Change the password for root ? ((Press y|Y for Yes, any other key for No) : n
    Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
    By default, a MySQL installation has an anonymous user,
    allowing anyone to log into MySQL without having to have
    a user account created for them. This is intended only for
    testing, and to make the installation go a bit smoother.
    You should remove them before moving into a production
    environment.
    
    Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
    Success.
    
    
    Normally, root should only be allowed to connect from
    'localhost'. This ensures that someone cannot guess at
    the root password from the network.
    
    Disallow root login remotely? (Press y|Y for Yes, any other key for No) : y
    Success.
    
    By default, MySQL comes with a database named 'test' that
    anyone can access. This is also intended only for testing,
    and should be removed before moving into a production
    environment.
    
    
    Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
    - Dropping test database...
    Success.
    
    - Removing privileges on test database...
    Success.
    
    Reloading the privilege tables will ensure that all changes
    made so far will take effect immediately.
    
    Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
    Success.
    
    All done!
    
    Log in to MySQL and run show variables like "%char%"; to confirm the character set is utf8

Install Hive 3.1.0

  1. Configure system variables
    Add the following environment variables to .bash_profile

    export HIVE_HOME=/root/apache-hive-3.1.0-bin
    export PATH=$PATH:${HIVE_HOME}/bin
    
  2. Configure Hive environment variables
    Modify the following in apache-hive-3.1.0-bin/conf/hive-env.sh

    HADOOP_HOME=${HADOOP_HOME}
    export HADOOP_HEAPSIZE=2048
    export HIVE_CONF_DIR=${HIVE_HOME}/conf
    
  3. Create the database and user in MySQL

    CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'Hive123!';
    flush privileges;
    
  4. Copy mysql-connector-java-5.1.47-bin.jar into the apache-hive-3.1.0-bin/lib/ directory

  5. Modify the following in apache-hive-3.1.0-bin/conf/hive-site.xml

    <property>
        <name>system:java.io.tmpdir</name>
        <value>/tmp/tmpdir</value>
    </property>
    <property>
        <name>system:user.name</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://n1:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
        <description>
            JDBC connect string for a JDBC metastore.
            To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
            For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>Hive123!</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.server2.authentication</name>
        <value>NONE</value>
        <description>
        Expects one of [nosasl, none, ldap, kerberos, pam, custom].
        Client authentication types.
            NONE: no authentication check
            LDAP: LDAP/AD based authentication
            KERBEROS: Kerberos/GSSAPI authentication
            CUSTOM: Custom authentication provider
                    (Use with property hive.server2.custom.authentication.class)
            PAM: Pluggable authentication module
            NOSASL:  Raw transport
        </description>
    </property>
    <!-- The user configured here must have execute permission on inode="/tmp/hive" -->
    <property>
        <name>hive.server2.thrift.client.user</name>
        <value>root</value>
        <description>Username to use against thrift client</description>
    </property>
    <property>
        <name>hive.server2.thrift.client.password</name>
        <value>Root123!</value>
        <description>Password to use against thrift client</description>
    </property>
    <property>
        <name>hive.metastore.db.type</name>
        <value>mysql</value>
        <description>
            Expects one of [derby, oracle, mysql, mssql, postgres].
            Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
        </description>
    </property>
    
  6. Run schematool -initSchema -dbType mysql to initialize the metastore schema in MySQL

  7. Run the following statements in Hive's MySQL database to avoid mojibake in Chinese comments on tables, columns, partitions, and indexes

    alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
    alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
    alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
    alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    
  8. Run mkdir -p hive-3.1.0/logs

  9. Run cp hive-log4j2.properties.template hive-log4j2.properties and change the following property

    property.hive.log.dir = /root/hive-3.1.0/logs
    
  10. Start HiveServer2 with nohup hiveserver2 1>/dev/null 2>&1 & echo $! > /root/hive-3.1.0/logs/hiveserver2.pid

  11. Start Beeline with beeline -u jdbc:hive2://n1:10000/default -n root -p Root123! (a smoke-test sketch follows)
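
A quick end-to-end smoke test through HiveServer2 (a minimal sketch; the hive_testdb database and t_smoke table are just example names, and hive_testdb is reused by the import-hive.sh example later in this manual):

    beeline -u "jdbc:hive2://n1:10000/default" -n root -p Root123! \
        -e "CREATE DATABASE IF NOT EXISTS hive_testdb" \
        -e "CREATE TABLE IF NOT EXISTS hive_testdb.t_smoke (id INT, name STRING)" \
        -e "INSERT INTO hive_testdb.t_smoke VALUES (1, 'atlas')" \
        -e "SELECT * FROM hive_testdb.t_smoke"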



Configure Atlas

Configure Solr for Atlas

  1. Modify the following in atlas-application.properties
    atlas.graph.index.search.backend=solr
    atlas.graph.index.search.solr.mode=cloud
    # ZK quorum setup for solr as comma separated value.
    atlas.graph.index.search.solr.zookeeper-url=n1:2181,n2:2181,n3:2181
    atlas.graph.index.search.solr.wait-searcher=true
    
  2. Copy Atlas's conf/solr directory into /root/solr-7.5.0 on every Solr server node and rename it to atlas_solr/
  3. On a Solr server node, create the collections
    ./solr create -c vertex_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
    ./solr create -c edge_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
    ./solr create -c fulltext_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
    
  4. To delete a collection, paste the corresponding URL below into the browser address bar
    http://n1:8983/solr/admin/collections?action=DELETE&name=vertex_index
    http://n1:8983/solr/admin/collections?action=DELETE&name=edge_index
    http://n1:8983/solr/admin/collections?action=DELETE&name=fulltext_index
    

Configure HBase for Atlas

  1. Modify the following in atlas-2.1.0/conf/atlas-application.properties

    atlas.graph.storage.backend=hbase2
    atlas.graph.storage.hbase.table=atlas
    atlas.graph.storage.hostname=n1:2181,n2:2181,n3:2181
    
  2. Modify the following in atlas-env.sh

    export HBASE_CONF_DIR=/root/hbase-2.0.2/conf
    
  3. Copy the HBase configuration files into Atlas's conf/hbase

    cp /root/hbase-2.0.2/conf/* /root/atlas-2.1.0/conf/hbase/
    
  4. Delete the copied core-site.xml and hdfs-site.xml and recreate them as symlinks

    ln -s /root/hadoop-3.1.1/etc/hadoop/core-site.xml /root/atlas-2.1.0/conf/hbase/core-site.xml
    ln -s /root/hadoop-3.1.1/etc/hadoop/hdfs-site.xml /root/atlas-2.1.0/conf/hbase/hdfs-site.xml
    

Configure Kafka for Atlas

  1. Modify the following in atlas-application.properties

    atlas.notification.embedded=false
    atlas.kafka.data=/root/atlas-2.1.0/data/kafka
    atlas.kafka.zookeeper.connect=n1:2181,n2:2181,n3:2181
    atlas.kafka.bootstrap.servers=n1:9092,n2:9092,n3:9092
    atlas.kafka.zookeeper.session.timeout.ms=4000
    atlas.kafka.zookeeper.connection.timeout.ms=2000
    atlas.kafka.enable.auto.commit=true
    
  2. Create the topics (a verification sketch follows)

    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic ATLAS_HOOK --partitions 1 --replication-factor 3
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic ATLAS_ENTITIES --partitions 1 --replication-factor 3
    

    The topic names can be found in the get_topics_to_create method of atlas-2.1.0/bin/atlas_config.py; the Kafka setup script is atlas-2.1.0/bin/atlas_kafka_setup.py
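
To verify the topics (a minimal sketch):

    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --describe --topic ATLAS_HOOK
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --describe --topic ATLAS_ENTITIES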


Configure LDAP

  1. Add or modify the following in atlas-application.properties

    atlas.authentication.method.ldap=true
    atlas.authentication.method.ldap.type=ldap
    atlas.authentication.method.ldap.url=ldap://xx.xx.xx.xx:389
    atlas.authentication.method.ldap.userDNpattern=uid={0},ou=employee,dc=xx,dc=xxxx,dc=com
    atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=employee,dc=xx,dc=xxxx,dc=com)
    atlas.authentication.method.ldap.groupRoleAttribute=cn
    atlas.authentication.method.ldap.base.dn=dc=xx,dc=xxxx,dc=com
    atlas.authentication.method.ldap.bind.dn=ou=employee,dc=xx,dc=xxxx,dc=com
    
  2. For an explanation of the LDAP settings, see here


Other Atlas Settings

  1. Modify the following in atlas-application.properties

    atlas.rest.address=http://n1:21000
    atlas.server.run.setup.on.start=false
    atlas.audit.hbase.tablename=apache_atlas_entity_audit
    atlas.audit.hbase.zookeeper.quorum=n1:2181,n2:2181,n3:2181
    
  2. Uncomment the following in atlas-log4j.xml

    <appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
        <param name="datePattern" value="'.'yyyy-MM-dd" />
        <param name="append" value="true" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d|%t|%m%n" />
        </layout>
    </appender>
    
    <logger name="org.apache.atlas.perf" additivity="false">
        <level value="debug" />
        <appender-ref ref="perf_appender" />
    </logger>
    


Start Atlas

  1. Start the components in the following order
    Order  Node  Component
    1      n1    zookeeper
    2      n1    kafka
    3      n1    hdfs
    4      n2    yarn
    5      n3    jobhistoryserver
    6      n3    hbase
    7      n1    solr
    8      n1    mysql
    9      n1    hive
    10     n1    atlas
  2. Run bin/atlas_start.py (screenshot: atlas_start.py output)
  3. Open http://n1:21000 in a browser (a curl check sketch follows this list)
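
A command-line check sketch (admin/admin is the default local account; it will differ if LDAP or the file-based users were changed):

    curl -u admin:admin "http://n1:21000/api/atlas/admin/version"
    curl -u admin:admin "http://n1:21000/api/atlas/admin/status"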


Configure the Hive Hook

  1. Modify the following setting in hive-site.xml

    <property>
        <name>hive.exec.post.hooks</name>
        <value>org.apache.atlas.hive.hook.HiveHook</value>
    </property>
    
  2. Extract apache-atlas-2.1.0-hive-hook.tar.gz and enter the apache-atlas-hive-hook-2.1.0 directory

  3. Copy everything under apache-atlas-hive-hook-2.1.0/hook/hive into atlas-2.1.0/hook/hive

  4. Modify the following in hive-env.sh

    export HIVE_AUX_JARS_PATH=/root/atlas-2.1.0/hook/hive
    
  5. Add the following settings to atlas-application.properties

    atlas.hook.hive.synchronous=false
    atlas.hook.hive.numRetries=3
    atlas.hook.hive.queueSize=10000
    atlas.cluster.name=primary
    atlas.kafka.zookeeper.connect=n1:2181,n2:2181,n3:2181
    atlas.kafka.zookeeper.connection.timeout.ms=30000
    atlas.kafka.zookeeper.session.timeout.ms=60000
    atlas.kafka.zookeeper.sync.time.ms=20
    
  6. Copy atlas-application.properties into Hive's conf directory

  7. Run ./hook-bin/import-hive.sh to import the Hive metadata into Atlas; the username and password are the ones used to log in to Atlas

    ./hook-bin/import-hive.sh -d hive_testdb
    ……
    Enter username for atlas :- admin
    Enter password for atlas :-
    ……
    Hive Meta Data imported successfully!!!
    
  8. Open or refresh the Atlas UI; Hive data is now visible under Search on the left (screenshot: Atlas UI)

  9. Select hive_db(1) and click Search; the results look like the screenshot (screenshot: hive_db results)

  10. View a table's lineage (screenshot: lineage view). A CTAS sketch that generates lineage follows this list.
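
A minimal sketch that generates lineage to look at: a CTAS statement run through Beeline creates a new table whose lineage (source table, process, new table) should then appear in the Atlas UI. The table names assume the earlier smoke-test example.

    beeline -u "jdbc:hive2://n1:10000/default" -n root -p Root123! \
        -e "CREATE TABLE hive_testdb.t_smoke_copy AS SELECT * FROM hive_testdb.t_smoke"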
