Setting up and using DataHub, an open-source metadata management tool

I. Installing DataHub

  1. Install docker and docker-compose

    yum -y install docker

    curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

    chmod +x /usr/local/bin/docker-compose

    Check that both were installed correctly:

    docker --version

    docker-compose --version
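The same check can be scripted, for example as a pre-flight step before running the quickstart. Below is a minimal Python sketch. The helper name `tool_versions` is hypothetical. It looks each tool up on `PATH` and captures its `--version` output. A tool that is missing maps to `None`.

```python
import shutil
import subprocess
from typing import Dict, Iterable, Optional


def tool_versions(tools: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its `--version` output, or None if it is not on PATH."""
    result: Dict[str, Optional[str]] = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            result[tool] = None  # not installed, or not on PATH
            continue
        proc = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
        # Some tools print their version to stderr, so fall back to it.
        output = (proc.stdout or proc.stderr).strip()
        result[tool] = output or None
    return result


if __name__ == "__main__":
    for name, version in tool_versions(["docker", "docker-compose"]).items():
        print(f"{name}: {version or 'NOT FOUND'}")
```

`shutil.which` is used instead of calling the tool directly. A missing binary therefore shows up as `None` rather than raising `FileNotFoundError`.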

  2. Install jq

    yum install epel-release

    yum -y install jq

  3. Install Python 3 (the DataHub CLI needs Python 3; the system `python` here is Python 2)

    yum install python-pip gcc gcc-c++ python-virtualenv cyrus-sasl-devel

    yum -y groupinstall "Development tools"

    yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel

    wget https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz

    tar -zxvf Python-3.7.3.tgz

    mkdir /usr/local/python3

    cd Python-3.7.3

    ./configure --prefix=/usr/local/python3

    make && make install
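The `*-devel` packages installed above matter at build time. CPython only builds optional modules such as `ssl`, `sqlite3`, `bz2`, `lzma`, `ctypes` (libffi) and `readline` when the matching development headers are present. Without `ssl`, `pip` cannot download from PyPI over HTTPS.

A small sketch can report which of these modules failed to build. Run it with the new interpreter, e.g. `/usr/local/python3/bin/python3 check_modules.py`. The script name and the helper `missing_modules` are hypothetical.

```python
import importlib
from typing import Iterable, List

# Optional stdlib modules and the yum package that provides their headers.
OPTIONAL_MODULES = {
    "ssl": "openssl-devel",
    "sqlite3": "sqlite-devel",
    "bz2": "bzip2-devel",
    "lzma": "xz-devel",
    "zlib": "zlib-devel",
    "ctypes": "libffi-devel",
    "readline": "readline-devel",
    "curses": "ncurses-devel",
}


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for mod in missing_modules(OPTIONAL_MODULES):
        print(f"{mod} is missing: install {OPTIONAL_MODULES[mod]}, then rebuild")
```

If anything is reported, install the listed package and rerun `./configure` and `make && make install`.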

    Point the system `python` command at Python 3:

    rm -rf /usr/bin/python

    ln -s /usr/local/python3/bin/python3 /usr/bin/python

    Point `pip` at pip3:

    rm -rf /usr/bin/pip

    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip
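To confirm the two links point where intended, resolve them. Below is a minimal sketch using `os.path.realpath`, which follows the whole chain of symlinks. The helper name `resolves_to` is hypothetical.

```python
import os


def resolves_to(link: str, expected: str) -> bool:
    """True if `link` ultimately resolves to the same path as `expected`."""
    return os.path.realpath(link) == os.path.realpath(expected)


if __name__ == "__main__":
    checks = {
        "/usr/bin/python": "/usr/local/python3/bin/python3",
        "/usr/bin/pip": "/usr/local/python3/bin/pip3",
    }
    for link, target in checks.items():
        status = "ok" if resolves_to(link, target) else "NOT linked"
        print(f"{link} -> {os.path.realpath(link)} ({status})")
```

Both sides are resolved because `make install` itself creates `/usr/local/python3/bin/python3` as a symlink to `python3.7`. Comparing only one side would report a false mismatch.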

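    yum is written for Python 2 and its scripts start with `#! /usr/bin/python`, so switching that link to Python 3 breaks yum. Change the shebang of both scripts to Python 2:

    vi /usr/bin/yum  =>  change #! /usr/bin/python to #! /usr/bin/python2

    vi /usr/libexec/urlgrabber-ext-down  =>  change #! /usr/bin/python to #! /usr/bin/python2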
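Both edits touch only the first (shebang) line, so they can also be applied non-interactively. Below is a Python sketch of the same change. The helper name `pin_shebang` is hypothetical. It rewrites the first line only when it names exactly `/usr/bin/python`, with or without a space after `#!`. Interpreter flags are kept, and the rest of the file is left untouched.

```python
import re

# Matches "#!/usr/bin/python" or "#! /usr/bin/python" (optionally followed by
# interpreter flags), but not "#!/usr/bin/python2" or "#!/usr/bin/python3".
SHEBANG = re.compile(r"^#!\s*/usr/bin/python(?=\s|$)")


def pin_shebang(path: str, interpreter: str = "/usr/bin/python2") -> bool:
    """Point the file's shebang at `interpreter`; return True if it changed."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    if not lines or not SHEBANG.match(lines[0]):
        return False  # already pinned, or not a /usr/bin/python script
    lines[0] = SHEBANG.sub("#! " + interpreter, lines[0], count=1)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return True
```

Run it as root with `pin_shebang("/usr/bin/yum")` and `pin_shebang("/usr/libexec/urlgrabber-ext-down")`. The function is idempotent: a second call finds `python2` already in place and returns `False`.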

    Upgrade pip:

    python -m pip install --upgrade pip wheel setuptools

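  4. Install and start DataHub

    The CLI is published on PyPI as `acryl-datahub`. The first command below removes any older `datahub` package. `datahub docker quickstart` then pulls and starts the DataHub containers with docker-compose.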

    python -m pip uninstall datahub acryl-datahub || true

    python -m pip install --upgrade acryl-datahub

    python -m datahub version

    python -m datahub docker quickstart
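Before ingesting, it can help to confirm that the GMS backend is answering, since that is the REST endpoint the recipes below send metadata to. Below is a hedged polling sketch. It assumes that GMS listens on port 8080, as in the recipes below. It also assumes that GMS exposes a `/health` endpoint returning HTTP 200 when ready; verify this for your DataHub version. `wait_for_gms` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_gms(base_url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `<base_url>/health` until it returns HTTP 200 or `timeout` expires."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"  # assumed health endpoint
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet: refused, timed out, or HTTP error status
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_gms("http://node:8080")` blocks for up to five minutes and returns `False` if GMS never becomes healthy.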

II. Hands-on

  1. Import MySQL metadata (a fresh MySQL container is started with docker for this)

  docker run -p 13306:3306 --name ownmysql -v /opt/docker_data/mysql/conf:/etc/mysql/conf.d -v /opt/docker_data/mysql/logs:/logs -v /opt/docker_data/mysql/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql
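The container maps MySQL's port 3306 to port 13306 on the host, which is the `host_port` the MySQL recipe connects to. A quick TCP reachability check before ingesting can save a confusing ingestion error. Below is a sketch using only the standard library; the helper name `port_open` is hypothetical. It confirms that something is listening, not that the credentials work.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

For example, `port_open("node", 13306)` for MySQL, or `port_open("node", 10000)` for HiveServer2 later on.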

  Install the MySQL plugin:

  pip install 'acryl-datahub[mysql]'

  List the installed plugins:

  python -m datahub check plugins

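  2. Write a YAML recipe, saved as mysql_to_datahub_rest.yml. Its source reads the MySQL metadata, and its sink sends it to DataHub through the REST API (datahub-rest):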

source:
  type: mysql
  config:
    host_port: node:13306
    username: root
    password: "123456"
    database: aucc

sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"

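The same recipe can also be run from Python instead of the CLI. DataHub's documentation describes a programmatic pipeline API: `Pipeline.create`, `run` and `raise_from_status` in `datahub.ingestion.run.pipeline`. The sketch below assumes that API as found in recent `acryl-datahub` releases. The helper names `mysql_recipe` and `run_recipe` are hypothetical. The recipe is a plain dict mirroring the YAML above, so it can be built and inspected without DataHub installed.

```python
from typing import Any, Dict


def mysql_recipe(
    host_port: str = "node:13306",
    username: str = "root",
    password: str = "123456",
    database: str = "aucc",
    server: str = "http://node:8080",
) -> Dict[str, Any]:
    """Build the same recipe as mysql_to_datahub_rest.yml, as a Python dict."""
    return {
        "source": {
            "type": "mysql",
            "config": {
                "host_port": host_port,
                "username": username,
                "password": password,  # a string, not an int
                "database": database,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run a recipe in-process; needs `pip install 'acryl-datahub[mysql]'`."""
    # Imported lazily so the recipe builder works without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures
```

Usage: `run_recipe(mysql_recipe())`. Building the dict in code makes it easy to take the password from an environment variable instead of hard-coding it in a file.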
  3. Run the ingestion

  python -m datahub ingest -c mysql_to_datahub_rest.yml

  4. Ingest Hive metadata

  Prerequisites (SASL libraries used by the Hive plugin):

  yum install cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi

  pip install 'acryl-datahub[hive]'

  Then write the recipe, saved as hive_to_datahub_rest.yml:

source:
  type: hive
  config:
    host_port: node:10000
    username:
    password:
    database: default

sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"

  Run the ingestion:

  python -m datahub ingest -c hive_to_datahub_rest.yml
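In the Hive recipe, YAML parses the empty `username:` and `password:` keys as null, meaning no value. When a recipe is generated from code, such optional keys can simply be left out. Below is a sketch of that approach; the helper name `hive_recipe` is hypothetical. It assumes, as the YAML above implies, that the Hive source treats absent credentials the same as empty ones. It builds the Hive recipe as a dict and only adds credentials that are actually set.

```python
from typing import Any, Dict, Optional


def hive_recipe(
    host_port: str = "node:10000",
    database: str = "default",
    username: Optional[str] = None,
    password: Optional[str] = None,
    server: str = "http://node:8080",
) -> Dict[str, Any]:
    """Build the Hive recipe as a dict, omitting credentials that are not set."""
    config: Dict[str, Any] = {"host_port": host_port, "database": database}
    # Only add credentials with a non-empty value; otherwise leave them out
    # instead of writing explicit nulls.
    if username:
        config["username"] = username
    if password:
        config["password"] = password
    return {
        "source": {"type": "hive", "config": config},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }
```

The resulting dict has the same shape as the YAML recipe, so it can be handed to whatever runs recipes programmatically.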

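  5. Web UI

  After ingestion, the MySQL and Hive datasets can be browsed in the DataHub web UI. The quickstart serves it on port 9002 (for example http://node:9002).
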
Posted 2022-02-15 11:55 by Shydow