Doris Installation and Usage
1. Overall Architecture
Doris has two core process roles: FE (Frontend), which holds metadata, accepts client connections, and plans queries, and BE (Backend), which stores the data and executes query fragments. Optional Broker processes give the cluster access to external storage such as HDFS. The port planning below follows from this layout.
2. Installation and Deployment
2.1 Resource Planning
Instance | Port name | Port (default -> ours) | Communication | Description |
---|---|---|---|---|
FE | http_port | 8030 -> 8034 | user/FE <--> FE | HTTP port on the FE; serves the web management UI |
BE | webserver_port | 8040 -> 8041 | BE <--> FE | HTTP port on the BE; the default clashes with a Hadoop port, hence changed |
FE | query_port | 9030 | user <--> FE | MySQL-protocol port on the FE; what Navicat and other clients connect to |
2.2 Deployment Steps
1. Option 1: install from the binary package
https://doris.apache.org/zh-CN/download
1. Download apache-doris-2.0.0-bin-x64.tar.gz
2. Extract the archive
3. Edit the config files (sketched after step 4)
Update the IPs and ports
Add JAVA_HOME=/data/module/jdk1.8.0_212
4. Create the data directories
mkdir doris-meta
mkdir doris-storage1
mkdir doris-storage2
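Putting steps 3-4 together, the edits usually look like the sketch below. Ports follow the planning table above; the CIDR and the ${DORIS_HOME}-relative paths are illustrative assumptions, not values from these notes:
# fe/conf/fe.conf
JAVA_HOME=/data/module/jdk1.8.0_212
priority_networks = 172.31.12.0/24
http_port = 8034
query_port = 9030
meta_dir = ${DORIS_HOME}/doris-meta
# be/conf/be.conf
JAVA_HOME=/data/module/jdk1.8.0_212
priority_networks = 172.31.12.0/24
webserver_port = 8041
storage_root_path = ${DORIS_HOME}/doris-storage1;${DORIS_HOME}/doris-storage2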
5. Initialize the database
# Connect to the FE; no password by default
mysql -h core1 -P 9030 -uroot
# Change the root password:
SET PASSWORD FOR 'root' = PASSWORD('000000');
# Create a user:
create user 'test' identified by 'test';
# Log back in with the new password:
mysql -h master1 -P 9030 -uroot -p000000
GRANT ALL ON wafbase TO test;
CREATE USER 'sase'@'%' IDENTIFIED BY 'Xnetworks.c0M';
GRANT SELECT_PRIV ON internal.wafbase.* TO 'test'@'%';
SHOW ROLES;
show all grants;
# View SQL syntax help:
HELP ALTER TABLE;
# Add BEs by IP, not hostname
ALTER SYSTEM ADD BACKEND "172.31.12.14:9050";
ALTER SYSTEM ADD BACKEND "172.31.12.10:9050";
ALTER SYSTEM ADD BACKEND "172.31.12.5:9050";
SHOW PROC '/backends';
# Add FEs
ALTER SYSTEM ADD FOLLOWER "172.31.12.14:9010";
ALTER SYSTEM ADD OBSERVER "172.31.12.10:9010";
SHOW PROC '/frontends';
# Add brokers
ALTER SYSTEM ADD BROKER broker_name "172.31.12.5:8000","172.31.12.10:8000","172.31.12.14:8000";
SHOW PROC "/brokers";
6. Start the processes
The first start of each newly added follower/observer FE must include --helper pointing at an existing FE; run this on each new FE node:
fe/bin/start_fe.sh --helper 172.31.12.5:9010 --daemon
be/bin/start_be.sh --daemon
# subsequent FE starts (and the very first FE) need no helper:
fe/bin/start_fe.sh --daemon
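Once everything is up, cluster membership can be verified from the MySQL client (standard Doris statements):
SHOW FRONTENDS;
SHOW BACKENDS;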
7. Web UI
http://core1:8034/login
2. Option 2: deploy Doris with Doris Manager
Download: https://www.selectdb.com/download/enterprise#manager
QA: https://selectdb.feishu.cn/docx/AdaudiDKQovS7SxQwhucrUg7nPd
1. The user that installs the manager (hadoop) must have passwordless SSH between all nodes
SSH config: vim /etc/ssh/sshd_config (note: the relevant paths differ between users)
Switch to the hadoop user and generate the key pair, as sketched below
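A minimal sketch of that key setup, reusing the node name core1 from elsewhere in these notes:
su - hadoop
ssh-keygen -t rsa        # accept the defaults; writes ~/.ssh/id_rsa and id_rsa.pub
ssh-copy-id hadoop@core1 # repeat for every node in the cluster, including the local one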
2. Permissions
Check whether the user may use crontab: crontab -l
Check whether the setuid bit is set on the binary: ll /bin/crontab
chmod a+s /bin/crontab
chmod g-s /bin/crontab
vim /etc/security/access.conf
+ : hadoop : crontab : ALL
echo hadoop >> /etc/cron.allow
3. Start the manager
1. Download Enterprise Manager: doris-manager-23.11.3-x64-bin.tar.gz
cd webserver/
bin/start.sh
4. Deploy the Doris cluster from the web UI
manager: http://172.31.12.14:8004 root/@Xnetworks.c0M
sql: http://172.31.12.14:8010/cluster/query
apache-doris must be upgraded to selectdb-doris before the web UI can be used; the manager supports one-click upgrade.
1. Download Enterprise Core: selectdb-doris-2.0.4-b01-bin-x64.tar.gz
2. Move it to /data/software/doris; once the manager can locate the package there, it can run the upgrade directly.
3. Usage
3.1 Data Models
3.1.1 Aggregate
For aggregation-style workloads.
Key: the dimension columns; they also serve as the sort key and the deduplication key.
Value: the metric columns, each declared with an AggregationType:
SUM / MAX / MIN / REPLACE / REPLACE_IF_NOT_NULL / HLL_UNION / BITMAP_UNION
CREATE TABLE IF NOT EXISTS example_tbl_agg
(
`user_id` LARGEINT NOT NULL COMMENT "user id",
`date` DATE NOT NULL COMMENT "date the row was loaded",
`timestamp` DATETIME NOT NULL COMMENT "timestamp the row was loaded",
`city` VARCHAR(20) COMMENT "user's city",
`age` SMALLINT COMMENT "user age",
`sex` TINYINT COMMENT "user gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spend",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's max dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's min dwell time"
)
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
Limitations:
1. Aggregating a column with a function other than its declared type is inefficient, e.g. cost is declared SUM but the query runs MIN(cost);
2. count(*) is expensive. The workaround is an extra SUM column whose loaded value is always 1; then select sum(count) from table returns the row count (see the sketch below).
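A sketch of that workaround on example_tbl_agg (the column name `count` is illustrative, and every load must supply 1 for it):
ALTER TABLE example_tbl_agg ADD COLUMN `count` BIGINT SUM DEFAULT "1";
-- the row count then comes from the pre-aggregated sum:
SELECT SUM(`count`) FROM example_tbl_agg;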
3.1.2 Unique
UNIQUE KEY: deduplicates rows on the primary key, with merge-on-write (controlled per table by the enable_unique_key_merge_on_write property set in later examples). The key columns double as the sort and uniqueness columns.
The effect is equivalent to REPLACE in the Aggregate model.
CREATE TABLE IF NOT EXISTS example_tbl_unique
(
`user_id` LARGEINT NOT NULL COMMENT "user id",
`username` VARCHAR(50) NOT NULL COMMENT "username",
`city` VARCHAR(20) COMMENT "user's city",
`age` SMALLINT COMMENT "user age",
`sex` TINYINT COMMENT "user gender",
`phone` LARGEINT COMMENT "user phone",
`address` VARCHAR(500) COMMENT "user address",
`register_time` DATETIME COMMENT "registration time"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
Limitation: Unique is a special case of Aggregate, as the sketch below shows.
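To make the equivalence concrete: the Unique table above behaves like the following Aggregate table in which every value column uses REPLACE (table name is illustrative):
CREATE TABLE IF NOT EXISTS example_tbl_unique_as_agg
(
`user_id` LARGEINT NOT NULL COMMENT "user id",
`username` VARCHAR(50) NOT NULL COMMENT "username",
`city` VARCHAR(20) REPLACE COMMENT "user's city",
`age` SMALLINT REPLACE COMMENT "user age",
`sex` TINYINT REPLACE COMMENT "user gender",
`phone` LARGEINT REPLACE COMMENT "user phone",
`address` VARCHAR(500) REPLACE COMMENT "user address",
`register_time` DATETIME REPLACE COMMENT "registration time"
)
AGGREGATE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");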
3.1.3 Duplicate
The default model; it keeps every detail row. DUPLICATE KEY columns are only a sort key, not a uniqueness constraint.
CREATE TABLE IF NOT EXISTS example_tbl_duplicate
(
`timestamp` DATETIME NOT NULL COMMENT "log time",
`type` INT NOT NULL COMMENT "log type",
`error_code` INT COMMENT "error code",
`error_msg` VARCHAR(1024) COMMENT "error details",
`op_id` BIGINT COMMENT "owner id",
`op_time` DATETIME COMMENT "handling time"
)
DUPLICATE KEY(`timestamp`, `type`, `error_code`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
3.2 Data Partitioning
3.2.1 Partition & Bucket
A table's size can be checked with SHOW DATA; divide the reported size by the replica count to get the actual data volume.
Partition: Range or List; keep each partition at roughly 50GB or less.
Bucket (Tablet): Hash or Random; keep each tablet at roughly 1GB-10GB.
Suggested sizing:
| Data volume | Suggested layout |
|---|---|
| 500MB | 4-8 buckets |
| 5GB | 8-16 buckets |
| 50GB | 32 buckets |
| 500GB | 10 partitions, 32 buckets each |
| 5TB | 100 partitions, 32 buckets each |
3.2.2 Dynamic Partitioning
CREATE TABLE event_log (
ts DATETIME
)
DUPLICATE KEY(ts)
PARTITION BY RANGE(ts) ()
DISTRIBUTED BY HASH(ts)
PROPERTIES
(
"replication_num" = "1",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30", --删除历史分区
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32",
"dynamic_partition.create_history_partition"="true",
"enable_unique_key_merge_on_write" = "true"
);
SHOW DYNAMIC PARTITION TABLES;
3.3 Query Optimization
3.3.1 Rollup
ROLLUP tables on the Aggregate and Unique models
Built on top of the Base table, a ROLLUP stores the same data pre-aggregated at a coarser, user-specified granularity.
CREATE TABLE IF NOT EXISTS example_site_visit2
(
`user_id` LARGEINT NOT NULL COMMENT "user id",
`date` DATE NOT NULL COMMENT "date the row was loaded",
`timestamp` DATETIME COMMENT "load time, to the second",
`city` VARCHAR(20) COMMENT "user's city",
`age` SMALLINT COMMENT "user age",
`sex` TINYINT COMMENT "user gender",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spend",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's max dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's min dwell time"
)
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(user_id) BUCKETS 10 PROPERTIES("replication_num" = "1");
insert into example_site_visit2 values
(10000,'2017-10-01','2017-10-01 08:00:05','北京',20,0,'2017-10-01 06:00:00',20,10,10),
(10000,'2017-10-01','2017-10-01 09:00:05','北京',20,0,'2017-10-01 07:00:00',15,2,2),
(10001,'2017-10-01','2017-10-01 18:12:10','北京',30,1,'2017-10-01 17:05:45',2,22,22),
(10002,'2017-10-02','2017-10-02 13:10:00','上海',20,1,'2017-10-02 12:59:12',200,5,5),
(10003,'2017-10-02','2017-10-02 13:15:00','广州',32,0,'2017-10-02 11:20:00',30,11,11),
(10004,'2017-10-01','2017-10-01 12:12:48','深圳',35,0,'2017-10-01 10:00:15',100,3,3),
(10004,'2017-10-03','2017-10-03 12:38:20','深圳',35,0,'2017-10-03 10:20:22',11,6,6);
select * from example_site_visit2;
desc example_site_visit2 all;
alter table example_site_visit2 add rollup rollup_cost_userid(user_id,cost);
alter table example_site_visit2 add rollup rollup_city_age_cost_maxd_mind(city,age,cost,max_dwell_time,min_dwell_time);
explain SELECT sum(cost),user_id FROM example_site_visit2 GROUP BY user_id;
explain SELECT sum(cost),user_id,city FROM example_site_visit2 GROUP BY user_id,city;
explain SELECT city, age,max(max_dwell_time), min(min_dwell_time), sum(cost) FROM example_site_visit2 GROUP BY age,city;
SHOW ALTER TABLE ROLLUP;
ROLLUP on the Duplicate model
Unlike MySQL, Doris cannot index arbitrary columns; it relies on the prefix index (the first 36 bytes of the sort key). Here a ROLLUP serves to reorder columns so that more queries hit a prefix index; see the sketch after these notes.
Notes:
1. Users create and drop ROLLUP tables, but whether a query hits one is decided by Doris automatically;
2. Each additional ROLLUP costs disk space: queries get faster, imports get slower.
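A sketch on the Duplicate table from 3.1.3: a query filtering only on `type` cannot use the (`timestamp`, `type`, `error_code`) prefix index, so a ROLLUP that puts `type` first gives such queries a re-sorted copy to hit (the rollup name is illustrative):
ALTER TABLE example_tbl_duplicate ADD ROLLUP rollup_by_type(`type`, `timestamp`, `error_code`, `error_msg`, `op_id`, `op_time`);
-- verify which copy the planner picks:
EXPLAIN SELECT * FROM example_tbl_duplicate WHERE `type` = 1;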
3.3.2 Materialized Views
View: stores only the SQL.
Materialized view: stores the SQL plus its computed result.
HELP CREATE MATERIALIZED VIEW
HELP ALTER TABLE;
create database test_db;
CREATE TABLE flinktest (
siteid INT DEFAULT '10',
citycode SMALLINT,
username VARCHAR(32) DEFAULT '',
pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid)
BUCKETS 10
PROPERTIES("replication_num" = "1");
insert into flinktest values
(1, 1, 'jim', 2),
(2, 1, 'grace', 2),
(3, 2, 'tom', 2),
(4, 3, 'bush', 3),
(5, 3, 'helen', 3);
SELECT * from flinktest;
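These notes never show the CREATE statement itself; a minimal single-table materialized view on flinktest would look like this sketch (the view name is illustrative):
CREATE MATERIALIZED VIEW mv_pv_by_site AS
SELECT siteid, SUM(pv) FROM flinktest GROUP BY siteid;
-- build progress, and whether a query gets rewritten onto it:
SHOW ALTER TABLE MATERIALIZED VIEW;
EXPLAIN SELECT siteid, SUM(pv) FROM flinktest GROUP BY siteid;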
show data;
SHOW ALTER TABLE COLUMN;
desc flinktest;
ALTER TABLE flinktest ADD COLUMN uv BIGINT SUM DEFAULT '0' after pv;
ALTER TABLE flinktest ADD COLUMN ct BIGINT SUM DEFAULT '0' after pv;
ALTER TABLE flinktest DROP COLUMN uv;
ALTER TABLE flinktest MODIFY COLUMN username VARCHAR(32);
ALTER TABLE event_log RENAME COLUMN ruleId rule_id ;
4. Data Import
4.1 Stream Load
The client pushes a local file into Doris synchronously over HTTP:
curl --location-trusted -u root:000000 -H "label:stream" -H "column_separator:," -T student.csv http://172.31.12.5:8034/api/test_db/hdfs_student/_stream_load
4.2 Broker Load
Asynchronously pulls the files under HDFS /test into the Doris table testdb.user:
LOAD LABEL testdb.label_1(
DATA INFILE("hdfs://172.31.15.13:4007/test")
INTO TABLE user
COLUMNS TERMINATED BY " "
(name,age)
)
with HDFS
(
"hadoop.username" = "hadoop",
"fs.defaultFS"="hdfs://172.31.15.13:4007",
"dfs.nameservices" = "hafs",
"dfs.ha.namenodes.hafs" = "nn1,nn2",
"dfs.namenode.rpc-address.hafs.my_namenode1" = "172.31.15.13:4007",
"dfs.namenode.rpc-address.hafs.my_namenode2" = "172.31.15.12:4007",
"dfs.client.failover.proxy.provider.hafs" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
PROPERTIES
(
"timeout"="1200",
"max_filter_ratio"="0.1"
);
show load order by createtime desc ;
4.3 TVF
Synchronous import from HDFS/S3 via INSERT INTO ... SELECT over a table-valued function:
INSERT INTO user
select * from hdfs(
"uri" = "hdfs://172.31.15.13:4007/test",
"fs.defaultFS" = "hdfs://172.31.15.13:4007",
"hadoop.username" = "doris",
"format" = "csv",
"column_separator" = " ",
"dfs.nameservices" = "my_hdfs",
"dfs.ha.namenodes.my_hdfs" = "nn1,nn2",
"dfs.namenode.rpc-address.my_hdfs.nn1" = "172.31.15.13:4007",
"dfs.namenode.rpc-address.my_hdfs.nn2" = "172.31.15.12:4007",
"dfs.client.failover.proxy.provider.my_hdfs" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
4.4 Flink Doris Connector
Sync MySQL data into Doris with Flink CDC.
Known issue: only table schemas get synced so far, not the data; still to be resolved.
flink run \
-Dexecution.checkpointing.interval=10s \
-Dparallelism.default=1 \
-c org.apache.doris.flink.tools.cdc.CdcTools \
flink-doris-connector-1.17-1.4.0.jar \
mysql-sync-database \
--database testdb \
--mysql-conf hostname=172.31.15.7 \
--mysql-conf port=3306 \
--mysql-conf username=root \
--mysql-conf password=Xnetworks@c0M \
--mysql-conf database-name=bigdata \
--sink-conf fenodes=172.31.15.12:8034 \
--sink-conf username=root \
--sink-conf password=xxx \
--sink-conf jdbc-url=jdbc:mysql://172.31.15.12:9030 \
--sink-conf sink.label-prefix=mysql_doris \
--table-conf replication_num=1;
Flink SQL reading from and writing to Doris
-- Doris source
CREATE TABLE flink_doris_source (
name STRING,
age STRING
)
WITH (
'connector' = 'doris',
'fenodes' = '172.31.15.12:8034',
'table.identifier' = 'wafbase.user', -- the table in Doris
'username' = 'root',
'password' = 'xxx',
'sink.label-prefix' = 'doris_label1'
);
-- doris sink
CREATE TABLE flink_doris_sink (
name STRING,
age STRING
)
WITH (
'connector' = 'doris',
'fenodes' = '172.31.15.12:8034',
'table.identifier' = 'wafbase.user', -- the table in Doris
'username' = 'root',
'password' = '@Xnetworks.c0M',
'sink.label-prefix' = 'doris_labe_user1', -- each INSERT needs a fresh label
'sink.enable-2pc'='false' -- or simply disable two-phase commit
);
insert into flink_doris_sink/*+ OPTIONS('sink.label-prefix'='a1')*/ values ('44','4') ;
INSERT INTO flink_doris_sink select * from flink_doris_source;
select * from flink_doris_sink ;
Example: Kafka to Doris
--1. Create the Doris table
CREATE TABLE IF NOT EXISTS user_log (
id int,
ts bigint,
vc int
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
--2. Create a datagen source in sql-client
CREATE TABLE flink_source (
id INT,
ts BIGINT,
vc INT
) WITH (
'connector' = 'datagen',
'rows-per-second'='1',
'fields.id.kind'='random',
'fields.id.min'='1',
'fields.id.max'='10',
'fields.ts.kind'='sequence',
'fields.ts.start'='1',
'fields.ts.end'='1000000',
'fields.vc.kind'='random',
'fields.vc.min'='1',
'fields.vc.max'='10'
);
--3. Create the Kafka table
CREATE TABLE kafka_log(
id int,
ts bigint ,
vc int )
WITH (
'connector' = 'kafka',
'properties.bootstrap.servers' = '172.31.15.21:9092',
'properties.group.id' = 'wsl',
'scan.startup.mode' = 'earliest-offset',
'sink.partitioner' = 'fixed',
'topic' = 'wsl',
'format' = 'json'
);
--4. Generate data
insert into kafka_log(id,ts,vc) select * from flink_source;
--5. Map the Doris table in Flink
CREATE TABLE kafka_flink_doris_sink (
id int,
ts bigint,
vc int
)
WITH (
'connector' = 'doris',
'fenodes' = '172.31.15.12:8034',
'table.identifier' = 'testdb.user_log',
'username' = 'root',
'password' = '@Xnetworks.c0M',
'sink.label-prefix' = 'doris_label'
);
--6. Write the Kafka data into Doris
insert into kafka_flink_doris_sink select * from kafka_log;
select * from kafka_flink_doris_sink;
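The notes close with a fuller table that combines the pieces above — inverted indexes, dynamic partitioning, and a Unique key with merge-on-write — for WAF access logs: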
CREATE TABLE access_log (
event_time DATETIME,
user_id BIGINT,
request_id VARCHAR(50),
client_ip VARCHAR(32),
url VARCHAR(128),
city VARCHAR(15),
is_attack INT,
request_method VARCHAR(32),
req_size INT,
resp_size INT,
request_time DOUBLE,
server_ip VARCHAR(32),
server_port INT,
host VARCHAR(32),
domain VARCHAR(32),
`status` INT,
INDEX idx_event_time (`event_time`) USING INVERTED COMMENT 'inverted index for event_time',
INDEX idx_url (`url`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for url',
INDEX idx_client_ip (`client_ip`) USING INVERTED COMMENT 'inverted index for client_ip',
INDEX idx_request_id (`request_id`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for request_id'
)
UNIQUE KEY(event_time,user_id,request_id)
PARTITION BY RANGE(event_time) ()
DISTRIBUTED BY HASH(event_time)
PROPERTIES
(
"replication_num" = "1",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "4",
"dynamic_partition.create_history_partition"="true",
"enable_unique_key_merge_on_write" = "true"
);