ClickHouse之初步认识
最近在Percona的blog上看到一篇文章:Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark,从中可以看到Clickhouse的性能完爆MariaDB ColumnStore和 Spark。于是对Clickhouse产生了浓厚的兴趣,所以也打算进行学习。目前Clickhouse还没有中文文档,学习起来还是有点费劲。Percona的另一篇博客介绍Clickhouse的也可以看看。ClickHouse: New Open Source Columnar Database,其中这里也有一篇文章也可以看看:彪悍开源的分析数据库-ClickHouse
那么ClickHouse到底是啥?
1. 开源的列存储数据库管理系统
2. 支持线性扩展
3. 简单方便
4. 高可靠性
5. 容错(支持多主机异步复制,可以跨多个数据中心部署。 单个节点或整个数据中心的停机时间不会影响系统的读写可用性)
ClickHouse关键功能和应用场景:
更加详细的可以参考官方文档。
目前ClickHouse对ubuntu系统支持比较友好,对于centos就差一点。Ubuntu有deb包可以直接安装。对于centos的则需要自己编译。本人在折腾很久都没有编译出来,最后放弃了。无意中看到了google邮件组中有人提到RPM包,有人搞了yum源,可以直接安装,这才解放。对于想自己编译的同学,可以参考:https://github.com/yandex/ClickHouse/blob/master/doc/build.md,下面进行yum安装
1. 添加yum源
yum-config-manager --add-repo http://repo.red-soft.biz/repos/clickhouse/repo/clickhouse-el6.repo
2. 安装:
yum install clickhouse-server clickhouse-client clickhouse-server-common clickhouse-compressor
3. 添加用户clickhouse
useradd clickhouse
4. 启动clickhouse
/etc/init.d/clickhouse-server start
5. 登录测试:
[root@db_server_yayun_01 ~]# clickhouse-client ClickHouse client version 1.1.54198. Connecting to localhost:9000. Connected to ClickHouse server version 1.1.54198. :) select 1 SELECT 1 ┌─1─┐ │ 1 │ └───┘ → Progress: 1.00 rows, 1.00 B (64.77 rows/s., 64.77 B/s.) 1 rows in set. Elapsed: 0.016 sec. :) select now(); SELECT now() ┌───────────────now()─┐ │ 2017-03-31 15:14:18 │ └─────────────────────┘ ↘ Progress: 1.00 rows, 1.00 B (216.22 rows/s., 216.22 B/s.) 1 rows in set. Elapsed: 0.005 sec. :)
启动失败可以查看日志,日志的目录默认为
/var/log/clickhouse-server
[root@db_server_yayun_01 clickhouse-server]# ll total 16 -rw-rw-rw-. 1 clickhouse clickhouse 383 Mar 31 13:33 clickhouse-server.err.log -rw-rw-rw-. 1 clickhouse clickhouse 7733 Mar 31 15:14 clickhouse-server.log -rw-rw-rw-. 1 clickhouse clickhouse 138 Mar 31 13:33 stderr -rw-rw-rw-. 1 clickhouse clickhouse 0 Mar 31 13:33 stdout [root@db_server_yayun_01 clickhouse-server]#
下面说说clickhouse-client的简单使用:
交互模式
clickhouse-client
clickhouse-client --host=... --port=... --user=... --password=...
启用多行查询:
clickhouse-client -m
clickhouse-client --multiline
对于建表的时候就需要启用多行查询,否则会报错,比如建如下表:
CREATE TABLE `ontime` ( `Year` UInt16, `Quarter` UInt8, `Month` UInt8, `DayofMonth` UInt8, `DayOfWeek` UInt8, `FlightDate` Date, `UniqueCarrier` FixedString(7), `AirlineID` Int32, `Carrier` FixedString(2), `TailNum` String, `FlightNum` String, `OriginAirportID` Int32, `OriginAirportSeqID` Int32, `OriginCityMarketID` Int32, `Origin` FixedString(5), `OriginCityName` String, `OriginState` FixedString(2), `OriginStateFips` String, `OriginStateName` String, `OriginWac` Int32, `DestAirportID` Int32, `DestAirportSeqID` Int32, `DestCityMarketID` Int32, `Dest` FixedString(5), `DestCityName` String, `DestState` FixedString(2), `DestStateFips` String, `DestStateName` String, `DestWac` Int32, `CRSDepTime` Int32, `DepTime` Int32, `DepDelay` Int32, `DepDelayMinutes` Int32, `DepDel15` Int32, `DepartureDelayGroups` String, `DepTimeBlk` String, `TaxiOut` Int32, `WheelsOff` Int32, `WheelsOn` Int32, `TaxiIn` Int32, `CRSArrTime` Int32, `ArrTime` Int32, `ArrDelay` Int32, `ArrDelayMinutes` Int32, `ArrDel15` Int32, `ArrivalDelayGroups` Int32, `ArrTimeBlk` String, `Cancelled` UInt8, `CancellationCode` FixedString(1), `Diverted` UInt8, `CRSElapsedTime` Int32, `ActualElapsedTime` Int32, `AirTime` Int32, `Flights` Int32, `Distance` Int32, `DistanceGroup` UInt8, `CarrierDelay` Int32, `WeatherDelay` Int32, `NASDelay` Int32, `SecurityDelay` Int32, `LateAircraftDelay` Int32, `FirstDepTime` String, `TotalAddGTime` String, `LongestAddGTime` String, `DivAirportLandings` String, `DivReachedDest` String, `DivActualElapsedTime` String, `DivArrDelay` String, `DivDistance` String, `Div1Airport` String, `Div1AirportID` Int32, `Div1AirportSeqID` Int32, `Div1WheelsOn` String, `Div1TotalGTime` String, `Div1LongestGTime` String, `Div1WheelsOff` String, `Div1TailNum` String, `Div2Airport` String, `Div2AirportID` Int32, `Div2AirportSeqID` Int32, `Div2WheelsOn` String, `Div2TotalGTime` String, `Div2LongestGTime` String, `Div2WheelsOff` String, `Div2TailNum` String, `Div3Airport` String, `Div3AirportID` Int32, `Div3AirportSeqID` Int32, `Div3WheelsOn` String, `Div3TotalGTime` String, `Div3LongestGTime` String, `Div3WheelsOff` String, `Div3TailNum` String, `Div4Airport` String, `Div4AirportID` Int32, `Div4AirportSeqID` Int32, `Div4WheelsOn` String, `Div4TotalGTime` String, `Div4LongestGTime` String, `Div4WheelsOff` String, `Div4TailNum` String, `Div5Airport` String, `Div5AirportID` Int32, `Div5AirportSeqID` Int32, `Div5WheelsOn` String, `Div5TotalGTime` String, `Div5LongestGTime` String, `Div5WheelsOff` String, `Div5TailNum` String ) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)
以批处理方式运行查询:
clickhouse-client --query='SELECT 1' echo 'SELECT 1' | clickhouse-client
从指定格式的文件插入数据:
clickhouse-client --query='INSERT INTO table VALUES' < data.txt clickhouse-client --query='INSERT INTO table FORMAT TabSeparated' < data.tsv
参考资料:
https://github.com/redsoftbiz/clickhouse-rpm