ClickHouse-005建库建表以及数据导入导出测试

一、创建数据库

语法:

CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
CREATE DATABASE testdb;  //创建数据库
DROP DATABASE testdb;     //删除数据库
  • 1.
  • 2.
  • 3.

二、建表

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
    ...
) ENGINE = engine
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
CREATE TABLE test_table( 
    province        String, 
    province_name         String, 
    create_date           date 
) ENGINE = MergeTree(create_date, (province), 8192);
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.

如果直接这样执行会报错,有两种方法解决:
1: 在每一行后面加右斜杠,比如:

[root@tidb06 ~]# clickhouse-client -udefault  --password=j780UJy9D2tn
CREATE TABLE test_table01( \
    province        String, \
    province_name         String, \
    create_date           date \
) ENGINE = MergeTree(create_date, (province), 8192);
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.

2: 在登录的时候加 -m参数支持多行模式,比如:

clickhouse-client -m
[root@tidb06 ~]# clickhouse-client -udefault -m --password=j780UJy9D2tn

CREATE TABLE test_table( 
    province        String, 
    province_name         String, 
    create_date           date 
) ENGINE = MergeTree(create_date, (province), 8192);
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.

对建表sql的格式要求说明:
ENGINE:是表的引擎类型,最常用的MergeTree。还有一个Log引擎也是比较常用。MergeTree要求有一个日期字段,还有主键。Log没有这个限制。
create_date:是表的日期字段,一个表必须要有一个日期字段。
province:是表的主键,主键可以有多个字段,每个字段用逗号分隔
8192:是索引粒度,用默认值8192即可。

三、导入数据

3.1:普通的CSV文件导入

cat > test_table.csv << EOF
WA,WA_NAME,2020-08-25
CA,CA_NAME,2020-09-25
OR,OR_NAME,2020-10-25
EOF
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.

–-导数:

clickhouse-client --query "INSERT INTO testdb.test_table FORMAT CSV" < test_table.csv;
  • 1.

–或者用管道的方式:

cat test_table.csv | clickhouse-client --query “INSERT INTO testdb.test_table FORMAT CSV”
  • 1.

–测试演示:

[root@tidb06 ~]# clickhouse-client  -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table FORMAT CSV" < test_table.csv;
[root@tidb06 ~]# 
[root@tidb06 ~]# cat test_table.csv | clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table01 FORMAT CSV"
[root@tidb06 ~]# clickhouse-client  -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table limit 2";
WA	WA_NAME	2020-08-25
CA	CA_NAME	2020-09-25
[root@tidb06 ~]# 
[root@tidb06 ~]# clickhouse-client  -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table01 limit 2";
WA	WA_NAME	2020-08-25
CA	CA_NAME	2020-09-25
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.

3.2:特殊的CSV文件导入(包含回车换行,转义符等)

说明:下载clickhouse官方提供的测试log引擎的表数据进行测试
这是Yandex.Metrica 日志收集的表分为点击hits和访问表vsits,可以作为日志埋点的重要参考数据
数据文件下载连接:

curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
  • 1.
  • 2.

关于这2个表的数据的恢复的官方介绍地址:
 https://clickhouse.tech/docs/zh/getting-started/example-datasets/metrica/

–建表:

CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;
[root@tidb06 ~]# clickhouse-client  -udefault --password=j780UJy9D2tn --query "CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;"
  • 1.
  • 2.

–导入测试数据:

[root@tidb06 ~]# cat test_table3.csv 
1,123.456,”abc 123”,” abc" "'123”,2020-08-26,2020-08-26 17:08:09
[root@tidb06 ~]# cat test_table3.csv| clickhouse-client -udefault --password=j780UJy9D2tn  --query "INSERT INTO testdb01.test_table02 FORMAT CSV"
[root@tidb06 ~]# 
[root@tidb06 ~]# clickhouse-client  -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table02 limit 2";
1	123.456	”abc 123”	” abc" "\'123”	2020-08-26	2020-08-26 17:08:09
[root@tidb06 ~]# 
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.

导入官方提供的测试数据:

[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn  --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
real    0m11.354s
user    0m17.004s
sys     0m0.890s

[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn  --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv 
real    0m33.984s
user    0m54.570s
sys     0m2.363s
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

数据导入完后,优化一下表

[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q  "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
real    0m20.816s
user    0m0.017s
sys     0m0.016s
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.visits_v1 FINAL" 

real    0m18.757s
user    0m0.017s
sys     0m0.016s
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

优化完之后,查看下表数据量

[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.hits_v1"
8873898
real    0m0.189s
user    0m0.018s
sys     0m0.015s
[root@tidb06 data1]# 
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.visits_v1"
1676861
real    0m0.032s
user    0m0.017s
sys     0m0.014s

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.

4.导出数据:

[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="select * from  tutorial.visits_v1" > /data1/backup/tutorial.visits_v1.tsv 
real	0m13.066s
user	0m11.303s
sys	0m1.652s
[root@tidb06 backup]# du -sh tutorial.visits_v1.tsv 
2.5G	tutorial.visits_v1.tsv
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.

4.1备份开启压缩:

[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="select * from  tutorial.hits_v1" |gzip >/data1/backup/tutorial.hits_v1.tsv.gz 
real	3m49.556s
user	4m1.825s
sys	0m9.013s
  • 1.
  • 2.
  • 3.
  • 4.

[root@tidb06 backup]# du -sh tutorial.hits_v1.tsv.gz
1.3G tutorial.hits_v1.tsv.gz

4.2CTAS表快照:

[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query=" create table tutorial.hits_v2 as tutorial.hits_v1" 
real	0m0.088s
user	0m0.017s
sys	0m0.016s
[root@tidb06 ~]# 
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="insert into table tutorial.hits_v2 select * from tutorial.hits_v1"
real	0m26.225s
user	0m0.024s
sys	0m0.025s
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.

4.3远程拷贝表数据到本地库:

授权tutorail 库给用户wujianwei,允许从172.16.0.246服务器来远程访问tidb06上clickhouse的tutorial库
具体权限配置文件如下:


[root@tidb06 users.d]# cat /etc/clickhouse-server/users.d/wujianwei_rw.xml 
<yandex>
    <users>
        <wujianwei>
            <!--password_sha256_hex>737d7dfac3176d59cddacbba9bc8be3fe406d78769bc613a023092a37bc849e6</password_sha256_hex> -->       <password_double_sha1_hex>c0952f7212b0161d07c6f45f00fdb73e17430f11</password_double_sha1_hex>
            <networks incl="networks" replace="replace">
                <!--ip>::/0</ip> -->
                <ip>172.16.0.246</ip>
            </networks>
            <profile>normal_2</profile>
            <quota>default</quota>
            <allow_databases>
                <database>test008</database>
                <database>tutorial</database>
            </allow_databases>
            <access_management>1</access_management>
       </wujianwei>
    </users>
</yandex>
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
[root@tidb05 ~]# clickhouse-client --user=wujianwei -h 172.16.0.247 --password=j780UJy9D2tn --port=9000 --query "show databases;"
test008
tutorial
  • 1.
  • 2.
  • 3.

创建同样的表结构表:


[root@tidb05 ~]# clickhouse-client -udefault -m
tidb05 :) create database tutoria;

CREATE TABLE tutorial.hits_v1 (     `WatchID` UInt64,     `JavaEnable` UInt8,     `Title` String,     `GoodEvent` Int16,     `EventTime` DateTime,     `EventDate` Date,     `CounterID` UInt32,     `ClientIP` UInt32,     `ClientIP6` FixedString(16),     `RegionID` UInt32,     `UserID` UInt64,     `CounterClass` Int8,     `OS` UInt8,     `UserAgent` UInt8,     `URL` String,     `Referer` String,     `URLDomain` String,     `RefererDomain` String,     `Refresh` UInt8,     `IsRobot` UInt8,     `RefererCategories` Array(UInt16),     `URLCategories` Array(UInt16),     `URLRegions` Array(UInt32),     `RefererRegions` Array(UInt32),     `ResolutionWidth` UInt16,     `ResolutionHeight` UInt16,     `ResolutionDepth` UInt8,     `FlashMajor` UInt8,     `FlashMinor` UInt8,     `FlashMinor2` String,     `NetMajor` UInt8,     `NetMinor` UInt8,     `UserAgentMajor` UInt16,     `UserAgentMinor` FixedString(2),     `CookieEnable` UInt8,     `JavascriptEnable` UInt8,     `IsMobile` UInt8,     `MobilePhone` UInt8,     `MobilePhoneModel` String,     `Params` String,     `IPNetworkID` UInt32,     `TraficSourceID` Int8,     `SearchEngineID` UInt16,     `SearchPhrase` String,     `AdvEngineID` UInt8,     `IsArtifical` UInt8,     `WindowClientWidth` UInt16,     `WindowClientHeight` UInt16,     `ClientTimeZone` Int16,     `ClientEventTime` DateTime,     `SilverlightVersion1` UInt8,     `SilverlightVersion2` UInt8,     `SilverlightVersion3` UInt32,     `SilverlightVersion4` UInt16,     `PageCharset` String,     `CodeVersion` UInt32,     `IsLink` UInt8,     `IsDownload` UInt8,     `IsNotBounce` UInt8,     `FUniqID` UInt64,     `HID` UInt32,     `IsOldCounter` UInt8,     `IsEvent` UInt8,     `IsParameter` UInt8,     `DontCountHits` UInt8,     `WithHash` UInt8,     `HitColor` FixedString(1),     `UTCEventTime` DateTime,     `Age` UInt8,     `Sex` UInt8,     `Income` UInt8,     `Interests` UInt16,     `Robotness` UInt8,     `GeneralInterests` Array(UInt16),     `RemoteIP` UInt32,     `RemoteIP6` FixedString(16),     `WindowName` Int32,     `OpenerName` Int32,     `HistoryLength` Int16,     `BrowserLanguage` FixedString(2),     `BrowserCountry` FixedString(2),     `SocialNetwork` String,     `SocialAction` String,     `HTTPError` UInt16,     `SendTiming` Int32,     `DNSTiming` Int32,     `ConnectTiming` Int32,     `ResponseStartTiming` Int32,     `ResponseEndTiming` Int32,     `FetchTiming` Int32,     `RedirectTiming` Int32,     `DOMInteractiveTiming` Int32,     `DOMContentLoadedTiming` Int32,     `DOMCompleteTiming` Int32,     `LoadEventStartTiming` Int32,     `LoadEventEndTiming` Int32,     `NSToDOMContentLoadedTiming` Int32,     `FirstPaintTiming` Int32,     `RedirectCount` Int8,     `SocialSourceNetworkID` UInt8,     `SocialSourcePage` String,     `ParamPrice` Int64,     `ParamOrderID` String,     `ParamCurrency` FixedString(3),     `ParamCurrencyID` UInt16,     `GoalsReached` Array(UInt32),     `OpenstatServiceName` String,     `OpenstatCampaignID` String,     `OpenstatAdID` String,     `OpenstatSourceID` String,     `UTMSource` String,     `UTMMedium` String,     `UTMCampaign` String,     `UTMContent` String,     `UTMTerm` String,     `FromTag` String,     `HasGCLID` UInt8,     `RefererHash` UInt64,     `URLHash` UInt64,     `CLID` UInt32,     `YCLID` UInt64,     `ShareService` String,     `ShareURL` String,     `ShareTitle` String,     `ParsedParams` Nested(         Key1 String,         Key2 String,         Key3 String,         Key4 String,         Key5 String,         ValueDouble Float64),     `IslandID` FixedString(16),     `RequestNum` UInt32,     `RequestTry` UInt8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192;
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.

远程copy数据:

tidb05 :) insert into table  tutorial.hits_v1  select * from remote ('172.16.0.247','tutorial.hits_v1','wujianwei','j780UJy9D2tn');
INSERT INTO tutorial.hits_v1 SELECT *
FROM remote('172.16.0.247', 'tutorial.hits_v1', 'wujianwei', 'j780UJy9D2tn')
Ok.

0 rows in set. Elapsed: 27.291 sec. Processed 8.87 million rows, 8.46 GB (325.15 thousand rows/s., 310.01 MB/s.) 

tidb05 :) select count(*) from tutorial.hits_v1;
SELECT count(*)
FROM tutorial.hits_v1

┌─count()─┐
│ 8873898 │
└─────────┘
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.

本次演示到此结束,欢迎一起交流和学习。

posted @   勤奋的蓝猫  阅读(22)  评论(0编辑  收藏  举报  
相关博文:
阅读排行:
· DeepSeek “源神”启动!「GitHub 热点速览」
· 微软正式发布.NET 10 Preview 1:开启下一代开发框架新篇章
· 我与微信审核的“相爱相杀”看个人小程序副业
· C# 集成 DeepSeek 模型实现 AI 私有化(本地部署与 API 调用教程)
· DeepSeek R1 简明指南:架构、训练、本地部署及硬件要求
点击右上角即可分享
微信分享提示