ClickHouse-005建库建表以及数据导入导出测试
一、创建数据库
语法:
二、建表
如果直接这样执行会报错,有两种方法解决:
1: 在每一行后面加右斜杠,比如:
2: 在登录的时候加 -m参数支持多行模式,比如:
对建表sql的格式要求说明:
ENGINE:是表的引擎类型,最常用的MergeTree。还有一个Log引擎也是比较常用。MergeTree要求有一个日期字段,还有主键。Log没有这个限制。
create_date:是表的日期字段,一个表必须要有一个日期字段。
province:是表的主键,主键可以有多个字段,每个字段用逗号分隔
8192:是索引粒度,用默认值8192即可。
三、导入数据
3.1:普通的CSV文件导入
–-导数:
–或者用管道的方式:
–测试演示:
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table FORMAT CSV" < test_table.csv;
[root@tidb06 ~]#
[root@tidb06 ~]# cat test_table.csv | clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table01 FORMAT CSV"
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table limit 2";
WA WA_NAME 2020-08-25
CA CA_NAME 2020-09-25
[root@tidb06 ~]#
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table01 limit 2";
WA WA_NAME 2020-08-25
CA CA_NAME 2020-09-25
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
3.2:特殊的CSV文件导入(包含回车换行,转义符等)
说明:下载clickhouse官方提供的测试log引擎的表数据进行测试
这是Yandex.Metrica 日志收集的表分为点击hits和访问表vsits,可以作为日志埋点的重要参考数据
数据文件下载连接:
关于这2个表的数据的恢复的官方介绍地址:
https://clickhouse.tech/docs/zh/getting-started/example-datasets/metrica/
–建表:
CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;"
- 1.
- 2.
–导入测试数据:
[root@tidb06 ~]# cat test_table3.csv
1,123.456,”abc 123”,” abc" "'123”,2020-08-26,2020-08-26 17:08:09
[root@tidb06 ~]# cat test_table3.csv| clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table02 FORMAT CSV"
[root@tidb06 ~]#
[root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table02 limit 2";
1 123.456 ”abc 123” ” abc" "\'123” 2020-08-26 2020-08-26 17:08:09
[root@tidb06 ~]#
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
导入官方提供的测试数据:
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
real 0m11.354s
user 0m17.004s
sys 0m0.890s
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
real 0m33.984s
user 0m54.570s
sys 0m2.363s
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
数据导入完后,优化一下表
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
real 0m20.816s
user 0m0.017s
sys 0m0.016s
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
real 0m18.757s
user 0m0.017s
sys 0m0.016s
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
优化完之后,查看下表数据量
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.hits_v1"
8873898
real 0m0.189s
user 0m0.018s
sys 0m0.015s
[root@tidb06 data1]#
[root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.visits_v1"
1676861
real 0m0.032s
user 0m0.017s
sys 0m0.014s
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
4.导出数据:
4.1备份开启压缩:
[root@tidb06 backup]# du -sh tutorial.hits_v1.tsv.gz
1.3G tutorial.hits_v1.tsv.gz
4.2CTAS表快照:
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query=" create table tutorial.hits_v2 as tutorial.hits_v1"
real 0m0.088s
user 0m0.017s
sys 0m0.016s
[root@tidb06 ~]#
[root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="insert into table tutorial.hits_v2 select * from tutorial.hits_v1"
real 0m26.225s
user 0m0.024s
sys 0m0.025s
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
4.3远程拷贝表数据到本地库:
授权tutorail 库给用户wujianwei,允许从172.16.0.246服务器来远程访问tidb06上clickhouse的tutorial库
具体权限配置文件如下:
[root@tidb06 users.d]# cat /etc/clickhouse-server/users.d/wujianwei_rw.xml
<yandex>
<users>
<wujianwei>
<!--password_sha256_hex>737d7dfac3176d59cddacbba9bc8be3fe406d78769bc613a023092a37bc849e6</password_sha256_hex> --> <password_double_sha1_hex>c0952f7212b0161d07c6f45f00fdb73e17430f11</password_double_sha1_hex>
<networks incl="networks" replace="replace">
<!--ip>::/0</ip> -->
<ip>172.16.0.246</ip>
</networks>
<profile>normal_2</profile>
<quota>default</quota>
<allow_databases>
<database>test008</database>
<database>tutorial</database>
</allow_databases>
<access_management>1</access_management>
</wujianwei>
</users>
</yandex>
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
创建同样的表结构表:
[root@tidb05 ~]# clickhouse-client -udefault -m
tidb05 :) create database tutoria;
CREATE TABLE tutorial.hits_v1 ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams` Nested( Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192;
- 1.
- 2.
- 3.
- 4.
- 5.
远程copy数据:
tidb05 :) insert into table tutorial.hits_v1 select * from remote ('172.16.0.247','tutorial.hits_v1','wujianwei','j780UJy9D2tn');
INSERT INTO tutorial.hits_v1 SELECT *
FROM remote('172.16.0.247', 'tutorial.hits_v1', 'wujianwei', 'j780UJy9D2tn')
Ok.
0 rows in set. Elapsed: 27.291 sec. Processed 8.87 million rows, 8.46 GB (325.15 thousand rows/s., 310.01 MB/s.)
tidb05 :) select count(*) from tutorial.hits_v1;
SELECT count(*)
FROM tutorial.hits_v1
┌─count()─┐
│ 8873898 │
└─────────┘
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
本次演示到此结束,欢迎一起交流和学习。
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· DeepSeek “源神”启动!「GitHub 热点速览」
· 微软正式发布.NET 10 Preview 1:开启下一代开发框架新篇章
· 我与微信审核的“相爱相杀”看个人小程序副业
· C# 集成 DeepSeek 模型实现 AI 私有化(本地部署与 API 调用教程)
· DeepSeek R1 简明指南:架构、训练、本地部署及硬件要求