clickhouse-local

2023-01-06 16:47  abce

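clickhouse-local lets you process local files quickly without deploying or configuring a ClickHouse Server. You can think of it as a single-machine micro-kernel version of the ClickHouse service, packaged as a lightweight application.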

clickhouse-local ships as part of clickhouse-client. It uses the same core as ClickHouse Server, so it supports most of the same features, along with the same formats and table engines.

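By default, clickhouse-local cannot access data on the same host, but it can load a server configuration through --config-file. Loading a production server configuration into clickhouse-local is not recommended, because a human error could corrupt the data.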

For temporary data, a unique temporary data directory is created by default.

Usage:

$ clickhouse-local --structure "table_structure" --input-format "format_of_incoming_data" --query "query"

Arguments:

-S, --structure - table structure for input data.
--input-format - input format, TSV by default.
-f, --file - path to data, stdin by default.
-q, --query - queries to execute, with ; as the delimiter. You must specify either the query or the queries-file option.
--queries-file - path to a file with queries to execute. You must specify either the query or the queries-file option.
-N, --table - table name where to put output data, table by default.
--format, --output-format - output format, TSV by default.
-d, --database - default database, _local by default.
--stacktrace - whether to dump debug output in case of exception.
--echo - print query before execution.
--verbose - more details on query execution.
--logger.console - log to console.
--logger.log - log file name.
--logger.level - log level.
--ignore-error - do not stop processing if a query failed.
-c, --config-file - path to a configuration file in the same format as for ClickHouse server; by default the configuration is empty.
--no-system-tables - do not attach system tables.
--help - argument reference for clickhouse-local.
-V, --version - print version information and exit.
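
To show how several of the flags above fit together, here is a minimal sketch that reads from a file instead of stdin and takes its queries from a file. The file names data.csv and queries.sql and the table name src are made up for illustration; the call is skipped if clickhouse-local is not installed.

```shell
# Scratch input: two CSV rows (data.csv is a hypothetical file name).
printf '1,2\n3,4\n' > data.csv

# The queries to run, kept in a separate file (queries.sql is hypothetical too).
printf 'SELECT a + b AS total FROM src ORDER BY total;\n' > queries.sql

# Run only when clickhouse-local is on PATH; every flag used here appears in the list above.
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local \
        --file data.csv \
        --input-format CSV \
        --structure "a Int64, b Int64" \
        --table src \
        --queries-file queries.sql \
        --output-format CSV
else
    echo "clickhouse-local not found; skipping" >&2
fi
```

Keeping the queries in a file avoids shell quoting problems once the SQL grows beyond a line or two.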

  

Example 1:

$ echo -e "1,2\n3,4" | clickhouse-local --structure "a Int64, b Int64" --input-format "CSV" --query "SELECT * FROM table"
Read 2 rows, 32.00 B in 0.000 sec., 5182 rows/sec., 80.97 KiB/sec.
1   2
3   4
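
The output format can be changed independently of the input format. The sketch below reruns Example 1 with --output-format set to JSONEachRow, assuming that format is available in your build. The file name rows.csv is made up for illustration, and the call is skipped if clickhouse-local is not installed.

```shell
# Same two rows as Example 1, written to a scratch file (rows.csv is a hypothetical name).
printf '1,2\n3,4\n' > rows.csv

# Read CSV from stdin, emit one JSON object per row.
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local \
        --structure "a Int64, b Int64" \
        --input-format CSV \
        --output-format JSONEachRow \
        --query "SELECT a, b, a * b AS product FROM table" < rows.csv
else
    echo "clickhouse-local not found; skipping" >&2
fi
```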

  

Example 2:

$ echo -e "1,2\n3,4" | clickhouse-local --query "
    CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin);
    SELECT a, b FROM table;
    DROP TABLE table"
Read 2 rows, 32.00 B in 0.000 sec., 4987 rows/sec., 77.93 KiB/sec.
1   2
3   4

  


You are not required to use stdin or the --file argument. With the file table function you can open any number of files:

$ echo 1 | tee 1.tsv
1

$ echo 2 | tee 2.tsv
2

$ clickhouse-local --query "
    select * from file('1.tsv', TSV, 'a int') t1
    cross join file('2.tsv', TSV, 'b int') t2"
1   2
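
The file table function also accepts glob patterns, so a set of similarly named files can be read as one table. A minimal sketch, assuming the files live in the current directory; part1.tsv and part2.tsv are made-up names, and the _file virtual column is used to show which file each row came from.

```shell
# Two single-row TSV files (hypothetical names chosen so they do not clash with 1.tsv and 2.tsv).
printf '1\n' > part1.tsv
printf '2\n' > part2.tsv

# Read both files through one glob; skipped when clickhouse-local is not installed.
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local --query "
        SELECT _file, a
        FROM file('part*.tsv', TSV, 'a Int32')
        ORDER BY a"
else
    echo "clickhouse-local not found; skipping" >&2
fi
```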

The next example sums the %MEM column of ps aux for each Unix user:

$ ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' \
    | clickhouse-local --structure "user String, mem Float64" \
        --query "SELECT user, round(sum(mem), 2) as memTotal
            FROM table GROUP BY user ORDER BY memTotal DESC FORMAT Pretty"
┏━━━━━━━━━━┳━━━━━━━━━━┓
┃ user     ┃ memTotal ┃
┡━━━━━━━━━━╇━━━━━━━━━━┩
│ clickho+ │     19.8 │
├──────────┼──────────┤
│ mysql    │      9.8 │
├──────────┼──────────┤
│ root     │      4.9 │
├──────────┼──────────┤
│ mongod   │      3.2 │
├──────────┼──────────┤
│ polkitd  │        3 │
├──────────┼──────────┤
│ postgres │      0.4 │
├──────────┼──────────┤
│ libstor+ │        0 │
├──────────┼──────────┤
│ dbus     │        0 │
├──────────┼──────────┤
│ ntp      │        0 │
└──────────┴──────────┘