Clickhouse-client导入数据和查询

通过官方客户端导入数据

本部分将结合开源数据集,介绍通过官方客户端clickhouse-client导入数据并查询。

 

  1. 下载数据集。单击 下载链接下载开源ontime数据集。
  2. 在ClickHouse中建表:
    $ ./clickhouse-client --multiline --host=<host> --port=<port> --user=<user> --password=<password> 
    
    --建立本地表
    CREATE TABLE ontime_local ON CLUSTER default
    (
        Year UInt16,
        Quarter UInt8,
        Month UInt8,
        DayofMonth UInt8,
        DayOfWeek UInt8,
        FlightDate Date,
        UniqueCarrier FixedString(7),
        AirlineID Int32,
        Carrier FixedString(2),
        TailNum String,
        FlightNum String,
        OriginAirportID Int32,
        OriginAirportSeqID Int32,
        OriginCityMarketID Int32,
        Origin FixedString(5),
        OriginCityName String,
        OriginState FixedString(2),
        OriginStateFips String,
        OriginStateName String,
        OriginWac Int32,
        DestAirportID Int32,
        DestAirportSeqID Int32,
        DestCityMarketID Int32,
        Dest FixedString(5),
        DestCityName String,
        DestState FixedString(2),
        DestStateFips String,
        DestStateName String,
        DestWac Int32,
        CRSDepTime Int32,
        DepTime Int32,
        DepDelay Int32,
        DepDelayMinutes Int32,
        DepDel15 Int32,
        DepartureDelayGroups String,
        DepTimeBlk String,
        TaxiOut Int32,
        WheelsOff Int32,
        WheelsOn Int32,
        TaxiIn Int32,
        CRSArrTime Int32,
        ArrTime Int32,
        ArrDelay Int32,
        ArrDelayMinutes Int32,
        ArrDel15 Int32,
        ArrivalDelayGroups Int32,
        ArrTimeBlk String,
        Cancelled UInt8,
        CancellationCode FixedString(1),
        Diverted UInt8,
        CRSElapsedTime Int32,
        ActualElapsedTime Int32,
        AirTime Int32,
        Flights Int32,
        Distance Int32,
        DistanceGroup UInt8,
        CarrierDelay Int32,
        WeatherDelay Int32,
        NASDelay Int32,
        SecurityDelay Int32,
        LateAircraftDelay Int32,
        FirstDepTime String,
        TotalAddGTime String,
        LongestAddGTime String,
        DivAirportLandings String,
        DivReachedDest String,
        DivActualElapsedTime String,
        DivArrDelay String,
        DivDistance String,
        Div1Airport String,
        Div1AirportID Int32,
        Div1AirportSeqID Int32,
        Div1WheelsOn String,
        Div1TotalGTime String,
        Div1LongestGTime String,
        Div1WheelsOff String,
        Div1TailNum String,
        Div2Airport String,
        Div2AirportID Int32,
        Div2AirportSeqID Int32,
        Div2WheelsOn String,
        Div2TotalGTime String,
        Div2LongestGTime String,
        Div2WheelsOff String,
        Div2TailNum String,
        Div3Airport String,
        Div3AirportID Int32,
        Div3AirportSeqID Int32,
        Div3WheelsOn String,
        Div3TotalGTime String,
        Div3LongestGTime String,
        Div3WheelsOff String,
        Div3TailNum String,
        Div4Airport String,
        Div4AirportID Int32,
        Div4AirportSeqID Int32,
        Div4WheelsOn String,
        Div4TotalGTime String,
        Div4LongestGTime String,
        Div4WheelsOff String,
        Div4TailNum String,
        Div5Airport String,
        Div5AirportID Int32,
        Div5AirportSeqID Int32,
        Div5WheelsOn String,
        Div5TotalGTime String,
        Div5LongestGTime String,
        Div5WheelsOff String,
        Div5TailNum String
    )ENGINE = ReplicatedMergeTree(
        '/clickhouse/tables/ontime_local/{shard}',
        '{replica}')
     PARTITION BY toYYYYMM(FlightDate)
     PRIMARY KEY (intHash32(FlightDate))
     ORDER BY (intHash32(FlightDate),FlightNum)
     SAMPLE BY intHash32(FlightDate)
    SETTINGS index_granularity= 8192 ;
    
    --建立分布式表
    CREATE TABLE ontime_distributed ON CLUSTER default
     AS ontime_local 
    ENGINE = Distributed(default, default, ontime_local, rand());
    
    --退出
    exit;
  3. 解压下载的开源数据集,导入数据 :
    unzip ontime-data.zip;
    ./clickhouse-client --host=<host> --port=<port> --user=<user> --password=<password> --query="INSERT INTO ontime_distributed FORMAT CSVWithNames" < ontime-data.csv;
  4. 支持使用SQL语句进行数据查询。示例如下:
    $ ./clickhouse-client --multiline --host=<host> --port=<port> --user=<user> --password=<password> 
    
    :) SELECT
        OriginCityName,
        DestCityName,
        count(*) AS flights,
        bar(flights, 0, 20000, 40)
    FROM ontime_distributed 
    WHERE Year = 1988
    GROUP BY OriginCityName, DestCityName 
    ORDER BY flights DESC 
    LIMIT 20;
    
    :) SELECT
        OriginCityName < DestCityName ? OriginCityName : DestCityName AS a,
        OriginCityName < DestCityName ? DestCityName : OriginCityName AS b,
        count(*) AS flights,
        bar(flights, 0, 40000, 40)
    FROM ontime_distributed 
    WHERE Year = 1988 
    GROUP BY a, b 
    ORDER BY flights DESC 
    LIMIT 20;
    
    :) SELECT OriginCityName, count(*) AS flights
    FROM ontime_distributed 
    GROUP BY OriginCityName 
    ORDER BY flights DESC 
    LIMIT 20;




posted @ 2021-05-27 13:45  jason_wei  阅读(794)  评论(0编辑  收藏  举报