The following table declaration creates an external table that can read all of the comma-delimited data files in /data/stocks:
CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
  exchange STRING,
  symbol STRING,
  ymd STRING,
  price_open FLOAT,
  price_high FLOAT,
  price_low FLOAT,
  price_close FLOAT,
  volume INT,
  price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';
Next, load data into the stocks table in the economy database (note that the source path may be either a directory or a single file):
hive (economy)> load data local inpath '/home/landen/下载/infochimps_dataset_4777_download_16185/NASDAQ/NASDAQ_daily_prices_character' overwrite into table stocks;

It is conventional practice to specify a path that is a directory, rather than an individual file.
Hive will copy all the files in the directory, which gives you the flexibility of organizing
the data into multiple files and changing the file naming convention, without requiring a change
to your Hive scripts. Either way, the files will be copied to the appropriate location for the table
and the names will be the same.
If the LOCAL keyword is used, the path is assumed to be in the local filesystem and the data is copied
into the final location. If LOCAL is omitted, the path is assumed to be in the distributed filesystem,
and the data is moved from the path to the final location rather than copied.
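
As a rough sketch of the two variants (the paths below are hypothetical placeholders, not from this example):

-- LOCAL: the source is on the local filesystem and is copied up to the table's location
LOAD DATA LOCAL INPATH '/home/landen/stocks_extra' INTO TABLE stocks;

-- No LOCAL: the source is already on HDFS and is moved, not copied
LOAD DATA INPATH '/data/staging/stocks' INTO TABLE stocks;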

Notice:
@1. If you specify the OVERWRITE keyword, any data already present in the target directory will be
deleted first. Without the keyword, the new files are simply added to the target directory. However, if files
already exist in the target directory with the same names as files being loaded, the old files are overwritten.
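
A minimal sketch of the difference (again using a hypothetical staging path on HDFS):

-- With OVERWRITE: everything already in the stocks table directory is deleted first
LOAD DATA INPATH '/data/staging/stocks' OVERWRITE INTO TABLE stocks;

-- Without OVERWRITE: new files are simply added; only name collisions replace old files
LOAD DATA INPATH '/data/staging/stocks' INTO TABLE stocks;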

@2. Hive does not verify that the data you are loading matches the schema for the table. However, it
will verify that the file format matches the table definition. For example, if the table was created with SEQUENCEFILE
storage, the loaded files must be sequence files.
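
As an illustration (a minimal sketch; stocks_seq and the staging path are hypothetical, not part of this example):

CREATE TABLE IF NOT EXISTS stocks_seq (
  symbol STRING,
  price_close FLOAT)
STORED AS SEQUENCEFILE;

-- The files loaded here must already be sequence files; plain text files fail the format check
LOAD DATA INPATH '/data/staging/stocks_seq' INTO TABLE stocks_seq;

With the data loaded, a simple query over the stocks table confirms it is readable (full console output below):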
hive (economy)> select count(*) from stocks where symbol = 'BBND';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201303271617_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201303271617_0002
Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201303271617_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2013-03-27 18:54:23,829 Stage-1 map = 0%, reduce = 0%
2013-03-27 18:54:33,043 Stage-1 map = 28%, reduce = 0%
2013-03-27 18:54:36,236 Stage-1 map = 41%, reduce = 0%
2013-03-27 18:54:39,244 Stage-1 map = 57%, reduce = 0%
2013-03-27 18:54:42,252 Stage-1 map = 90%, reduce = 0%
2013-03-27 18:54:45,264 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:46,268 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:47,273 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:48,278 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:49,283 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:50,287 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:51,291 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:52,295 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:53,299 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.08 sec
2013-03-27 18:54:54,304 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 16.08 sec
2013-03-27 18:54:55,308 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 16.08 sec
2013-03-27 18:54:56,313 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 16.08 sec
2013-03-27 18:54:57,318 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:54:58,323 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:54:59,329 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:00,334 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:01,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:02,344 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
2013-03-27 18:55:03,350 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 19.1 sec
MapReduce Total cumulative CPU time: 19 seconds 100 msec
Ended Job = job_201303271617_0002
MapReduce Jobs Launched:
Job 0: Map: 2  Reduce: 1  Cumulative CPU: 19.1 sec  HDFS Read: 481098497  HDFS Write: 4  SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 100 msec
OK
731
Time taken: 49.812 seconds
hive (economy)>

 
