Hive Distributed Data Warehouse (Big Data)

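A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data used to support management decisions.
Subject-oriented: a data warehouse is organized and built around particular subjects.
Integrated: building a data warehouse usually involves consolidating multiple heterogeneous data sources.
Non-volatile: a data warehouse usually stores its data separately from the operational systems, so it does not need transaction processing, data recovery, or concurrency control.
Time-variant: a data warehouse provides information from a historical perspective.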

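Differences Between a Data Warehouse and a Database

System Structure of a Data Warehouse

Hive Overview and Architecture

Introduction to Hive

Hive Application Scenarios

Hive Architecture

Huawei Hive Architecture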

The WebHCat component is introduced.
WebHCat exposes a REST interface through which users can reach services such as metadata access and Data Definition Language (DDL) queries over Hypertext Transfer Protocol Secure (HTTPS).

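Hive Compared with Traditional Data Warehouses (1)

Hive Compared with Traditional Data Warehouses (2)

Advantages of Hive

Disadvantages of Hive

Hive Data Storage Model

Hive Partitions and Buckets

Basic Hive Operations

Basic Hive Data Operations (1)

Basic Hive Data Operations (2)

Introduction to Hive SQL

DDL Operations (1)

DDL Operations (2)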

create [temporary] [external] table [if not exists] [db_name.]table_name
[(col_name data_type [comment col_comment], ...)]
[comment table_comment]
[row format row_format]
[stored as file_format]
[like table_name1]
[location hdfs_path]

describe table_name
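As an illustration of the template above, here is a minimal HiveQL sketch. The table names `employee_ext` and `employee_copy`, the columns, and the HDFS path `/user/hive/demo/employee` are hypothetical and not taken from the original slides.

```sql
-- Hypothetical external table over comma-delimited text files.
create external table if not exists employee_ext (
  id     int    comment 'employee id',
  name   string comment 'employee name',
  salary double comment 'monthly salary'
)
comment 'illustrative employee table'
row format delimited fields terminated by ','
stored as textfile
location '/user/hive/demo/employee';

-- The "like" form copies only the schema of an existing table, not its data.
-- It is used instead of a column list, not together with one.
create table if not exists employee_copy like employee_ext;

-- List the columns and types of the new table.
describe employee_copy;
```

Because `employee_ext` is declared `external`, dropping it removes only the metadata and leaves the files under the `location` path in place.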

DDL Operations (3)

show tables

alter table first_table rename to second_table

alter table table_name add|replace columns (col_name data_type [comment col_comment])
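The statements above could be combined as in the following sketch. The table and column names are placeholders chosen for illustration only.

```sql
-- Hypothetical starting table.
create table if not exists first_table (id int, name string);

-- List the tables in the current database.
show tables;

-- Rename the table.
alter table first_table rename to second_table;

-- "add columns" appends new columns after the existing ones.
alter table second_table add columns (dept string comment 'department name');

-- "replace columns" drops the current column list and installs the new one.
alter table second_table replace columns (id int, full_name string);

describe second_table;
```

Note that `replace columns` changes only the table metadata; the underlying data files are not rewritten.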

DML Operations

load data [local] inpath 'filepath'
[overwrite] into table tablename
[partition (partcol1=val1, partcol2=val2)]

export table tablename to '/department'
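A short sketch of how these two statements might be used together follows. The `sales` table, its partition columns, and the local file `/tmp/sales_2022-07-28.csv` are assumptions made for the example; only the `/department` target path comes from the text above.

```sql
-- Hypothetical partitioned table.
create table if not exists sales (item string, amount double)
partitioned by (dt string, region string)
row format delimited fields terminated by ',';

-- "local" reads the file from the local file system instead of HDFS;
-- "overwrite" replaces whatever the target partition already holds.
load data local inpath '/tmp/sales_2022-07-28.csv'
overwrite into table sales
partition (dt='2022-07-28', region='east');

-- Write the table's data and metadata to an HDFS directory.
export table sales to '/department';
```

Without `local`, the `inpath` is interpreted as an HDFS path and the file is moved into the table's directory rather than copied.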

DQL Operations (1)

select [all | distinct] select_expr, select_expr, ...
from table_reference
[where where_condition]
[group by col_list [having condition]]
[cluster by col_list | [distribute by col_list] [sort by | order by col_list]]
[limit number]
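To make the clauses concrete, here is a hedged sketch against a hypothetical `employee` table with columns `name`, `dept`, and `salary` (none of which appear in the original).

```sql
-- Aggregate per department, filter groups, then globally order the result.
select dept, count(*) as headcount, avg(salary) as avg_salary
from employee
where salary > 0
group by dept
having count(*) >= 5
order by avg_salary desc
limit 10;

-- "distribute by" chooses which reducer receives each row;
-- "sort by" orders rows only within each reducer.
select dept, name, salary
from employee
distribute by dept
sort by dept, salary desc;

-- "cluster by dept" is shorthand for "distribute by dept sort by dept".
select dept, name
from employee
cluster by dept;
```

`order by` guarantees a total order over the whole result, whereas `sort by` gives only per-reducer order, which is why the two behave differently on large data sets.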

DQL Operations (2)

table_reference join table_factor [join_condition]

| table_reference {left|right|full} [outer] join table_reference join_condition

| table_reference left semi join table_reference join_condition

| table_reference cross join table_reference [join_condition] (as of Hive 0.10)
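Each join form in the grammar above can be sketched as follows. The `employee` and `department` tables and their columns are hypothetical.

```sql
-- Inner join: only rows with a matching dept_id on both sides.
select e.name, d.dept_name
from employee e
join department d on e.dept_id = d.dept_id;

-- Left outer join: every employee, with NULL dept_name when there is no match.
select e.name, d.dept_name
from employee e
left outer join department d on e.dept_id = d.dept_id;

-- Left semi join: employees whose dept_id exists in department.
-- The right-hand table may be referenced only in the join condition.
select e.name
from employee e
left semi join department d on e.dept_id = d.dept_id;

-- Cross join: the Cartesian product of the two tables.
select e.name, d.dept_name
from employee e
cross join department d;
```

The left semi join plays the role of an `in` or `exists` subquery: each left-hand row appears at most once, however many right-hand rows match.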

DQL Operations (3)

>from smalltable

>join bigtable

>on smalltable.key = bigtable.key
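The fragment above puts the small table first and the big table last. A complete statement along those lines might look like the sketch below; the selected columns (`value` in particular) are assumptions, since the original does not show the select list.

```sql
-- Optionally let Hive turn the join into a map-side join
-- when one side is small enough to fit in memory.
set hive.auto.convert.join=true;

select smalltable.key, bigtable.value
from smalltable
join bigtable
on smalltable.key = bigtable.key;
```

In a reduce-side join Hive buffers the earlier tables and streams the last one, so listing the largest table last keeps the reducers' memory use low.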

Functions Supported by Hive

Hive Data Compression and File Storage Formats

Posted 2022-07-28 08:45 by 十一没有撤退可言！
