January 2015 Archive
Summary: public static void main(String[] args) throws ParseException { String str = "20140301"; String str1 = "20140731"; SimpleDateForma...
Read more
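Summary: By design, MAPJOIN reads the entire small table into memory and, during the map phase, matches the rows of the other table directly against the in-memory data. In this case even a Cartesian product does not slow the job down much. MAPJOIN applies in the following scenarios: 1. There is a very small table= a.min_dt) f left outer join h...
Read more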
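The in-memory matching described in the summary above can be sketched in plain Java. This is only an illustration of the idea for the equi-join case, not Hive's implementation: the class `MapJoinSketch`, the method `join`, and the row layout (a `String[]` whose first field is the join key) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side join: the small table lives in memory,
// the large table is streamed past it one row at a time.
class MapJoinSketch {

    // smallTable maps join key -> value from the small table.
    // Each bigRows entry is {joinKey, value}; this layout is an assumption.
    public static List<String> join(Map<String, String> smallTable, List<String[]> bigRows) {
        List<String> joined = new ArrayList<>();
        for (String[] row : bigRows) {
            String match = smallTable.get(row[0]); // one in-memory lookup per big-table row
            if (match != null) {                   // inner join: keep matching rows only
                joined.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("1", "a");
        small.put("2", "b");

        List<String[]> big = new ArrayList<>();
        big.add(new String[] {"1", "x"});
        big.add(new String[] {"3", "y"});
        big.add(new String[] {"2", "z"});

        System.out.println(join(small, big));
    }
}
```

Because the small table sits in a `HashMap`, each row of the large table costs a single lookup, and no shuffle or reduce step is needed.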
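Summary: A binary heap is a special kind of heap: a complete or nearly complete binary tree. It satisfies the heap property: the key of a parent node always keeps a fixed ordering relation to the key of each of its children, and the left and right subtrees of every node are themselves binary heaps. When every parent's key is greater than or equal to its children's keys, the heap is a max-heap. When every parent's key is less than or equal to its children's keys, the heap is a min-heap...
Read more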
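The min-heap property above can be illustrated with a small array-backed heap. This is a minimal sketch, assuming the usual layout in which the children of index `i` sit at `2*i + 1` and `2*i + 2`; the class name `MinHeapSketch` and its methods are hypothetical, not taken from the original post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array-backed min-heap: every parent key <= its children's keys.
class MinHeapSketch {
    private final List<Integer> items = new ArrayList<>();

    public int size() {
        return items.size();
    }

    // Append the key, then sift it up until its parent is no larger.
    public void insert(int key) {
        items.add(key);
        int i = items.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (items.get(parent) <= items.get(i)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    // Remove the root (the minimum), move the last key to the root, sift it down.
    public int extractMin() {
        if (items.isEmpty()) {
            throw new IllegalStateException("heap is empty");
        }
        int min = items.get(0);
        int last = items.remove(items.size() - 1);
        if (!items.isEmpty()) {
            items.set(0, last);
            int i = 0;
            int n = items.size();
            while (true) {
                int left = 2 * i + 1;
                int right = 2 * i + 2;
                int smallest = i;
                if (left < n && items.get(left) < items.get(smallest)) {
                    smallest = left;
                }
                if (right < n && items.get(right) < items.get(smallest)) {
                    smallest = right;
                }
                if (smallest == i) {
                    break;
                }
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int a, int b) {
        Integer tmp = items.get(a);
        items.set(a, items.get(b));
        items.set(b, tmp);
    }
}
```

Flipping the comparisons in `insert` and `extractMin` turns the same structure into a max-heap.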
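Summary: Quick Sort uses a divide-and-conquer strategy. The basic idea: choose a pivot and, in one pass, partition the data into two independent parts such that every element of one part is smaller than every element of the other. Then quick-sort each of the two parts in the same way. The whole process can run recursively until the entire data set is in order. Quick sort procedure: (1...
Read more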
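The partition-then-recurse idea above can be sketched as follows. The summary does not say how the pivot is chosen, so this sketch assumes the last element of each range is the pivot (the Lomuto partition scheme); `QuickSortSketch` and its method names are hypothetical.

```java
// Hypothetical in-place quick sort using the last element as the pivot (an assumption).
class QuickSortSketch {

    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // ranges of size 0 or 1 are already sorted
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);  // elements smaller than the pivot
        sort(a, p + 1, hi);  // elements greater than or equal to the pivot
    }

    // Moves everything smaller than the pivot to the left and returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

A fixed pivot position degrades to quadratic time on already sorted input; picking the pivot at random or by median-of-three is a common way to avoid that.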
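Summary: Shell Sort is a kind of insertion sort that improves on straight insertion sort. It is also called diminishing increment sort and is named after D. L. Shell, who proposed it in 1959. Shell sort is essentially a grouped insertion method. The basic idea: for a sequence of n elements to be sorted, take an integer gap smaller than n (gap is called the step size) and divide the elements into several group subsequences, all...
Read more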
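The grouped insertion described above can be sketched as below. The summary is cut off before it says how gap shrinks, so this sketch assumes the common choice of starting at n/2 and halving until gap reaches 1; `ShellSortSketch` is a hypothetical name.

```java
// Hypothetical shell sort; the gap sequence n/2, n/4, ..., 1 is an assumption.
class ShellSortSketch {

    public static void sort(int[] a) {
        int n = a.length;
        for (int gap = n / 2; gap > 0; gap /= 2) {
            // Insertion sort within each group of elements that are gap apart.
            for (int i = gap; i < n; i++) {
                int value = a[i];
                int j = i;
                while (j >= gap && a[j - gap] > value) {
                    a[j] = a[j - gap]; // shift the larger element one gap to the right
                    j -= gap;
                }
                a[j] = value;
            }
        }
    }
}
```

The final pass with gap equal to 1 is an ordinary insertion sort, but by then the earlier passes have left the array nearly sorted, so few shifts remain.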
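Summary: Insertion Sort is a simple, intuitive sorting algorithm. It works by building up a sorted sequence: for each unsorted element, it scans the sorted sequence from back to front, finds the right position, and inserts the element there. Insertion sort is usually implemented in place (that is, using only O(1) extra space), so during the back-to-front scan the sorted elements must be repeatedly shifted step by step toward...
Read more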
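The back-to-front scan described above can be sketched in a few lines. This is a minimal in-place version for `int` arrays; `InsertionSortSketch` is a hypothetical name.

```java
// Hypothetical in-place insertion sort using O(1) extra space.
class InsertionSortSketch {

    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int value = a[i]; // next unsorted element
            int j = i - 1;
            // Scan the sorted prefix from back to front, shifting larger elements right.
            while (j >= 0 && a[j] > value) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = value; // insert into the gap that opened up
        }
    }
}
```

Using `>` rather than `>=` in the scan keeps equal elements in their original order, so this version is stable.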
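Summary: Selection sort is a simple, intuitive sorting algorithm. It works as follows: first find the smallest (or largest) element in the unsorted sequence and place it at the start of the sorted sequence; then keep finding the smallest (or largest) element among the remaining unsorted elements and append it to the end of the sorted sequence. Repeat until all elements are sorted. Put simply: treat the whole array as a virtual ordered...
Read more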
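The steps above can be sketched as follows, taking the smallest-element variant. `SelectionSortSketch` is a hypothetical name.

```java
// Hypothetical selection sort: grows a sorted prefix by repeatedly selecting the minimum.
class SelectionSortSketch {

    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int minIndex = i;
            // Find the smallest element in the unsorted part a[i..].
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[minIndex]) {
                    minIndex = j;
                }
            }
            // Put it at the end of the sorted prefix.
            if (minIndex != i) {
                int tmp = a[i];
                a[i] = a[minIndex];
                a[minIndex] = tmp;
            }
        }
    }
}
```

The number of comparisons is the same for every input of a given length, while the number of swaps is at most n - 1.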
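Summary: Bubble Sort (also translated in Taiwan as 泡沫排序) is a simple sorting algorithm. It repeatedly walks through the sequence to be sorted, comparing two elements at a time and swapping them if they are in the wrong order. The passes repeat until no more swaps are needed, which means the sequence is sorted. The algorithm gets its name because smaller elements gradually "float" through the swaps toward the...
Read more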
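The repeat-until-no-swaps loop described above can be sketched as below. This version pushes the largest remaining element to the end on each pass, which is equivalent to the smaller elements drifting toward the front; `BubbleSortSketch` is a hypothetical name.

```java
// Hypothetical bubble sort that stops as soon as a full pass makes no swap.
class BubbleSortSketch {

    public static void sort(int[] a) {
        int end = a.length - 1; // last index that may still be out of place
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) { // adjacent pair in the wrong order
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
            end--; // the largest element of this pass is now in its final position
        }
    }
}
```

On input that is already sorted, the first pass makes no swap and the loop exits after a single scan.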
Summary: Status of Hive Authorization before Hive 0.13; SQL Standards Based Hive Authorization (New in Hive 0.13); Restrictions on Hive Commands and Statements; Privil...
Read more
Summary: Hive Concurrency Model; Use Cases; Turn Off Concurrency; Debugging; Configuration; Locking in Hive Transactions. Use Cases: Concurrency suppor...
Read more
Summary: EXPLAIN Syntax. Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows: EXPL...
Read more
Summary: LanguageManual WindowingAndAnalytics. Added by Lefty Leverenz, last edited by Lefty Leverenz on Aug 01, 2014...
Read more
Summary: Virtual Columns; Simple Examples. Virtual Columns: Hive 0.8.0 provides support for two virtual columns: one is INPUT__FILE__NAME, which is the input file's na...
Read more
Summary: Sampling Syntax; Sampling Bucketized Table; Block Sampling. Sampling Syntax. Sampling Bucketized Table: table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON co...
Read more
Summary: Subqueries in the FROM Clause; Subqueries in the WHERE Clause. Subqueries in the FROM Clause: SELECT ... FROM (subquery) name ... SELECT ... FROM (subquery) AS name ......
Read more
Summary: Lateral View Syntax; Description; Example; Multiple Lateral Views; Outer Lateral Views. Lateral View Syntax: lateralView: LATERAL VIEW udtf(expression) tableAlias...
Read more
Summary: Union Syntax: select_statement UNION ALL select_statement UNION ALL select_statement ... UNION is used to combine the result from multiple SELECT stateme...
Read more
Summary: Join Optimization; Improvements to the Hive Optimizer; Star Join Optimization; Star Schema Example; Prior Support for MAPJOIN; Limitations of P...
Read more
Summary: Hive Joins; Join Syntax; Examples; MapJoin Restrictions; Join Optimization; Predicate Pushdown in Outer Joins; Enhancements in Hive Version 0.11; Join Syn...
Read more
Summary: Documentation for Built-In User-Defined Functions Related To XPath. UDFs: xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_numb...
Read more
Summary: Hive Operators and User-Defined Functions (UDFs); Built-in Operators; Relational Operators; Arithmetic Opera...
Read more
Summary: Transform/Map-Reduce Syntax; SQL Standard Based Authorization Disallows TRANSFORM; TRANSFORM Examples; Schema-less Map-reduce Scripts; Typing the output of TR...
Read more
Summary: Syntax of Order By; Syntax of Sort By; Difference between Sort By and Order By; Setting Types for Sort By; Syntax of Cluster By and Distribute By; Syntax of Ord...
Read more
Summary: Group By Syntax; Simple Examples; Select statement and group by clause; Advanced Features; Multi-Group-By Inserts; Map-side Aggregation for Group By; Grouping Set...
Read more
Summary: Select Syntax; WHERE Clause; ALL and DISTINCT Clauses; Partition Based Queries; HAVING Clause; LIMIT Clause; REGEX Column Specification; More Select Syntax; GROUP BY; S...
Read more
Summary: LanguageManual ImportExport. Added by Carl Steinbach, last edited by Lefty Leverenz on May 14, 2013...
Read more
Summary: LanguageManual DML. Hive Data Manipulation Language; Loading files into tables; Syntax; Synopsis; Notes; Inserting data into Hive T...
Read more
Summary: Archiving for File Count Reduction. Note: Archiving should be considered an advanced command due to the caveats involved. Archiving for File Count Reduct...
Read more
Summary: Statistics in Hive; Motivation; Scope; Table and Partition Statistics; Column Statistics; Top K Statistics; Implementation; Usage; Configuration Var...
Read more
Summary: Disclaimer; Prerequisites; Users, Groups, and Roles; Names of Users and Roles; Creating/Dropping/Using Roles; Create/Drop Role; Grant/Revoke Roles; Viewing Granted ...
Read more
Summary: Describe; Describe Database; Describe Table/View/Column; Display Column Statistics; Describe Partition. Describe Database: Version information: As of Hive 0.7. D...
Read more
Summary: Create/Drop/Grant/Revoke Roles and Privileges. Hive Default Authorization - Legacy Mode has information about these DDL statements: CREATE ROLE; GRANT ROLE; R...
Read more
Summary: Create/Drop/Alter View; Create View; Drop View; Alter View Properties; Alter View As Select. Version information: View support is only available in Hive 0.6 a...
Read more
Summary: Alter Table/Partition/Column; Alter Table; Rename Table; Alter Table Properties; Alter Table Comment; Add SerDe Properties; Alter Table Storage Properties; Addition...
Read more
Summary: Hive Data Definition Language; Overview; Create/Drop/Alter Database; Create/Drop/Truncate Table; Alter Table/Partition/Column; Crea...
Read more
Summary: Querying and Inserting Data; Simple Query; Partition Based Query; Joins; Aggregations; Multi Table/File Inserts; Dynamic-Partition Insert; Inserting into Local File...
Read more
Summary: Creating, Showing, Altering, and Dropping Tables. See Hive Data Definition Language for detailed information about creating, showing, altering, and droppi...
Read more
Summary: Built-in Operators; Relational Operators. The following operators compare the passed operands and generate a TRUE or FALSE value depending on whether the ...
Read more
Summary: Data Types. Type System: Hive supports primitive and complex data types, as described below. See Hive Data Types for additional information. Primit...
Read more
Summary: Data Units. In the order of granularity, Hive data is organized into databases, tables, partitions, and buckets. Databases: Namespaces that separate tables and other data units from nam...
Read more
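Summary: primitive: adj. original; at a low level of development; backward; [biology] primordial. n. a primitive person; an early artist (or such an artist's work); a simple, unsophisticated person; a self-taught artist. [Web definitions] original; primitive (as a language construct); early. Plural: primitives. Comparative: more primitive. Superlative: most primitive...
Read more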
Summary: What Is Hive. Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data stor...
Read more
Summary: # Check the local MySQL installation path: [hadoop@SY-0134 toolkit]$ rpm -qa|grep -i mysql [hadoop@SY-0134 toolkit]$ whereis mysql mysql: /usr/lib/mysql /usr/share/mysql Environment: CentOS, after...
Read more
Summary: package com.hive.jdbc; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql....
Read more
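Summary: Preparation: 1. Laptop with 4 GB of RAM running Windows 7. 2. Tool: VMware Workstation. 3. Virtual machines: five CentOS 6.4 instances. 4. A working Hadoop cluster (so that Spark can read files from HDFS for experiments and tests). Test environment: Hadoop HA cluster: IP, hostname, role: 192.168.249.13...
Read more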
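Summary: 1. Basic concepts in Spark. Application: a user program built on Spark, consisting of a driver program and multiple executors in the cluster. Driver Program: runs the Application's main() function and creates the SparkContext. The SparkContext usually represents the driver p...
Read more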
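Summary: Preparation: 1. Laptop with 4 GB of RAM running Windows 7 (a low-end setup). 2. Tool: VMware Workstation. 3. Virtual machines: four CentOS 6.4 instances. VM settings, per machine: 512 MB of RAM, a 40 GB disk, and a network adapter in NAT mode. Under Advanced, generate a new MAC address for the VM (a cloned VM keeps the original MAC address, so regenerate it manually at the end each time)...
Read more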