Summary: Analyzing Apache access logs with Spark and Scala: http://www.jdon.com/bigdata/analyzing-apache-access-logs-files-spark-scala.html
posted @ 2014-05-12 17:03 GrantYu Views (285) Comments (0) Recommendations (0)
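The linked article runs this analysis with Spark and Scala. As a hedged illustration of the core step only, here is a plain-Java sketch (no Spark) that parses lines in the Apache Common Log Format with a regular expression and then aggregates them. The class name `ApacheLogParser`, the `AccessLogRecord` fields and the sample lines are hypothetical. In a Spark job, the same per-line parse function would typically be applied in a map over the lines of the log file, followed by a reduce-by-key for the counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApacheLogParser {

    // One parsed request from an Apache access log (hypothetical type).
    public static final class AccessLogRecord {
        public final String ipAddress;
        public final String dateTime;
        public final String method;
        public final String endpoint;
        public final int responseCode;
        public final long contentSize;

        AccessLogRecord(String ipAddress, String dateTime, String method,
                        String endpoint, int responseCode, long contentSize) {
            this.ipAddress = ipAddress;
            this.dateTime = dateTime;
            this.method = method;
            this.endpoint = endpoint;
            this.responseCode = responseCode;
            this.contentSize = contentSize;
        }
    }

    // Common Log Format: host ident user [time] "method path protocol" status bytes.
    // lookingAt() only requires the prefix to match, so Combined Log Format lines
    // (which append referer and user agent) are accepted as well.
    private static final Pattern LOG_PATTERN = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)");

    // Returns empty for non-matching lines instead of throwing,
    // so one malformed line does not abort the whole analysis.
    public static Optional<AccessLogRecord> parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.lookingAt()) {
            return Optional.empty();
        }
        // Apache writes "-" when no response body was sent; count it as 0 bytes.
        long size = "-".equals(m.group(9)) ? 0L : Long.parseLong(m.group(9));
        return Optional.of(new AccessLogRecord(m.group(1), m.group(4), m.group(5),
                m.group(6), Integer.parseInt(m.group(8)), size));
    }

    // Number of requests per HTTP status code, sorted by code.
    public static Map<Integer, Long> countByResponseCode(List<String> lines) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (String line : lines) {
            parse(line).ifPresent(r -> counts.merge(r.responseCode, 1L, Long::sum));
        }
        return counts;
    }

    // Total bytes served across all parseable lines.
    public static long totalContentSize(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            total += parse(line).map(r -> r.contentSize).orElse(0L);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326",
                "10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] \"GET /missing.html HTTP/1.0\" 404 -",
                "not an access log line");
        System.out.println(countByResponseCode(lines)); // {200=1, 404=1}
        System.out.println(totalContentSize(lines));    // 2326
    }
}
```

Returning `Optional` rather than throwing mirrors a common choice in log analysis: real log files usually contain a few malformed lines, and it is usually better to skip them than to fail the whole job.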
Summary: http://blog.csdn.net/pelick/article/category/1556747 and http://www.cnblogs.com/hseagle/
posted @ 2014-05-12 17:01 GrantYu Views (259) Comments (0) Recommendations (0)
Summary: Exploring Spark: building a development environment with IntelliJ IDEA
posted @ 2014-05-12 16:51 GrantYu Views (190) Comments (0) Recommendations (0)
Summary: Preparation: note that spark-0.9.1 requires scala-2.10.x and sbt-0.12.4. System: CentOS 6.4 x64, Java 1.7.0 x64. 1. Install scala-2.10.x. 2. Install sbt-0.12.4: download the rpm, http://www.scala...
posted @ 2014-05-12 14:34 GrantYu Views (300) Comments (0) Recommendations (0)
Summary: Git illustrated with diagrams (repost): http://nettedfish.sinaapp.com/blog/2013/08/05/deep-into-git-with-diagrams/
posted @ 2014-05-05 10:53 GrantYu Views (107) Comments (0) Recommendations (0)
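Summary: Three servers, n0, n2 and n3, each with CentOS 6.4 x64, JDK, Scala 2.11, Hadoop 2.2.0 and spark-0.9.1-bin-hadoop2.tgz. Notes: 1. Install Scala on every machine. 2. Install Spark on every machine; it can be configured on the master first and then copied to the remaining nodes with scp.
posted @ 2014-04-24 17:10 GrantYu Views (484) Comments (0) Recommendations (0)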
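Summary: Inverted Index. An inverted index is an index structure that stores a mapping from words to the locations where each word appears in one or more documents. It is usually implemented with an associative array. It comes in two forms: the inverted file index, represented as {term, IDs of the documents containing the term}; and the full inverted index, represented as...
posted @ 2014-04-16 17:22 GrantYu Views (1276) Comments (0) Recommendations (1)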
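To make the two forms concrete, here is a minimal Java sketch that builds both from a few short strings held in memory. The summary is cut off before it describes the full inverted index, so the sketch assumes the usual definition, in which each term maps to (document ID, word position) pairs. `TreeMap` plays the role of the associative array; the tokenizer, the AND query and all class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndex {

    // One occurrence of a term: the document it is in and its word offset there.
    public static final class Posting {
        public final int docId;
        public final int position;

        Posting(int docId, int position) {
            this.docId = docId;
            this.position = position;
        }

        @Override
        public String toString() {
            return "(" + docId + "," + position + ")";
        }
    }

    // Simplistic tokenizer (an assumption): lower-case and split on anything
    // that is not a letter or digit, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Inverted file index: term -> IDs of the documents that contain it.
    public static Map<String, Set<Integer>> buildFileIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : tokenize(docs.get(docId))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Full inverted index: term -> every (document ID, word position) where it occurs.
    public static Map<String, List<Posting>> buildFullIndex(List<String> docs) {
        Map<String, List<Posting>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            List<String> terms = tokenize(docs.get(docId));
            for (int pos = 0; pos < terms.size(); pos++) {
                index.computeIfAbsent(terms.get(pos), k -> new ArrayList<>())
                        .add(new Posting(docId, pos));
            }
        }
        return index;
    }

    // Boolean AND query: intersect the document sets of all query terms.
    public static Set<Integer> docsContainingAll(Map<String, Set<Integer>> index, String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = index.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("it is what it is", "what is it", "it is a banana");
        Map<String, Set<Integer>> fileIndex = buildFileIndex(docs);
        System.out.println(fileIndex.get("what"));                            // [0, 1]
        System.out.println(docsContainingAll(fileIndex, "what", "is", "it")); // [0, 1]
        System.out.println(buildFullIndex(docs).get("is")); // [(0,1), (0,4), (1,1), (2,1)]
    }
}
```

The file index is enough to answer "which documents contain these words"; the extra positions in the full index are what make phrase or proximity queries possible, at the cost of a larger index.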
Summary: Creating a Hadoop-2.x project in Eclipse: http://snap.stanford.edu/class/cs246-data-2014/hw0.pdf; Hadoop WordCount with new map reduce api: http://codesfusion.blogspot.com/2013/10/hadoop-wordcount-with-new-map-reduce-api.html
posted @ 2014-04-09 23:58 GrantYu Views (243) Comments (0) Recommendations (0)
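The data flow of WordCount can be followed without a cluster. Here is a minimal, plain-Java sketch that runs the three stages (map, shuffle/sort, reduce) in memory, assuming the input is simply a list of text lines. It deliberately does not use the Hadoop API: in a real job with the new API, the map and reduce logic would live in subclasses of `org.apache.hadoop.mapreduce.Mapper` and `org.apache.hadoop.mapreduce.Reducer`, and the framework itself would perform the shuffle. The class and method names here are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSimulation {

    // Map phase: for one input line, emit (word, 1) per whitespace-separated token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    // Shuffle/sort: group all emitted values by key. TreeMap keeps keys sorted,
    // mirroring how map output is sorted by key before the reduce phase.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all counts for one word (the key is unused here,
    // but it is kept to mirror the (key, values) shape a reducer receives).
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drive the three phases over all input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(wordCount(lines)); // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

Because `map` looks at one line and `reduce` at one key at a time, neither needs global state; that independence is what lets Hadoop spread both phases across many machines.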
Summary: IntelliJ Project for Building Hadoop – The Definitive Guide Examples: http://vichargrave.com/intellij-project-for-building-hadoop-the-definitive-guide-examples/
posted @ 2014-04-09 22:51 GrantYu Views (171) Comments (0) Recommendations (0)
Summary: Creating a Hadoop-2.x project in Eclipse. Hortonworks, MapReduce Ports: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.0/bk_reference/content/reference_chap2_2.html; Hadoop-1.x cluster default and commonly used configuration: http://www.cnblogs.com/ggjucheng/archive/2012/04/17/2454590.html; Setting up a Hadoop-2.x development environment in Eclipse {good}: http://blog.csdn.n
posted @ 2014-04-09 19:16 GrantYu Views (367) Comments (0) Recommendations (0)
Summary: Create a Hadoop Build and Development Environment: http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/; Debugging Hadoop Applications with IntelliJ: http://vichargrave.com/debugging-hadoop-applications-with-intellij/
posted @ 2014-04-07 15:31 GrantYu Views (154) Comments (0) Recommendations (0)
Summary: Compiling the Eclipse plugin for Hadoop-2.3.0
# cd /usr/local/src/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin
# ant jar -Dversion=2.3.0 -Declipse.home=/usr/local/eclipse -Dhadoop.home=/home/hm/hadoop
Compiling is simple. A frequent problem: because of a proxy, ivy-2.1.0.jar cannot be downloaded, so a proxy has to be configured. The error looks like: Can't get http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.
posted @ 2014-04-03 15:43 GrantYu Views (1408) Comments (3) Recommendations (0)
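Summary: Storm cluster installation and deployment steps (detailed version)
Assumptions: 1. JDK, Python and unzip are already installed; 2. a ZooKeeper cluster has already been set up.
1. Install Storm's dependency libraries. They must be installed on both the Nimbus and the Supervisor machines.
1.1 ZeroMQ
$ ./configure
$ make
$ sudo make install
1.2 JZMQ
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
By default, ZMQ and JZMQ are installed under /usr/local/lib.
2. Download and unpack a Storm release: https://github.com/nathanmarz/storm
posted @ 2014-04-03 11:48 GrantYu Views (236) Comments (0) Recommendations (0)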
Summary: How-to: Use HBase Bulk Loading, and Why: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
posted @ 2014-04-03 11:47 GrantYu Views (125) Comments (0) Recommendations (0)
Summary: HBase distributed installation with hbase-0.98.0-hadoop2-bin.tar. Prerequisite: Hadoop and ZooKeeper are already installed (Hadoop port 9000; ZooKeeper port 2181, data directory /var/lib/zookeeper).
[hm@n0 ~]$ tar -zxv...
posted @ 2014-04-03 11:46 GrantYu Views (273) Comments (0) Recommendations (0)