Troubleshooting: Problems Running a Scala-Based Spark Program in IDEA on Ubuntu 18.04.2
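I. Preface
I have recently been running a small experiment that uses Scala and Spark, so I set it up on Linux. Getting the simplest introductory program to run took me a day or two. Apart from being embarrassed, I came away with a clear sense of how hard these tools are to use; it is little wonder that several of the Hadoop companies are gradually declining.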
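II. Troubleshooting
If you have read my earlier posts, you know that I have used Hadoop and Spark before, and I ran into a great deal of trouble then as well. These products iterate quickly and each release offers very little backward compatibility, so choosing the right versions matters a great deal. Apart from the compatibility notes on the official sites, this topic is rarely covered online, even though it is crucial. Installation is also tedious: the downloads are nominally ready to use out of the box, but several configuration files still have to be edited. That alone would be tolerable. The real frustration starts when a command-line run fails with inexplicable errors, most of them related to the underlying Java, Hadoop, or Scala versions, and the tools give no helpful hints. Searching online turns up very little, and what does turn up is often off target or self-contradictory, so it easily takes a day or two to find a working fix. Unless these products improve, they will lose out to products such as MySQL and MongoDB. In this small test, a version-dependency problem stalled me for two days: a bizarre failure when running a Scala-based Spark program in IDEA on Ubuntu 18.04.2.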
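Let me describe how I built the program step by step. There are plenty of examples online, but most are superficial and vague and skip the most important details; in my view, authors who do this show little sense of responsibility toward their readers, whether the cause is poor writing or indifference. The first decision is what kind of project to create. IDEA offers two ways to support Scala programs. The first is to create a Scala project directly: you install the Scala plugin and then, after creating the project, configure the runtime environment yourself. These settings are tangled together, so getting them right may take many attempts and a lot of effort. The second also requires the Scala plugin, but you create a Maven project and declare what you need in pom.xml; Maven downloads the dependencies automatically according to their dependency and inheritance relationships, and you then only need to add Scala support to the project. The second approach is clearly simpler, so we build a Maven project.
Create a new folder and mark it as a source folder in the project structure settings.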
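Finally we need to add the Scala SDK to the project, and this is where versions come in. Scala 2.10 and 2.11 run on Java 6 and later (including Java 8), while Scala 2.12 and later require Java 8. We use Java 8, and the Spark artifacts used below carry the _2.11 suffix, so we need a 2.11 release; there are many 2.11.x releases, so we have to pick one. In the SDK dialog we add the corresponding Scala version. If it is not listed, click the Download button and choose the version to download. I have to complain about IDEA here: the download took half an hour, there was no progress bar, which is very frustrating, and cancelling it is awkward. Once the download finishes, select that version.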
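The version rule above can be checked mechanically. The sketch below is only an illustration: the object and function names (`ScalaVersionCheck`, `binaryVersion`, `artifactSuffix`, `isCompatible`) are hypothetical helpers, not part of Scala or Spark, and it assumes the Scala 2.10+ convention that a library's artifact ID ends in `_<major>.<minor>`.

```scala
object ScalaVersionCheck {
  // "2.11.12" -> "2.11": for Scala 2.10+ the binary version is the first two components.
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // "spark-mllib_2.11" -> Some("2.11"); None when the artifact ID has no suffix.
  def artifactSuffix(artifactId: String): Option[String] = {
    val idx = artifactId.lastIndexOf('_')
    if (idx >= 0 && idx < artifactId.length - 1) Some(artifactId.substring(idx + 1))
    else None
  }

  // True when the artifact was built for the same Scala binary version.
  def isCompatible(scalaVersion: String, artifactId: String): Boolean =
    artifactSuffix(artifactId).contains(binaryVersion(scalaVersion))

  def main(args: Array[String]): Unit = {
    // Versions actually in use by the running JVM.
    println("Scala: " + scala.util.Properties.versionNumberString)
    println("Java:  " + System.getProperty("java.version"))
    println(isCompatible("2.11.0", "spark-mllib_2.11")) // true
    println(isCompatible("2.12.8", "spark-mllib_2.11")) // false
  }
}
```

Running `main` inside the IDEA project prints the Scala and Java versions the program really runs on, which may differ from what the SDK dialog suggests.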
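At this point we configure pom.xml. Because we use Spark's machine learning library (MLlib), we declare it as a dependency. Maven resolves the transitive dependencies between packages automatically, which deserves praise.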
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>1.7</maven.compiler.source>
  <maven.compiler.target>1.7</maven.compiler.target>
  <!-- <scala.version>2.11.0</scala.version>-->
  <spark.artifactID.suffix>2.11</spark.artifactID.suffix>
  <spark.version>2.4.3</spark.version>
</properties>
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_${spark.artifactID.suffix}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
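Here the Scala version is set to 2.11, matching the SDK downloaded above. With 2.12, for reasons I could not determine, the program would not even build although the dependencies had been imported: the compiler reported that packages could not be found, even though I could clearly see those files among the downloaded dependencies. When I finally got past that, running the program threw bizarre exceptions. It is remarkable that a Scala hello-world program can be this hard to run; errors caused by version mismatches are truly strange. For Spark I went by what the Maven repository offers and picked the latest release, 2.4.3. Since this is the configuration that currently runs for me, it is fine for now. Even stranger, sometimes the first run succeeded, a second run of a different program failed, and a third run of the first program then failed with the same error. Clearing IDEA's caches and restarting many times did not help, and the same thing happened on another computer, which was close to despair-inducing. The combination that I eventually found to work, at least for now, is Java 8 + Scala 2.11.0 + spark-mllib_2.11 + Spark 2.4.3, which resolved the problem.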
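One thing worth checking in a setup like this is whether more than one scala-library JAR ends up on the runtime classpath. The classpath printed in the run output below lists both scala-library-2.11.0.jar and scala-library-2.11.12.jar, presumably one from the IDEA Scala SDK and one pulled in transitively by Spark. Whether that caused the intermittent failures is not established here. The following is a minimal sketch for detecting such duplicates; `ClasspathScalaVersions` and `scalaLibraryVersions` are hypothetical names introduced for illustration.

```scala
object ClasspathScalaVersions {
  // Matches classpath entries such as ".../scala-library-2.11.12.jar".
  private val ScalaLibraryJar = """scala-library-(\d+\.\d+\.\d+)\.jar$""".r

  // Returns the distinct scala-library versions found, in classpath order.
  def scalaLibraryVersions(classpath: String, separator: String): List[String] =
    classpath
      .split(java.util.regex.Pattern.quote(separator))
      .toList
      .flatMap(entry => ScalaLibraryJar.findFirstMatchIn(entry).map(_.group(1)))
      .distinct

  def main(args: Array[String]): Unit = {
    val versions = scalaLibraryVersions(
      System.getProperty("java.class.path"),
      java.io.File.pathSeparator)
    if (versions.size > 1)
      println("Multiple scala-library versions: " + versions.mkString(", "))
    else
      println("scala-library versions found: " + versions.mkString(", "))
  }
}
```

If more than one version is reported, aligning the IDEA Scala SDK with the version Maven resolves is a reasonable first thing to try.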
First program:
package com.kmeans

import org.apache.spark.{SparkConf, SparkContext}


object MyTest {
  def main(args: Array[String]): Unit = {
    val logFile = "file:///home/zyr/file.txt" // local input file; adjust the path for your machine
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]") // run locally with 2 threads
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache() // at least 2 partitions, cached
    val num = logData.flatMap(x => x.split(" ")).filter(_.contains("a")).count() // words containing "a"
    println("Words with a : %s".format(num))
    sc.stop()
  }
}
Input file:
xyr a b c d f g a d f g
a a a a a a a a a
w e r t y yuu
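The expected count can be cross-checked without Spark. This is a sketch using plain Scala collections with the three input lines above hard-coded; `WordsWithA` and `countWordsContaining` are hypothetical names, and the logic mirrors the `flatMap`/`filter`/`count` chain of the Spark program.

```scala
object WordsWithA {
  // Split every line on single spaces and count the words that contain `needle`.
  def countWordsContaining(lines: Seq[String], needle: String): Int =
    lines.flatMap(_.split(" ").toSeq).count(_.contains(needle))

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "xyr a b c d f g a d f g",
      "a a a a a a a a a",
      "w e r t y yuu")
    // Prints: Words with a : 11
    println("Words with a : %s".format(countWordsContaining(lines, "a")))
  }
}
```

The first line contributes 2 matches, the second 9, and the third none, which agrees with the value printed by the Spark job.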
Output:
1 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -javaagent:/usr/local/idea/lib/idea_rt.jar=44451:/usr/local/idea/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/icedtea-sound.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/java-atk-wrapper.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar:/home/zyr/IdeaProjects/myspark/target/classes:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.0/scala-reflect-2.11.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.0/scala-library-2.11.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib_2.11/2.4.3/spark-mllib_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.11/1.1.0/scala-parser-combinators_2.11-1.1.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.12/scala-library-2.11.12.jar:/home/zyr/.m2/repository/org/apache/spark/spark-core_2.11/2.4.3/spark-core_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar:/home/zyr/.m2/repository/org/apache/avro/avro/1.8.2/avro-1.8.2.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/z
yr/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/zyr/.m2/repository/org/apache/commons/commons-compress/1.8.1/commons-compress-1.8.1.jar:/home/zyr/.m2/repository/org/tukaani/xz/1.5/xz-1.5.jar:/home/zyr/.m2/repository/org/apache/avro/avro-mapred/1.8.2/avro-mapred-1.8.2-hadoop2.jar:/home/zyr/.m2/repository/org/apache/avro/avro-ipc/1.8.2/avro-ipc-1.8.2.jar:/home/zyr/.m2/repository/commons-codec/commons-codec/1.9/commons-codec-1.9.jar:/home/zyr/.m2/repository/com/twitter/chill_2.11/0.9.3/chill_2.11-0.9.3.jar:/home/zyr/.m2/repository/com/esotericsoftware/kryo-shaded/4.0.2/kryo-shaded-4.0.2.jar:/home/zyr/.m2/repository/com/esotericsoftware/minlog/1.3.0/minlog-1.3.0.jar:/home/zyr/.m2/repository/org/objenesis/objenesis/2.5.1/objenesis-2.5.1.jar:/home/zyr/.m2/repository/com/twitter/chill-java/0.9.3/chill-java-0.9.3.jar:/home/zyr/.m2/repository/org/apache/xbean/xbean-asm6-shaded/4.8/xbean-asm6-shaded-4.8.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-client/2.6.5/hadoop-client-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-common/2.6.5/hadoop-common-2.6.5.jar:/home/zyr/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/zyr/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/zyr/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/zyr/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/home/zyr/.m2/repository/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar:/home/zyr/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/zyr/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/zyr/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/zyr/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-auth/2.6.5/hadoop-auth-2.6.5.jar:/home/zyr/.m2/repositor
y/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar:/home/zyr/.m2/repository/org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar:/home/zyr/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/home/zyr/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/home/zyr/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/home/zyr/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/home/zyr/.m2/repository/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar:/home/zyr/.m2/repository/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.6.5/hadoop-hdfs-2.6.5.jar:/home/zyr/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/zyr/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/home/zyr/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.6.5/hadoop-mapreduce-client-app-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.6.5/hadoop-mapreduce-client-common-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.6.5/hadoop-yarn-client-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.6.5/hadoop-yarn-server-common-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.6.5/hadoop-mapreduce-client-shuffle-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.6.5/hadoop-yarn-api-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.6.5/hadoop-mapreduce-client-core-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.6.5/hadoop-yarn-common-2.6.5.jar:/home/zyr/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/zyr
/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.6.5/hadoop-mapreduce-client-jobclient-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-annotations/2.6.5/hadoop-annotations-2.6.5.jar:/home/zyr/.m2/repository/org/apache/spark/spark-launcher_2.11/2.4.3/spark-launcher_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-kvstore_2.11/2.4.3/spark-kvstore_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.6.7/jackson-core-2.6.7.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.6.7/jackson-annotations-2.6.7.jar:/home/zyr/.m2/repository/org/apache/spark/spark-network-common_2.11/2.4.3/spark-network-common_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-network-shuffle_2.11/2.4.3/spark-network-shuffle_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-unsafe_2.11/2.4.3/spark-unsafe_2.11-2.4.3.jar:/home/zyr/.m2/repository/javax/activation/activation/1.1.1/activation-1.1.1.jar:/home/zyr/.m2/repository/org/apache/curator/curator-recipes/2.6.0/curator-recipes-2.6.0.jar:/home/zyr/.m2/repository/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.jar:/home/zyr/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/zyr/.m2/repository/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6.jar:/home/zyr/.m2/repository/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar:/home/zyr/.m2/repository/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar:/home/zyr/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/zyr/.m2/repository/org/slf4j/slf4j-api/1.7.
16/slf4j-api-1.7.16.jar:/home/zyr/.m2/repository/org/slf4j/jul-to-slf4j/1.7.16/jul-to-slf4j-1.7.16.jar:/home/zyr/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.16/jcl-over-slf4j-1.7.16.jar:/home/zyr/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/zyr/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar:/home/zyr/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/zyr/.m2/repository/org/xerial/snappy/snappy-java/1.1.7.3/snappy-java-1.1.7.3.jar:/home/zyr/.m2/repository/org/lz4/lz4-java/1.4.0/lz4-java-1.4.0.jar:/home/zyr/.m2/repository/com/github/luben/zstd-jni/1.3.2-2/zstd-jni-1.3.2-2.jar:/home/zyr/.m2/repository/org/roaringbitmap/RoaringBitmap/0.7.45/RoaringBitmap-0.7.45.jar:/home/zyr/.m2/repository/org/roaringbitmap/shims/0.7.45/shims-0.7.45.jar:/home/zyr/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/zyr/.m2/repository/org/json4s/json4s-jackson_2.11/3.5.3/json4s-jackson_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-core_2.11/3.5.3/json4s-core_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-ast_2.11/3.5.3/json4s-ast_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-scalap_2.11/3.5.3/json4s-scalap_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-xml_2.11/1.0.6/scala-xml_2.11-1.0.6.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-client/2.22.2/jersey-client-2.22.2.jar:/home/zyr/.m2/repository/javax/ws/rs/javax.ws.rs-api/2.0.1/javax.ws.rs-api-2.0.1.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-api/2.4.0-b34/hk2-api-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-utils/2.4.0-b34/hk2-utils-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/external/aopalliance-repackaged/2.4.0-b34/aopalliance-repackaged-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/external/javax.inject/2.4.0-b34/javax.inject-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-locator/2.4.0-b34/hk2-locator-2.4.0-b34.jar:/home/zyr/.m2/
repository/org/javassist/javassist/3.18.1-GA/javassist-3.18.1-GA.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-common/2.22.2/jersey-common-2.22.2.jar:/home/zyr/.m2/repository/javax/annotation/javax.annotation-api/1.2/javax.annotation-api-1.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/bundles/repackaged/jersey-guava/2.22.2/jersey-guava-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/hk2/osgi-resource-locator/1.0.1/osgi-resource-locator-1.0.1.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-server/2.22.2/jersey-server-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/media/jersey-media-jaxb/2.22.2/jersey-media-jaxb-2.22.2.jar:/home/zyr/.m2/repository/javax/validation/validation-api/1.1.0.Final/validation-api-1.1.0.Final.jar:/home/zyr/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet/2.22.2/jersey-container-servlet-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet-core/2.22.2/jersey-container-servlet-core-2.22.2.jar:/home/zyr/.m2/repository/io/netty/netty-all/4.1.17.Final/netty-all-4.1.17.Final.jar:/home/zyr/.m2/repository/io/netty/netty/3.9.9.Final/netty-3.9.9.Final.jar:/home/zyr/.m2/repository/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-core/3.1.5/metrics-core-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-jvm/3.1.5/metrics-jvm-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-json/3.1.5/metrics-json-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-graphite/3.1.5/metrics-graphite-3.1.5.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.6.7.1/jackson-databind-2.6.7.1.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/module/jackson-module-scala_2.11/2.6.7.1/jackson-module-scala_2.11-2.6.7.1.jar:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.8/scala-reflect-2.11.8.jar:/home/zyr/.m2/repository/com/fasterxml/jacks
on/module/jackson-module-paranamer/2.7.9/jackson-module-paranamer-2.7.9.jar:/home/zyr/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar:/home/zyr/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/home/zyr/.m2/repository/net/razorvine/pyrolite/4.13/pyrolite-4.13.jar:/home/zyr/.m2/repository/net/sf/py4j/py4j/0.10.7/py4j-0.10.7.jar:/home/zyr/.m2/repository/org/apache/commons/commons-crypto/1.0.0/commons-crypto-1.0.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-streaming_2.11/2.4.3/spark-streaming_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-sql_2.11/2.4.3/spark-sql_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/univocity/univocity-parsers/2.7.3/univocity-parsers-2.7.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-sketch_2.11/2.4.3/spark-sketch_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-catalyst_2.11/2.4.3/spark-catalyst_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/codehaus/janino/janino/3.0.9/janino-3.0.9.jar:/home/zyr/.m2/repository/org/codehaus/janino/commons-compiler/3.0.9/commons-compiler-3.0.9.jar:/home/zyr/.m2/repository/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar:/home/zyr/.m2/repository/org/apache/orc/orc-core/1.5.5/orc-core-1.5.5-nohive.jar:/home/zyr/.m2/repository/org/apache/orc/orc-shims/1.5.5/orc-shims-1.5.5.jar:/home/zyr/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/home/zyr/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/zyr/.m2/repository/io/airlift/aircompressor/0.10/aircompressor-0.10.jar:/home/zyr/.m2/repository/org/apache/orc/orc-mapreduce/1.5.5/orc-mapreduce-1.5.5-nohive.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-column/1.10.1/parquet-column-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-common/1.10.1/parquet-common-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-encoding/1.10.1/parquet-encoding-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-hadoop/1.10.1/parquet-hadoop-1.10.1.jar:
/home/zyr/.m2/repository/org/apache/parquet/parquet-format/2.4.0/parquet-format-2.4.0.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-jackson/1.10.1/parquet-jackson-1.10.1.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-vector/0.10.0/arrow-vector-0.10.0.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-format/0.10.0/arrow-format-0.10.0.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-memory/0.10.0/arrow-memory-0.10.0.jar:/home/zyr/.m2/repository/joda-time/joda-time/2.9.9/joda-time-2.9.9.jar:/home/zyr/.m2/repository/com/carrotsearch/hppc/0.7.2/hppc-0.7.2.jar:/home/zyr/.m2/repository/com/vlkan/flatbuffers/1.2.0-3f79e055/flatbuffers-1.2.0-3f79e055.jar:/home/zyr/.m2/repository/org/apache/spark/spark-graphx_2.11/2.4.3/spark-graphx_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar:/home/zyr/.m2/repository/net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib-local_2.11/2.4.3/spark-mllib-local_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scalanlp/breeze_2.11/0.13.2/breeze_2.11-0.13.2.jar:/home/zyr/.m2/repository/org/scalanlp/breeze-macros_2.11/0.13.2/breeze-macros_2.11-0.13.2.jar:/home/zyr/.m2/repository/net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar:/home/zyr/.m2/repository/com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar:/home/zyr/.m2/repository/org/spire-math/spire_2.11/0.13.0/spire_2.11-0.13.0.jar:/home/zyr/.m2/repository/org/spire-math/spire-macros_2.11/0.13.0/spire-macros_2.11-0.13.0.jar:/home/zyr/.m2/repository/org/typelevel/machinist_2.11/0.6.1/machinist_2.11-0.6.1.jar:/home/zyr/.m2/repository/com/chuusai/shapeless_2.11/2.3.2/shapeless_2.11-2.3.2.jar:/home/zyr/.m2/repository/org/typelevel/macro-compat_2.11/1.1.1/macro-compat_2.11-1.1.1.jar:/home/zyr/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/zyr/.m2/repository/org/apache/spark/spark-tags_2.11/2.4.3/spark-tags_2.11-2.4.3.jar:/home/zyr/.m2/reposito
ry/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar com.kmeans.MyTest
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/07/10 11:36:47 WARN Utils: Your hostname, zyrpc resolves to a loopback address: 127.0.1.1; using 192.168.31.160 instead (on interface ens33)
19/07/10 11:36:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/07/10 11:36:47 INFO SparkContext: Running Spark version 2.4.3
19/07/10 11:36:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/10 11:36:50 INFO SparkContext: Submitted application: Simple Application
19/07/10 11:36:50 INFO SecurityManager: Changing view acls to: zyr
19/07/10 11:36:50 INFO SecurityManager: Changing modify acls to: zyr
19/07/10 11:36:50 INFO SecurityManager: Changing view acls groups to:
19/07/10 11:36:50 INFO SecurityManager: Changing modify acls groups to:
19/07/10 11:36:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zyr); groups with view permissions: Set(); users with modify permissions: Set(zyr); groups with modify permissions: Set()
19/07/10 11:36:52 INFO Utils: Successfully started service 'sparkDriver' on port 41147.
19/07/10 11:36:52 INFO SparkEnv: Registering MapOutputTracker
19/07/10 11:36:52 INFO SparkEnv: Registering BlockManagerMaster
19/07/10 11:36:52 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/07/10 11:36:52 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/07/10 11:36:52 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-63b48034-1ffc-40fa-bb45-6c117cb0451b
19/07/10 11:36:52 INFO MemoryStore: MemoryStore started with capacity 345.0 MB
19/07/10 11:36:52 INFO SparkEnv: Registering OutputCommitCoordinator
19/07/10 11:36:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/07/10 11:36:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.31.160:4040
19/07/10 11:36:54 INFO Executor: Starting executor ID driver on host localhost
19/07/10 11:36:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34263.
19/07/10 11:36:54 INFO NettyBlockTransferService: Server created on 192.168.31.160:34263
19/07/10 11:36:54 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/07/10 11:36:54 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:54 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.31.160:34263 with 345.0 MB RAM, BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:54 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:54 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 344.8 MB)
19/07/10 11:36:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 344.8 MB)
19/07/10 11:36:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.31.160:34263 (size: 20.4 KB, free: 345.0 MB)
19/07/10 11:36:57 INFO SparkContext: Created broadcast 0 from textFile at MyTest.scala:11
19/07/10 11:36:58 INFO FileInputFormat: Total input paths to process : 1
19/07/10 11:36:58 INFO SparkContext: Starting job: count at MyTest.scala:12
19/07/10 11:36:58 INFO DAGScheduler: Got job 0 (count at MyTest.scala:12) with 2 output partitions
19/07/10 11:36:58 INFO DAGScheduler: Final stage: ResultStage 0 (count at MyTest.scala:12)
19/07/10 11:36:58 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:36:58 INFO DAGScheduler: Missing parents: List()
19/07/10 11:36:58 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at filter at MyTest.scala:12), which has no missing parents
19/07/10 11:36:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.7 KB, free 344.8 MB)
19/07/10 11:36:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.1 KB, free 344.8 MB)
19/07/10 11:36:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.31.160:34263 (size: 2.1 KB, free: 345.0 MB)
19/07/10 11:36:58 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
19/07/10 11:36:58 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at filter at MyTest.scala:12) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:36:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/07/10 11:36:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7883 bytes)
19/07/10 11:36:58 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7883 bytes)
19/07/10 11:36:58 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/07/10 11:36:58 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
19/07/10 11:36:59 INFO HadoopRDD: Input split: file:/home/zyr/file.txt:0+29
19/07/10 11:36:59 INFO HadoopRDD: Input split: file:/home/zyr/file.txt:29+29
19/07/10 11:36:59 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 192.0 B, free 344.8 MB)
19/07/10 11:36:59 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 96.0 B, free 344.8 MB)
19/07/10 11:36:59 INFO BlockManagerInfo: Added rdd_1_1 in memory on 192.168.31.160:34263 (size: 96.0 B, free: 345.0 MB)
19/07/10 11:36:59 INFO BlockManagerInfo: Added rdd_1_0 in memory on 192.168.31.160:34263 (size: 192.0 B, free: 345.0 MB)
19/07/10 11:36:59 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 875 bytes result sent to driver
19/07/10 11:36:59 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 875 bytes result sent to driver
19/07/10 11:36:59 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 635 ms on localhost (executor driver) (1/2)
19/07/10 11:36:59 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 593 ms on localhost (executor driver) (2/2)
19/07/10 11:36:59 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/07/10 11:36:59 INFO DAGScheduler: ResultStage 0 (count at MyTest.scala:12) finished in 1.203 s
19/07/10 11:36:59 INFO DAGScheduler: Job 0 finished: count at MyTest.scala:12, took 1.438576 s
Words with a : 11
19/07/10 11:36:59 INFO SparkUI: Stopped Spark web UI at http://192.168.31.160:4040
19/07/10 11:36:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/07/10 11:36:59 INFO MemoryStore: MemoryStore cleared
19/07/10 11:36:59 INFO BlockManager: BlockManager stopped
19/07/10 11:36:59 INFO BlockManagerMaster: BlockManagerMaster stopped
19/07/10 11:36:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/07/10 11:36:59 INFO SparkContext: Successfully stopped SparkContext
19/07/10 11:36:59 INFO ShutdownHookManager: Shutdown hook called
19/07/10 11:36:59 INFO ShutdownHookManager: Deleting directory /tmp/spark-e07b1de0-0ac0-4abe-952a-504c2c7282fd

Process finished with exit code 0
Second program:
package com.kmeans

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}


/**
 * K-means clustering in Scala: assign points in 3-D space to clusters
 * ****************
 * Test data (x,y,z) *
 * ***************
 * 0.0 0.0 0.0
 * 0.1 0.1 0.1
 * 0.2 0.2 0.2
 * 9.0 9.0 9.0
 * 9.1 9.1 9.1
 * 9.2 9.2 9.2
 */
object Kmeans {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
    val context = new SparkContext(conf)
    val dataSourceRDD = context.textFile("file:///home/zyr/kmeanstest.txt").cache()
    val trainRDD = dataSourceRDD.map(lines => Vectors.dense(lines.split(" ").map(_.toDouble)))
    // trainRDD.foreach(trainRow => println(trainRow))
    // trainRDD.foreach(println)
    // Train on the data to obtain a model
    // Argument 1: training data (an RDD of Vectors)
    // Argument 2: number of clusters (k)
    // Argument 3: maximum number of iterations
    val model = KMeans.train(trainRDD, 3, 30)

    // Get the cluster centers of the model
    val clustercenters = model.clusterCenters

    // Print the cluster centers
    clustercenters.foreach(println)

    // Compute the cost (sum of squared distances of points to their nearest center)
    val cost = model.computeCost(trainRDD)
    println("Cost: " + cost)

    // Use the model to predict the cluster of some test points
    val res1 = model.predict(Vectors.dense("0.2 0.2 0.2".split(' ').map(_.toDouble)))
    val res2 = model.predict(Vectors.dense("0.25 0.25 0.25".split(' ').map(_.toDouble)))
    val res3 = model.predict(Vectors.dense("0.1 0.1 0.1".split(' ').map(_.toDouble)))
    val res4 = model.predict(Vectors.dense("9 9 9".split(' ').map(_.toDouble)))
    val res5 = model.predict(Vectors.dense("9.1 9.1 9.1".split(' ').map(_.toDouble)))
    val res6 = model.predict(Vectors.dense("9.06 9.06 9.06".split(' ').map(_.toDouble)))
    // println("Predictions:\r\n" + res1 + "\r\n" + res2 + "\r\n" + res3 + "\r\n" + res4 + "\r\n" + res5 + "\r\n" + res6)
    /**
     * These are the three cluster centers
     * [9.1,9.1,9.1]
     * [0.05,0.05,0.05]
     * [0.2,0.2,0.2]
     * The cluster indices follow
     * 2
     * 2
     * 1
     * 0
     * 0
     * 0
     * The result shows that each input point is assigned to the cluster whose center is closest
     */
    // Predict on the original training data as a sanity check
    val crossPredictRes = dataSourceRDD.map{
      lines =>
        val lineVectors = Vectors.dense(lines.split(" ").map(_.toDouble))
        val predictRes = model.predict(lineVectors)
        lineVectors + "==>" + predictRes
    }
    crossPredictRes.foreach(println)

    /**
     * [9.0,9.0,9.0]==>0
     * [9.1,9.1,9.1]==>0
     * [9.2,9.2,9.2]==>0
     * [0.0,0.0,0.0]==>1
     * [0.1,0.1,0.1]==>1
     * [0.2,0.2,0.2]==>2
     *
     */
    context.stop()
  }
}
Input file:
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
Output:
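To make the prediction step concrete, here is a plain-Scala sketch of nearest-center assignment, without Spark. It assumes squared Euclidean distance and uses the three centers quoted in the program's comments as fixed inputs; the names `NearestCenter`, `squaredDistance`, `predict`, and `cost` are hypothetical and only mirror what the MLlib model does conceptually, not its implementation.

```scala
object NearestCenter {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p.
  def predict(centers: Seq[Point], p: Point): Int =
    centers.indices.minBy(i => squaredDistance(centers(i), p))

  // Sum of squared distances from each point to its nearest center.
  def cost(centers: Seq[Point], points: Seq[Point]): Double =
    points.map(p => squaredDistance(centers(predict(centers, p)), p)).sum

  def main(args: Array[String]): Unit = {
    // Centers copied from the comment in the program above (assumed, not recomputed).
    val centers = Seq(
      Array(9.1, 9.1, 9.1),
      Array(0.05, 0.05, 0.05),
      Array(0.2, 0.2, 0.2))
    val queries = Seq(0.2, 0.25, 0.1, 9.0, 9.1, 9.06).map(v => Array(v, v, v))
    // Prints: 2 2 1 0 0 0
    println(queries.map(predict(centers, _)).mkString(" "))
  }
}
```

With those centers the six query points map to clusters 2, 2, 1, 0, 0, 0, the same indices listed in the program's comment. Note that K-means is initialized randomly, so a fresh training run may produce different centers or a different numbering of the clusters.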
1 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -javaagent:/usr/local/idea/lib/idea_rt.jar=34781:/usr/local/idea/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/icedtea-sound.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/java-atk-wrapper.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar:/home/zyr/IdeaProjects/myspark/target/classes:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.0/scala-reflect-2.11.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.0/scala-library-2.11.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib_2.11/2.4.3/spark-mllib_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.11/1.1.0/scala-parser-combinators_2.11-1.1.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.12/scala-library-2.11.12.jar:/home/zyr/.m2/repository/org/apache/spark/spark-core_2.11/2.4.3/spark-core_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar:/home/zyr/.m2/repository/org/apache/avro/avro/1.8.2/avro-1.8.2.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/z
yr/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/zyr/.m2/repository/org/apache/commons/commons-compress/1.8.1/commons-compress-1.8.1.jar:/home/zyr/.m2/repository/org/tukaani/xz/1.5/xz-1.5.jar:/home/zyr/.m2/repository/org/apache/avro/avro-mapred/1.8.2/avro-mapred-1.8.2-hadoop2.jar:/home/zyr/.m2/repository/org/apache/avro/avro-ipc/1.8.2/avro-ipc-1.8.2.jar:/home/zyr/.m2/repository/commons-codec/commons-codec/1.9/commons-codec-1.9.jar:/home/zyr/.m2/repository/com/twitter/chill_2.11/0.9.3/chill_2.11-0.9.3.jar:/home/zyr/.m2/repository/com/esotericsoftware/kryo-shaded/4.0.2/kryo-shaded-4.0.2.jar:/home/zyr/.m2/repository/com/esotericsoftware/minlog/1.3.0/minlog-1.3.0.jar:/home/zyr/.m2/repository/org/objenesis/objenesis/2.5.1/objenesis-2.5.1.jar:/home/zyr/.m2/repository/com/twitter/chill-java/0.9.3/chill-java-0.9.3.jar:/home/zyr/.m2/repository/org/apache/xbean/xbean-asm6-shaded/4.8/xbean-asm6-shaded-4.8.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-client/2.6.5/hadoop-client-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-common/2.6.5/hadoop-common-2.6.5.jar:/home/zyr/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/zyr/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/zyr/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/zyr/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/home/zyr/.m2/repository/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar:/home/zyr/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/zyr/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/zyr/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/zyr/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-auth/2.6.5/hadoop-auth-2.6.5.jar:/home/zyr/.m2/repositor
y/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar:/home/zyr/.m2/repository/org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar:/home/zyr/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/home/zyr/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/home/zyr/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/home/zyr/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/home/zyr/.m2/repository/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar:/home/zyr/.m2/repository/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.6.5/hadoop-hdfs-2.6.5.jar:/home/zyr/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/zyr/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/home/zyr/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.6.5/hadoop-mapreduce-client-app-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.6.5/hadoop-mapreduce-client-common-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.6.5/hadoop-yarn-client-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.6.5/hadoop-yarn-server-common-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.6.5/hadoop-mapreduce-client-shuffle-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.6.5/hadoop-yarn-api-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.6.5/hadoop-mapreduce-client-core-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.6.5/hadoop-yarn-common-2.6.5.jar:/home/zyr/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/zyr
/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.6.5/hadoop-mapreduce-client-jobclient-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-annotations/2.6.5/hadoop-annotations-2.6.5.jar:/home/zyr/.m2/repository/org/apache/spark/spark-launcher_2.11/2.4.3/spark-launcher_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-kvstore_2.11/2.4.3/spark-kvstore_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.6.7/jackson-core-2.6.7.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.6.7/jackson-annotations-2.6.7.jar:/home/zyr/.m2/repository/org/apache/spark/spark-network-common_2.11/2.4.3/spark-network-common_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-network-shuffle_2.11/2.4.3/spark-network-shuffle_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-unsafe_2.11/2.4.3/spark-unsafe_2.11-2.4.3.jar:/home/zyr/.m2/repository/javax/activation/activation/1.1.1/activation-1.1.1.jar:/home/zyr/.m2/repository/org/apache/curator/curator-recipes/2.6.0/curator-recipes-2.6.0.jar:/home/zyr/.m2/repository/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.jar:/home/zyr/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/zyr/.m2/repository/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6.jar:/home/zyr/.m2/repository/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar:/home/zyr/.m2/repository/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar:/home/zyr/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/zyr/.m2/repository/org/slf4j/slf4j-api/1.7.
16/slf4j-api-1.7.16.jar:/home/zyr/.m2/repository/org/slf4j/jul-to-slf4j/1.7.16/jul-to-slf4j-1.7.16.jar:/home/zyr/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.16/jcl-over-slf4j-1.7.16.jar:/home/zyr/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/zyr/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar:/home/zyr/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/zyr/.m2/repository/org/xerial/snappy/snappy-java/1.1.7.3/snappy-java-1.1.7.3.jar:/home/zyr/.m2/repository/org/lz4/lz4-java/1.4.0/lz4-java-1.4.0.jar:/home/zyr/.m2/repository/com/github/luben/zstd-jni/1.3.2-2/zstd-jni-1.3.2-2.jar:/home/zyr/.m2/repository/org/roaringbitmap/RoaringBitmap/0.7.45/RoaringBitmap-0.7.45.jar:/home/zyr/.m2/repository/org/roaringbitmap/shims/0.7.45/shims-0.7.45.jar:/home/zyr/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/zyr/.m2/repository/org/json4s/json4s-jackson_2.11/3.5.3/json4s-jackson_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-core_2.11/3.5.3/json4s-core_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-ast_2.11/3.5.3/json4s-ast_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-scalap_2.11/3.5.3/json4s-scalap_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-xml_2.11/1.0.6/scala-xml_2.11-1.0.6.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-client/2.22.2/jersey-client-2.22.2.jar:/home/zyr/.m2/repository/javax/ws/rs/javax.ws.rs-api/2.0.1/javax.ws.rs-api-2.0.1.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-api/2.4.0-b34/hk2-api-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-utils/2.4.0-b34/hk2-utils-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/external/aopalliance-repackaged/2.4.0-b34/aopalliance-repackaged-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/external/javax.inject/2.4.0-b34/javax.inject-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-locator/2.4.0-b34/hk2-locator-2.4.0-b34.jar:/home/zyr/.m2/
repository/org/javassist/javassist/3.18.1-GA/javassist-3.18.1-GA.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-common/2.22.2/jersey-common-2.22.2.jar:/home/zyr/.m2/repository/javax/annotation/javax.annotation-api/1.2/javax.annotation-api-1.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/bundles/repackaged/jersey-guava/2.22.2/jersey-guava-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/hk2/osgi-resource-locator/1.0.1/osgi-resource-locator-1.0.1.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-server/2.22.2/jersey-server-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/media/jersey-media-jaxb/2.22.2/jersey-media-jaxb-2.22.2.jar:/home/zyr/.m2/repository/javax/validation/validation-api/1.1.0.Final/validation-api-1.1.0.Final.jar:/home/zyr/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet/2.22.2/jersey-container-servlet-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet-core/2.22.2/jersey-container-servlet-core-2.22.2.jar:/home/zyr/.m2/repository/io/netty/netty-all/4.1.17.Final/netty-all-4.1.17.Final.jar:/home/zyr/.m2/repository/io/netty/netty/3.9.9.Final/netty-3.9.9.Final.jar:/home/zyr/.m2/repository/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-core/3.1.5/metrics-core-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-jvm/3.1.5/metrics-jvm-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-json/3.1.5/metrics-json-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-graphite/3.1.5/metrics-graphite-3.1.5.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.6.7.1/jackson-databind-2.6.7.1.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/module/jackson-module-scala_2.11/2.6.7.1/jackson-module-scala_2.11-2.6.7.1.jar:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.8/scala-reflect-2.11.8.jar:/home/zyr/.m2/repository/com/fasterxml/jacks
on/module/jackson-module-paranamer/2.7.9/jackson-module-paranamer-2.7.9.jar:/home/zyr/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar:/home/zyr/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/home/zyr/.m2/repository/net/razorvine/pyrolite/4.13/pyrolite-4.13.jar:/home/zyr/.m2/repository/net/sf/py4j/py4j/0.10.7/py4j-0.10.7.jar:/home/zyr/.m2/repository/org/apache/commons/commons-crypto/1.0.0/commons-crypto-1.0.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-streaming_2.11/2.4.3/spark-streaming_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-sql_2.11/2.4.3/spark-sql_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/univocity/univocity-parsers/2.7.3/univocity-parsers-2.7.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-sketch_2.11/2.4.3/spark-sketch_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-catalyst_2.11/2.4.3/spark-catalyst_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/codehaus/janino/janino/3.0.9/janino-3.0.9.jar:/home/zyr/.m2/repository/org/codehaus/janino/commons-compiler/3.0.9/commons-compiler-3.0.9.jar:/home/zyr/.m2/repository/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar:/home/zyr/.m2/repository/org/apache/orc/orc-core/1.5.5/orc-core-1.5.5-nohive.jar:/home/zyr/.m2/repository/org/apache/orc/orc-shims/1.5.5/orc-shims-1.5.5.jar:/home/zyr/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/home/zyr/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/zyr/.m2/repository/io/airlift/aircompressor/0.10/aircompressor-0.10.jar:/home/zyr/.m2/repository/org/apache/orc/orc-mapreduce/1.5.5/orc-mapreduce-1.5.5-nohive.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-column/1.10.1/parquet-column-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-common/1.10.1/parquet-common-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-encoding/1.10.1/parquet-encoding-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-hadoop/1.10.1/parquet-hadoop-1.10.1.jar:
/home/zyr/.m2/repository/org/apache/parquet/parquet-format/2.4.0/parquet-format-2.4.0.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-jackson/1.10.1/parquet-jackson-1.10.1.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-vector/0.10.0/arrow-vector-0.10.0.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-format/0.10.0/arrow-format-0.10.0.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-memory/0.10.0/arrow-memory-0.10.0.jar:/home/zyr/.m2/repository/joda-time/joda-time/2.9.9/joda-time-2.9.9.jar:/home/zyr/.m2/repository/com/carrotsearch/hppc/0.7.2/hppc-0.7.2.jar:/home/zyr/.m2/repository/com/vlkan/flatbuffers/1.2.0-3f79e055/flatbuffers-1.2.0-3f79e055.jar:/home/zyr/.m2/repository/org/apache/spark/spark-graphx_2.11/2.4.3/spark-graphx_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar:/home/zyr/.m2/repository/net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib-local_2.11/2.4.3/spark-mllib-local_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scalanlp/breeze_2.11/0.13.2/breeze_2.11-0.13.2.jar:/home/zyr/.m2/repository/org/scalanlp/breeze-macros_2.11/0.13.2/breeze-macros_2.11-0.13.2.jar:/home/zyr/.m2/repository/net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar:/home/zyr/.m2/repository/com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar:/home/zyr/.m2/repository/org/spire-math/spire_2.11/0.13.0/spire_2.11-0.13.0.jar:/home/zyr/.m2/repository/org/spire-math/spire-macros_2.11/0.13.0/spire-macros_2.11-0.13.0.jar:/home/zyr/.m2/repository/org/typelevel/machinist_2.11/0.6.1/machinist_2.11-0.6.1.jar:/home/zyr/.m2/repository/com/chuusai/shapeless_2.11/2.3.2/shapeless_2.11-2.3.2.jar:/home/zyr/.m2/repository/org/typelevel/macro-compat_2.11/1.1.1/macro-compat_2.11-1.1.1.jar:/home/zyr/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/zyr/.m2/repository/org/apache/spark/spark-tags_2.11/2.4.3/spark-tags_2.11-2.4.3.jar:/home/zyr/.m2/reposito
ry/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar com.kmeans.Kmeans
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/07/10 11:40:11 WARN Utils: Your hostname, zyrpc resolves to a loopback address: 127.0.1.1; using 192.168.31.160 instead (on interface ens33)
19/07/10 11:40:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/07/10 11:40:11 INFO SparkContext: Running Spark version 2.4.3
19/07/10 11:40:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/10 11:40:14 INFO SparkContext: Submitted application: Simple Application
19/07/10 11:40:15 INFO SecurityManager: Changing view acls to: zyr
19/07/10 11:40:15 INFO SecurityManager: Changing modify acls to: zyr
19/07/10 11:40:15 INFO SecurityManager: Changing view acls groups to:
19/07/10 11:40:15 INFO SecurityManager: Changing modify acls groups to:
19/07/10 11:40:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zyr); groups with view permissions: Set(); users with modify permissions: Set(zyr); groups with modify permissions: Set()
19/07/10 11:40:17 INFO Utils: Successfully started service 'sparkDriver' on port 45437.
19/07/10 11:40:17 INFO SparkEnv: Registering MapOutputTracker
19/07/10 11:40:18 INFO SparkEnv: Registering BlockManagerMaster
19/07/10 11:40:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/07/10 11:40:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/07/10 11:40:18 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2d502b3d-b275-49f2-9660-e1310680f61d
19/07/10 11:40:18 INFO MemoryStore: MemoryStore started with capacity 345.0 MB
19/07/10 11:40:18 INFO SparkEnv: Registering OutputCommitCoordinator
19/07/10 11:40:20 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/07/10 11:40:20 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.31.160:4040
19/07/10 11:40:21 INFO Executor: Starting executor ID driver on host localhost
19/07/10 11:40:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44595.
19/07/10 11:40:22 INFO NettyBlockTransferService: Server created on 192.168.31.160:44595
19/07/10 11:40:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/07/10 11:40:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.31.160:44595 with 345.0 MB RAM, BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:25 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 344.8 MB)
19/07/10 11:40:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 344.8 MB)
19/07/10 11:40:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.31.160:44595 (size: 20.4 KB, free: 345.0 MB)
19/07/10 11:40:26 INFO SparkContext: Created broadcast 0 from textFile at Kmeans.scala:25
19/07/10 11:40:26 WARN KMeans: The input data is not directly cached, which may hurt performance if its parent RDDs are also uncached.
19/07/10 11:40:26 INFO FileInputFormat: Total input paths to process : 1
19/07/10 11:40:26 INFO SparkContext: Starting job: takeSample at KMeans.scala:386
19/07/10 11:40:26 INFO DAGScheduler: Got job 0 (takeSample at KMeans.scala:386) with 2 output partitions
19/07/10 11:40:26 INFO DAGScheduler: Final stage: ResultStage 0 (takeSample at KMeans.scala:386)
19/07/10 11:40:26 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:26 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:26 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at map at KMeans.scala:248), which has no missing parents
19/07/10 11:40:27 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.3 KB, free 344.8 MB)
19/07/10 11:40:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 344.8 MB)
19/07/10 11:40:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.31.160:44595 (size: 2.5 KB, free: 345.0 MB)
19/07/10 11:40:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:27 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at map at KMeans.scala:248) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/07/10 11:40:27 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8200 bytes)
19/07/10 11:40:27 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 8200 bytes)
19/07/10 11:40:27 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
19/07/10 11:40:27 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/07/10 11:40:28 INFO HadoopRDD: Input split: file:/home/zyr/kmeanstest.txt:36+36
19/07/10 11:40:28 INFO HadoopRDD: Input split: file:/home/zyr/kmeanstest.txt:0+36
19/07/10 11:40:28 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 288.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 152.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_1_0 in memory on 192.168.31.160:44595 (size: 288.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_1_1 in memory on 192.168.31.160:44595 (size: 152.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:28 INFO MemoryStore: Block rdd_3_0 stored as values in memory (estimated size 48.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_3_0 in memory on 192.168.31.160:44595 (size: 48.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO MemoryStore: Block rdd_3_1 stored as values in memory (estimated size 32.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_3_1 in memory on 192.168.31.160:44595 (size: 32.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 875 bytes result sent to driver
19/07/10 11:40:28 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 875 bytes result sent to driver
19/07/10 11:40:28 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 837 ms on localhost (executor driver) (1/2)
19/07/10 11:40:28 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 942 ms on localhost (executor driver) (2/2)
19/07/10 11:40:28 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/07/10 11:40:28 INFO DAGScheduler: ResultStage 0 (takeSample at KMeans.scala:386) finished in 1.635 s
19/07/10 11:40:28 INFO DAGScheduler: Job 0 finished: takeSample at KMeans.scala:386, took 1.961204 s
19/07/10 11:40:28 INFO SparkContext: Starting job: takeSample at KMeans.scala:386
19/07/10 11:40:28 INFO DAGScheduler: Got job 1 (takeSample at KMeans.scala:386) with 2 output partitions
19/07/10 11:40:28 INFO DAGScheduler: Final stage: ResultStage 1 (takeSample at KMeans.scala:386)
19/07/10 11:40:28 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:28 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:28 INFO DAGScheduler: Submitting ResultStage 1 (PartitionwiseSampledRDD[7] at takeSample at KMeans.scala:386), which has no missing parents
19/07/10 11:40:28 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.1 KB, free 344.8 MB)
19/07/10 11:40:28 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.9 KB, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.31.160:44595 (size: 2.9 KB, free: 345.0 MB)
19/07/10 11:40:28 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:28 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (PartitionwiseSampledRDD[7] at takeSample at KMeans.scala:386) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:28 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
19/07/10 11:40:28 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 8309 bytes)
19/07/10 11:40:28 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, PROCESS_LOCAL, 8309 bytes)
19/07/10 11:40:28 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
19/07/10 11:40:28 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:28 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1283 bytes result sent to driver
19/07/10 11:40:28 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1175 bytes result sent to driver
19/07/10 11:40:28 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 92 ms on localhost (executor driver) (1/2)
19/07/10 11:40:28 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 96 ms on localhost (executor driver) (2/2)
19/07/10 11:40:28 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
19/07/10 11:40:28 INFO DAGScheduler: ResultStage 1 (takeSample at KMeans.scala:386) finished in 0.132 s
19/07/10 11:40:28 INFO DAGScheduler: Job 1 finished: takeSample at KMeans.scala:386, took 0.153980 s
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 144.0 B, free 344.8 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 344.0 B, free 344.8 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.31.160:44595 (size: 344.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 3 from broadcast at KMeans.scala:400
19/07/10 11:40:29 INFO SparkContext: Starting job: sum at KMeans.scala:406
19/07/10 11:40:29 INFO DAGScheduler: Got job 2 (sum at KMeans.scala:406) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 2 (sum at KMeans.scala:406)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at map at KMeans.scala:403), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 5.4 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.0 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.31.160:44595 (size: 3.0 KB, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:29 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at map at KMeans.scala:403) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:29 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
19/07/10 11:40:29 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, executor driver, partition 0, PROCESS_LOCAL, 8232 bytes)
19/07/10 11:40:29 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 8232 bytes)
19/07/10 11:40:29 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
19/07/10 11:40:29 INFO Executor: Running task 1.0 in stage 2.0 (TID 5)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
19/07/10 11:40:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
19/07/10 11:40:29 INFO MemoryStore: Block rdd_9_1 stored as values in memory (estimated size 32.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_9_1 in memory on 192.168.31.160:44595 (size: 32.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO Executor: Finished task 1.0 in stage 2.0 (TID 5). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 149 ms on localhost (executor driver) (1/2)
19/07/10 11:40:29 INFO MemoryStore: Block rdd_9_0 stored as values in memory (estimated size 48.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_9_0 in memory on 192.168.31.160:44595 (size: 48.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 168 ms on localhost (executor driver) (2/2)
19/07/10 11:40:29 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
19/07/10 11:40:29 INFO DAGScheduler: ResultStage 2 (sum at KMeans.scala:406) finished in 0.199 s
19/07/10 11:40:29 INFO DAGScheduler: Job 2 finished: sum at KMeans.scala:406, took 0.221468 s
19/07/10 11:40:29 INFO MapPartitionsRDD: Removing RDD 6 from persistence list
19/07/10 11:40:29 INFO BlockManager: Removing RDD 6
19/07/10 11:40:29 INFO SparkContext: Starting job: collect at KMeans.scala:414
19/07/10 11:40:29 INFO DAGScheduler: Got job 3 (collect at KMeans.scala:414) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 3 (collect at KMeans.scala:414)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[11] at mapPartitionsWithIndex at KMeans.scala:411), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 6.1 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.3 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.31.160:44595 (size: 3.3 KB, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:29 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (MapPartitionsRDD[11] at mapPartitionsWithIndex at KMeans.scala:411) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:29 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
19/07/10 11:40:29 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, localhost, executor driver, partition 0, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, localhost, executor driver, partition 1, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO Executor: Running task 0.0 in stage 3.0 (TID 6)
19/07/10 11:40:29 INFO Executor: Running task 1.0 in stage 3.0 (TID 7)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_0 locally
19/07/10 11:40:29 INFO Executor: Finished task 0.0 in stage 3.0 (TID 6). 1078 bytes result sent to driver
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 21 ms on localhost (executor driver) (1/2)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_1 locally
19/07/10 11:40:29 INFO Executor: Finished task 1.0 in stage 3.0 (TID 7). 1132 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 31 ms on localhost (executor driver) (2/2)
19/07/10 11:40:29 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
19/07/10 11:40:29 INFO DAGScheduler: ResultStage 3 (collect at KMeans.scala:414) finished in 0.061 s
19/07/10 11:40:29 INFO DAGScheduler: Job 3 finished: collect at KMeans.scala:414, took 0.084564 s
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 320.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 426.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.31.160:44595 (size: 426.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 6 from broadcast at KMeans.scala:400
19/07/10 11:40:29 INFO SparkContext: Starting job: sum at KMeans.scala:406
19/07/10 11:40:29 INFO DAGScheduler: Got job 4 (sum at KMeans.scala:406) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 4 (sum at KMeans.scala:406)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[13] at map at KMeans.scala:403), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 5.7 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.1 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.31.160:44595 (size: 3.1 KB, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:29 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (MapPartitionsRDD[13] at map at KMeans.scala:403) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:29 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
19/07/10 11:40:29 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 9, localhost, executor driver, partition 1, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO Executor: Running task 0.0 in stage 4.0 (TID 8)
19/07/10 11:40:29 INFO Executor: Running task 1.0 in stage 4.0 (TID 9)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_0 locally
19/07/10 11:40:29 INFO MemoryStore: Block rdd_13_1 stored as values in memory (estimated size 32.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_13_1 in memory on 192.168.31.160:44595 (size: 32.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO MemoryStore: Block rdd_13_0 stored as values in memory (estimated size 48.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO Executor: Finished task 1.0 in stage 4.0 (TID 9). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID 9) in 64 ms on localhost (executor driver) (1/2)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_13_0 in memory on 192.168.31.160:44595 (size: 48.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO Executor: Finished task 0.0 in stage 4.0 (TID 8). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 8) in 75 ms on localhost (executor driver) (2/2)
19/07/10 11:40:29 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
19/07/10 11:40:29 INFO DAGScheduler: ResultStage 4 (sum at KMeans.scala:406) finished in 0.108 s
19/07/10 11:40:29 INFO DAGScheduler: Job 4 finished: sum at KMeans.scala:406, took 0.129273 s
19/07/10 11:40:29 INFO MapPartitionsRDD: Removing RDD 9 from persistence list
19/07/10 11:40:29 INFO BlockManager: Removing RDD 9
19/07/10 11:40:29 INFO SparkContext: Starting job: collect at KMeans.scala:414
19/07/10 11:40:29 INFO DAGScheduler: Got job 5 (collect at KMeans.scala:414) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 5 (collect at KMeans.scala:414)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[15] at mapPartitionsWithIndex at KMeans.scala:411), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 6.4 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 3.4 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.31.160:44595 (size: 3.4 KB, free: 345.0 MB)
19/07/10 11:40:30 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:30 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[15] at mapPartitionsWithIndex at KMeans.scala:411) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:30 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
19/07/10 11:40:30 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 10, localhost, executor driver, partition 0, PROCESS_LOCAL, 8296 bytes)
19/07/10 11:40:30 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 11, localhost, executor driver, partition 1, PROCESS_LOCAL, 8296 bytes)
19/07/10 11:40:30 INFO Executor: Running task 0.0 in stage 5.0 (TID 10)
19/07/10 11:40:30 INFO Executor: Running task 1.0 in stage 5.0 (TID 11)
19/07/10 11:40:30 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:30 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:30 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:30 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:30 INFO BlockManager: Found block rdd_13_0 locally
19/07/10 11:40:30 INFO Executor: Finished task 0.0 in stage 5.0 (TID 10). 1132 bytes result sent to driver
19/07/10 11:40:30 INFO BlockManager: Found block rdd_13_1 locally
19/07/10 11:40:30 INFO Executor: Finished task 1.0 in stage 5.0 (TID 11). 
826 bytes result sent to driver 234 19/07/10 11:40:30 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 10) in 56 ms on localhost (executor driver) (1/2) 235 19/07/10 11:40:30 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 11) in 58 ms on localhost (executor driver) (2/2) 236 19/07/10 11:40:30 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 237 19/07/10 11:40:30 INFO DAGScheduler: ResultStage 5 (collect at KMeans.scala:414) finished in 0.147 s 238 19/07/10 11:40:30 INFO DAGScheduler: Job 5 finished: collect at KMeans.scala:414, took 0.178237 s 239 19/07/10 11:40:30 INFO MapPartitionsRDD: Removing RDD 13 from persistence list 240 19/07/10 11:40:30 INFO BlockManager: Removing RDD 13 241 19/07/10 11:40:30 INFO TorrentBroadcast: Destroying Broadcast(3) (from destroy at KMeans.scala:421) 242 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 27 243 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 42 244 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 51 245 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 124 246 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 32 247 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 72 248 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 47 249 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 84 250 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 36 251 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 73 252 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 46 253 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 25 254 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 130 255 19/07/10 11:40:30 INFO TorrentBroadcast: Destroying Broadcast(6) (from destroy at KMeans.scala:421) 256 19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.31.160:44595 in memory (size: 344.0 B, free: 345.0 MB) 257 19/07/10 11:40:30 INFO BlockManagerInfo: 
Removed broadcast_2_piece0 on 192.168.31.160:44595 in memory (size: 2.9 KB, free: 345.0 MB) 258 19/07/10 11:40:30 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 568.0 B, free 344.7 MB) 259 19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_6_piece0 on 192.168.31.160:44595 in memory (size: 426.0 B, free: 345.0 MB) 260 19/07/10 11:40:30 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 529.0 B, free 344.7 MB) 261 19/07/10 11:40:30 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 192.168.31.160:44595 (size: 529.0 B, free: 345.0 MB) 262 19/07/10 11:40:30 INFO SparkContext: Created broadcast 9 from broadcast at KMeans.scala:431 263 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 116 264 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 94 265 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 90 266 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 96 267 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 89 268 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 81 269 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 121 270 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 149 271 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 145 272 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 50 273 19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_7_piece0 on 192.168.31.160:44595 in memory (size: 3.1 KB, free: 345.0 MB) 274 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 135 275 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 108 276 19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_8_piece0 on 192.168.31.160:44595 in memory (size: 3.4 KB, free: 345.0 MB) 277 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 34 278 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 106 279 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 70 280 
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 44 281 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 57 282 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 105 283 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 97 284 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 31 285 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 33 286 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 113 287 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 140 288 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 100 289 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 43 290 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 68 291 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 133 292 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 138 293 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 39 294 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 129 295 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 49 296 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 37 297 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 85 298 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 132 299 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 53 300 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 82 301 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 69 302 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 104 303 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 52 304 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 98 305 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 141 306 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 76 307 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 77 308 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 102 309 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 134 310 
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 79 311 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 61 312 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 59 313 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 118 314 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 74 315 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 54 316 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 86 317 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 136 318 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 110 319 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 45 320 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 30 321 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 60 322 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 64 323 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 137 324 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 95 325 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 87 326 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 38 327 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 29 328 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 56 329 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 125 330 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 131 331 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 41 332 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 55 333 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 128 334 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 127 335 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 88 336 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 91 337 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 123 338 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 103 339 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 115 340 19/07/10 
11:40:30 INFO ContextCleaner: Cleaned accumulator 139 341 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 142 342 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 147 343 19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_5_piece0 on 192.168.31.160:44595 in memory (size: 3.3 KB, free: 345.0 MB) 344 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 26 345 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 48 346 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 78 347 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 93 348 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 40 349 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 35 350 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 63 351 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 75 352 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 112 353 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 146 354 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 58 355 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 62 356 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 83 357 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 101 358 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 148 359 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 144 360 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 111 361 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 117 362 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 71 363 19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_4_piece0 on 192.168.31.160:44595 in memory (size: 3.0 KB, free: 345.0 MB) 364 19/07/10 11:40:30 INFO SparkContext: Starting job: countByValue at KMeans.scala:434 365 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 28 366 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 122 367 19/07/10 11:40:30 INFO ContextCleaner: Cleaned 
accumulator 143 368 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 65 369 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 126 370 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 92 371 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 114 372 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 120 373 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 80 374 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 107 375 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 109 376 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 99 377 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 66 378 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 67 379 19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 119 380 19/07/10 11:40:31 INFO DAGScheduler: Registering RDD 18 (countByValue at KMeans.scala:434) 381 19/07/10 11:40:31 INFO DAGScheduler: Got job 6 (countByValue at KMeans.scala:434) with 2 output partitions 382 19/07/10 11:40:31 INFO DAGScheduler: Final stage: ResultStage 7 (countByValue at KMeans.scala:434) 383 19/07/10 11:40:31 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6) 384 19/07/10 11:40:31 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 6) 385 19/07/10 11:40:31 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[18] at countByValue at KMeans.scala:434), which has no missing parents 386 19/07/10 11:40:31 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 6.7 KB, free 344.8 MB) 387 19/07/10 11:40:31 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 3.7 KB, free 344.8 MB) 388 19/07/10 11:40:31 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on 192.168.31.160:44595 (size: 3.7 KB, free: 345.0 MB) 389 19/07/10 11:40:31 INFO SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1161 390 19/07/10 11:40:31 INFO DAGScheduler: 
Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[18] at countByValue at KMeans.scala:434) (first 15 tasks are for partitions Vector(0, 1)) 391 19/07/10 11:40:31 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks 392 19/07/10 11:40:31 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 12, localhost, executor driver, partition 0, PROCESS_LOCAL, 8189 bytes) 393 19/07/10 11:40:31 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 13, localhost, executor driver, partition 1, PROCESS_LOCAL, 8189 bytes) 394 19/07/10 11:40:31 INFO Executor: Running task 0.0 in stage 6.0 (TID 12) 395 19/07/10 11:40:31 INFO Executor: Running task 1.0 in stage 6.0 (TID 13) 396 19/07/10 11:40:31 INFO BlockManager: Found block rdd_1_0 locally 397 19/07/10 11:40:31 INFO BlockManager: Found block rdd_3_0 locally 398 19/07/10 11:40:31 INFO BlockManager: Found block rdd_1_1 locally 399 19/07/10 11:40:31 INFO BlockManager: Found block rdd_3_1 locally 400 19/07/10 11:40:31 INFO Executor: Finished task 1.0 in stage 6.0 (TID 13). 1156 bytes result sent to driver 401 19/07/10 11:40:31 INFO Executor: Finished task 0.0 in stage 6.0 (TID 12). 
1113 bytes result sent to driver 402 19/07/10 11:40:31 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 13) in 294 ms on localhost (executor driver) (1/2) 403 19/07/10 11:40:31 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 12) in 300 ms on localhost (executor driver) (2/2) 404 19/07/10 11:40:31 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 405 19/07/10 11:40:31 INFO DAGScheduler: ShuffleMapStage 6 (countByValue at KMeans.scala:434) finished in 0.367 s 406 19/07/10 11:40:31 INFO DAGScheduler: looking for newly runnable stages 407 19/07/10 11:40:31 INFO DAGScheduler: running: Set() 408 19/07/10 11:40:31 INFO DAGScheduler: waiting: Set(ResultStage 7) 409 19/07/10 11:40:31 INFO DAGScheduler: failed: Set() 410 19/07/10 11:40:32 INFO DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[19] at countByValue at KMeans.scala:434), which has no missing parents 411 19/07/10 11:40:32 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 3.3 KB, free 344.7 MB) 412 19/07/10 11:40:32 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 2.0 KB, free 344.7 MB) 413 19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.31.160:44595 (size: 2.0 KB, free: 345.0 MB) 414 19/07/10 11:40:32 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1161 415 19/07/10 11:40:32 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[19] at countByValue at KMeans.scala:434) (first 15 tasks are for partitions Vector(0, 1)) 416 19/07/10 11:40:32 INFO TaskSchedulerImpl: Adding task set 7.0 with 2 tasks 417 19/07/10 11:40:32 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 14, localhost, executor driver, partition 0, ANY, 7662 bytes) 418 19/07/10 11:40:32 INFO TaskSetManager: Starting task 1.0 in stage 7.0 (TID 15, localhost, executor driver, partition 1, ANY, 7662 bytes) 419 19/07/10 11:40:32 INFO 
Executor: Running task 0.0 in stage 7.0 (TID 14) 420 19/07/10 11:40:32 INFO Executor: Running task 1.0 in stage 7.0 (TID 15) 421 19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks 422 19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 19 ms 423 19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks 424 19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 15 ms 425 19/07/10 11:40:32 INFO Executor: Finished task 1.0 in stage 7.0 (TID 15). 1415 bytes result sent to driver 426 19/07/10 11:40:32 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 15) in 290 ms on localhost (executor driver) (1/2) 427 19/07/10 11:40:32 INFO Executor: Finished task 0.0 in stage 7.0 (TID 14). 1372 bytes result sent to driver 428 19/07/10 11:40:32 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 14) in 308 ms on localhost (executor driver) (2/2) 429 19/07/10 11:40:32 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 430 19/07/10 11:40:32 INFO DAGScheduler: ResultStage 7 (countByValue at KMeans.scala:434) finished in 0.347 s 431 19/07/10 11:40:32 INFO DAGScheduler: Job 6 finished: countByValue at KMeans.scala:434, took 1.822957 s 432 19/07/10 11:40:32 INFO TorrentBroadcast: Destroying Broadcast(9) (from destroy at KMeans.scala:436) 433 19/07/10 11:40:32 INFO BlockManagerInfo: Removed broadcast_9_piece0 on 192.168.31.160:44595 in memory (size: 529.0 B, free: 345.0 MB) 434 19/07/10 11:40:32 INFO LocalKMeans: Local KMeans++ converged in 2 iterations. 435 19/07/10 11:40:32 INFO KMeans: Initialization with k-means|| took 5.977 seconds. 
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 296.0 B, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 334.0 B, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on 192.168.31.160:44595 (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 12 from broadcast at KMeans.scala:299
19/07/10 11:40:32 INFO SparkContext: Starting job: collectAsMap at KMeans.scala:320
19/07/10 11:40:32 INFO DAGScheduler: Registering RDD 20 (mapPartitions at KMeans.scala:302)
19/07/10 11:40:32 INFO DAGScheduler: Got job 7 (collectAsMap at KMeans.scala:320) with 2 output partitions
19/07/10 11:40:32 INFO DAGScheduler: Final stage: ResultStage 9 (collectAsMap at KMeans.scala:320)
19/07/10 11:40:32 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 8)
19/07/10 11:40:32 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 8)
19/07/10 11:40:32 INFO DAGScheduler: Submitting ShuffleMapStage 8 (MapPartitionsRDD[20] at mapPartitions at KMeans.scala:302), which has no missing parents
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 6.3 KB, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 3.5 KB, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on 192.168.31.160:44595 (size: 3.5 KB, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:32 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 8 (MapPartitionsRDD[20] at mapPartitions at KMeans.scala:302) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:32 INFO TaskSchedulerImpl: Adding task set 8.0 with 2 tasks
19/07/10 11:40:32 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 16, localhost, executor driver, partition 0, PROCESS_LOCAL, 8189 bytes)
19/07/10 11:40:32 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 17, localhost, executor driver, partition 1, PROCESS_LOCAL, 8189 bytes)
19/07/10 11:40:32 INFO Executor: Running task 0.0 in stage 8.0 (TID 16)
19/07/10 11:40:32 INFO Executor: Running task 1.0 in stage 8.0 (TID 17)
19/07/10 11:40:32 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:32 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:32 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:32 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:32 INFO Executor: Finished task 1.0 in stage 8.0 (TID 17). 1226 bytes result sent to driver
19/07/10 11:40:32 INFO Executor: Finished task 0.0 in stage 8.0 (TID 16). 1226 bytes result sent to driver
19/07/10 11:40:32 INFO TaskSetManager: Finished task 1.0 in stage 8.0 (TID 17) in 73 ms on localhost (executor driver) (1/2)
19/07/10 11:40:32 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 16) in 82 ms on localhost (executor driver) (2/2)
19/07/10 11:40:32 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
19/07/10 11:40:32 INFO DAGScheduler: ShuffleMapStage 8 (mapPartitions at KMeans.scala:302) finished in 0.135 s
19/07/10 11:40:32 INFO DAGScheduler: looking for newly runnable stages
19/07/10 11:40:32 INFO DAGScheduler: running: Set()
19/07/10 11:40:32 INFO DAGScheduler: waiting: Set(ResultStage 9)
19/07/10 11:40:32 INFO DAGScheduler: failed: Set()
19/07/10 11:40:32 INFO DAGScheduler: Submitting ResultStage 9 (ShuffledRDD[21] at reduceByKey at KMeans.scala:317), which has no missing parents
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 2.8 KB, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 1742.0 B, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 192.168.31.160:44595 (size: 1742.0 B, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:32 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 9 (ShuffledRDD[21] at reduceByKey at KMeans.scala:317) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:32 INFO TaskSchedulerImpl: Adding task set 9.0 with 2 tasks
19/07/10 11:40:32 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 18, localhost, executor driver, partition 0, ANY, 7662 bytes)
19/07/10 11:40:32 INFO TaskSetManager: Starting task 1.0 in stage 9.0 (TID 19, localhost, executor driver, partition 1, ANY, 7662 bytes)
19/07/10 11:40:32 INFO Executor: Running task 0.0 in stage 9.0 (TID 18)
19/07/10 11:40:32 INFO Executor: Running task 1.0 in stage 9.0 (TID 19)
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
19/07/10 11:40:32 INFO Executor: Finished task 0.0 in stage 9.0 (TID 18). 1531 bytes result sent to driver
19/07/10 11:40:32 INFO Executor: Finished task 1.0 in stage 9.0 (TID 19). 1498 bytes result sent to driver
19/07/10 11:40:32 INFO TaskSetManager: Finished task 0.0 in stage 9.0 (TID 18) in 73 ms on localhost (executor driver) (1/2)
19/07/10 11:40:33 INFO TaskSetManager: Finished task 1.0 in stage 9.0 (TID 19) in 79 ms on localhost (executor driver) (2/2)
19/07/10 11:40:33 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
19/07/10 11:40:33 INFO DAGScheduler: ResultStage 9 (collectAsMap at KMeans.scala:320) finished in 0.182 s
19/07/10 11:40:33 INFO DAGScheduler: Job 7 finished: collectAsMap at KMeans.scala:320, took 0.369581 s
19/07/10 11:40:33 INFO TorrentBroadcast: Destroying Broadcast(12) (from destroy at KMeans.scala:330)
19/07/10 11:40:33 INFO KMeans: Iterations took 0.564 seconds.
19/07/10 11:40:33 INFO KMeans: KMeans converged in 1 iterations.
19/07/10 11:40:33 INFO KMeans: The cost is 0.07500000000004324.
19/07/10 11:40:33 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 192.168.31.160:44595 in memory (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:33 INFO MapPartitionsRDD: Removing RDD 3 from persistence list
19/07/10 11:40:33 INFO BlockManager: Removing RDD 3
19/07/10 11:40:33 WARN KMeans: The input data was not directly cached, which may hurt performance if its parent RDDs are also uncached.
[0.1,0.1,0.1]
[9.05,9.05,9.05]
[9.2,9.2,9.2]
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 296.0 B, free 344.7 MB)
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 334.0 B, free 344.7 MB)
19/07/10 11:40:33 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 192.168.31.160:44595 (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Created broadcast 15 from broadcast at KMeansModel.scala:102
19/07/10 11:40:33 INFO SparkContext: Starting job: sum at KMeansModel.scala:105
19/07/10 11:40:33 INFO DAGScheduler: Got job 8 (sum at KMeansModel.scala:105) with 2 output partitions
19/07/10 11:40:33 INFO DAGScheduler: Final stage: ResultStage 10 (sum at KMeansModel.scala:105)
19/07/10 11:40:33 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:33 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:33 INFO DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[22] at map at KMeansModel.scala:103), which has no missing parents
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 5.3 KB, free 344.7 MB)
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 3.0 KB, free 344.7 MB)
19/07/10 11:40:33 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on 192.168.31.160:44595 (size: 3.0 KB, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:33 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 10 (MapPartitionsRDD[22] at map at KMeansModel.scala:103) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:33 INFO TaskSchedulerImpl: Adding task set 10.0 with 2 tasks
19/07/10 11:40:33 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 20, localhost, executor driver, partition 0, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO TaskSetManager: Starting task 1.0 in stage 10.0 (TID 21, localhost, executor driver, partition 1, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO Executor: Running task 0.0 in stage 10.0 (TID 20)
19/07/10 11:40:33 INFO Executor: Running task 1.0 in stage 10.0 (TID 21)
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:33 INFO Executor: Finished task 0.0 in stage 10.0 (TID 20). 834 bytes result sent to driver
19/07/10 11:40:33 INFO Executor: Finished task 1.0 in stage 10.0 (TID 21). 834 bytes result sent to driver
19/07/10 11:40:33 INFO TaskSetManager: Finished task 0.0 in stage 10.0 (TID 20) in 30 ms on localhost (executor driver) (1/2)
19/07/10 11:40:33 INFO TaskSetManager: Finished task 1.0 in stage 10.0 (TID 21) in 31 ms on localhost (executor driver) (2/2)
19/07/10 11:40:33 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
19/07/10 11:40:33 INFO DAGScheduler: ResultStage 10 (sum at KMeansModel.scala:105) finished in 0.066 s
19/07/10 11:40:33 INFO DAGScheduler: Job 8 finished: sum at KMeansModel.scala:105, took 0.074275 s
19/07/10 11:40:33 INFO TorrentBroadcast: Destroying Broadcast(15) (from destroy at KMeansModel.scala:106)
误差为:0.07500000000004324
19/07/10 11:40:33 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 192.168.31.160:44595 in memory (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Starting job: foreach at Kmeans.scala:74
19/07/10 11:40:33 INFO DAGScheduler: Got job 9 (foreach at Kmeans.scala:74) with 2 output partitions
19/07/10 11:40:33 INFO DAGScheduler: Final stage: ResultStage 11 (foreach at Kmeans.scala:74)
19/07/10 11:40:33 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:33 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:33 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[23] at map at Kmeans.scala:68), which has no missing parents
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 4.6 KB, free 344.7 MB)
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 2.7 KB, free 344.7 MB)
19/07/10 11:40:33 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.31.160:44595 (size: 2.7 KB, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:33 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 11 (MapPartitionsRDD[23] at map at Kmeans.scala:68) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:33 INFO TaskSchedulerImpl: Adding task set 11.0 with 2 tasks
19/07/10 11:40:33 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 22, localhost, executor driver, partition 0, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO TaskSetManager: Starting task 1.0 in stage 11.0 (TID 23, localhost, executor driver, partition 1, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO Executor: Running task 0.0 in stage 11.0 (TID 22)
19/07/10 11:40:33 INFO Executor: Running task 1.0 in stage 11.0 (TID 23)
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_0 locally
[0.0,0.0,0.0]==>0
[0.1,0.1,0.1]==>0
[0.2,0.2,0.2]==>0
[9.0,9.0,9.0]==>1
19/07/10 11:40:33 INFO Executor: Finished task 0.0 in stage 11.0 (TID 22). 837 bytes result sent to driver
[9.1,9.1,9.1]==>1
[9.2,9.2,9.2]==>2
19/07/10 11:40:33 INFO Executor: Finished task 1.0 in stage 11.0 (TID 23). 794 bytes result sent to driver
19/07/10 11:40:33 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 22) in 35 ms on localhost (executor driver) (1/2)
19/07/10 11:40:33 INFO TaskSetManager: Finished task 1.0 in stage 11.0 (TID 23) in 37 ms on localhost (executor driver) (2/2)
19/07/10 11:40:33 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool
19/07/10 11:40:33 INFO DAGScheduler: ResultStage 11 (foreach at Kmeans.scala:74) finished in 0.074 s
19/07/10 11:40:33 INFO DAGScheduler: Job 9 finished: foreach at Kmeans.scala:74, took 0.090780 s
19/07/10 11:40:33 INFO SparkContext: Invoking stop() from shutdown hook
19/07/10 11:40:33 INFO SparkUI: Stopped Spark web UI at http://192.168.31.160:4040
19/07/10 11:40:33 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/07/10 11:40:34 INFO MemoryStore: MemoryStore cleared
19/07/10 11:40:34 INFO BlockManager: BlockManager stopped
19/07/10 11:40:34 INFO BlockManagerMaster: BlockManagerMaster stopped
19/07/10 11:40:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/07/10 11:40:34 INFO SparkContext: Successfully stopped SparkContext
19/07/10 11:40:34 INFO ShutdownHookManager: Shutdown hook called
19/07/10 11:40:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-21be76b6-98d0-46ec-a1d6-640cb5556eff

Process finished with exit code 0
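The numbers in this output can be checked by hand, which is a quick way to confirm that the run really succeeded. The sketch below is plain Scala with no Spark dependency. It takes the six points and cluster ids from the `==>` lines of the log, recomputes each cluster's centroid, and sums the squared distances to it. The object name `CostCheck` and its helper methods are made up for this illustration and are not part of the original program or of Spark's API.

```scala
// Hand check of the clustering cost reported in the log (plain Scala, no Spark).
// Points and cluster ids are copied from the "==>" lines of the output above;
// all names in this object are illustrative assumptions.
object CostCheck {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty group of points.
  def centroid(points: Seq[Point]): Point =
    points.transpose.map(_.sum / points.size).toVector

  // Sum of squared distances from every point to the centroid of its cluster.
  def cost(assigned: Seq[(Point, Int)]): Double =
    assigned.groupBy(_._2).values.map { group =>
      val pts = group.map(_._1)
      val c = centroid(pts)
      pts.map(p => squaredDistance(p, c)).sum
    }.sum

  val assigned: Seq[(Point, Int)] = Seq(
    Vector(0.0, 0.0, 0.0) -> 0,
    Vector(0.1, 0.1, 0.1) -> 0,
    Vector(0.2, 0.2, 0.2) -> 0,
    Vector(9.0, 9.0, 9.0) -> 1,
    Vector(9.1, 9.1, 9.1) -> 1,
    Vector(9.2, 9.2, 9.2) -> 2
  )

  def main(args: Array[String]): Unit =
    println(s"cost = ${cost(assigned)}")
}
```

Cluster 0 contributes 2 × 3 × 0.1² = 0.06, cluster 1 contributes 2 × 3 × 0.05² = 0.015, and the single-point cluster 2 contributes nothing, so the total is about 0.075. That agrees with the cost in the log up to floating-point rounding, and the recomputed centroids match the three centers printed there.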
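To sum up: when a version mismatch keeps a program from running, and the problem either stays unfixed for a long time or fails without any useful message, it is very easy to give up on the product altogether.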