Troubleshooting: Problems Running a Scala-Based Spark Program in IntelliJ IDEA on Ubuntu 18.04.2


I. Preface

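    I have recently been running some small experiments that use Scala and Spark, and I set them up on Linux. Getting even the simplest starter program to work took me a day or two. Apart from being embarrassed, I came away with a deep appreciation of how hard these tools are to use. No wonder several of the Hadoop companies have been gradually declining.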

II. Troubleshooting

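    Readers of my earlier posts know that I have used Hadoop and Spark before, and I ran into a great many problems back then. These products iterate quickly, and each release keeps very little backward compatibility with earlier ones, so choosing matching versions is essential. Apart from the compatibility notes on the official sites, this topic is rarely covered online, even though it matters a great deal. The products are also awkward to install. They are nominally ready to use once downloaded, but you still have to edit some of the configuration files. None of that is a big deal. The real trouble starts when you run something from the command line and hit baffling errors, most of which trace back to the underlying Java, Hadoop, or Scala versions. That is frustrating, and the products do not help by printing useful error messages. Searching online turns up very little, and what you do find is often off-target or contradictory, so it usually takes a day or two to reach a working solution. Unless these products improve, they will certainly lose out to tools like MySQL and MongoDB. In this small test I was stalled for two days by a version-dependency problem: running a Scala-based Spark program in IDEA on Ubuntu 18.04.2 produced some truly bizarre errors.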

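    First, here is how I built the program step by step. There are plenty of examples online, but most only scratch the surface, stay vague, and skip the most important details entirely. They are written without much care or sense of responsibility, and I cannot tell whether the authors are unable to explain these details or simply do not bother. The first question is what kind of project to create. IDEA offers two ways to build a Scala program. The first is to create a Scala project directly. You install the Scala plugin, and after creating the project you configure the runtime environment yourself. That environment is tangled, and getting it right can take many attempts and waste a lot of effort. The second way also needs the Scala plugin, but you create a Maven project instead: you declare the dependencies you need in pom.xml, Maven downloads them together with everything they depend on, and you then add Scala support to the project. The second way is clearly simpler, so that is what we use here: a Maven project.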

   Create a new folder and mark it as a source folder (Sources Root) in Project Structure.

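     Finally, we add the Scala SDK, and this is where versions start to matter. Scala 2.10 and 2.11 run on Java 6 and later (including Java 7 and 8), while Scala 2.12 requires Java 8. We use Java 8, so as far as Java is concerned either line would work. I chose 2.11, and Spark 2.4.3 is published for both Scala 2.11 and 2.12. Within 2.11 there are many patch releases, so we have to pick one. In this dialog we add the corresponding Scala version. If it is not listed, click the Download button and choose the version to download. I have to complain that this is one of IDEA's weak spots: the download took half an hour, there was no progress bar, which was exasperating, and cancelling it was awkward. When the download finishes, select that version.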
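To check which Scala and Java versions a program actually runs with, and whether that pairing meets the minimum Java release for the Scala line, a small helper like the following can be run from IDEA. It is only a sketch: the `VersionCheck` object and its lookup table are my own illustration, covering Scala 2.10 to 2.12 and only the lower Java bound. They are not part of Scala or Spark.

```scala
// Sketch: report the running Scala/Java versions and check the pairing against
// the minimum Java release each Scala binary line needs (lower bound only).
object VersionCheck {
  // Minimum Java major version per Scala binary version (illustrative table).
  val minJava: Map[String, Int] = Map("2.10" -> 6, "2.11" -> 6, "2.12" -> 8)

  // "2.11.12" -> "2.11"
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  // "1.8" -> 8 (Java 8 and earlier report "1.x"), "11" -> 11
  def javaMajor(spec: String): Int =
    if (spec.startsWith("1.")) spec.drop(2).takeWhile(_.isDigit).toInt
    else spec.takeWhile(_.isDigit).toInt

  // None if the Scala line is not in the table
  def compatible(scalaFull: String, javaSpec: String): Option[Boolean] =
    minJava.get(binaryVersion(scalaFull)).map(javaMajor(javaSpec) >= _)

  def main(args: Array[String]): Unit = {
    val scalaV = scala.util.Properties.versionNumberString
    val javaV = System.getProperty("java.specification.version")
    println(s"Scala $scalaV on Java $javaV, meets minimum: ${compatible(scalaV, javaV)}")
  }
}
```

The check only covers the lower bound. Very new JDKs can still cause trouble with old Scala releases, so the table is a starting point, not a guarantee.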

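    Next we configure pom.xml. Since I use Spark's machine-learning library (MLlib), I simply add it as a dependency. Maven resolves the dependencies between packages for us, which deserves credit.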

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- The JDK is Java 8, so compile Java sources for 1.8 -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
<!--    <scala.version>2.11.0</scala.version>-->
    <!-- Scala binary version; must match the Scala SDK configured in IDEA -->
    <spark.artifactID.suffix>2.11</spark.artifactID.suffix>
    <spark.version>2.4.3</spark.version>
  </properties>
  <dependencies>
     <dependency>
       <groupId>junit</groupId>
       <artifactId>junit</artifactId>
       <version>4.11</version>
       <scope>test</scope>
     </dependency>
     <!-- spark-mllib brings in spark-core, spark-sql, etc. as transitive dependencies -->
     <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${spark.artifactID.suffix}</artifactId>
        <version>${spark.version}</version>
     </dependency>
  </dependencies>
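The `${spark.artifactID.suffix}` property works because Scala libraries, Spark included, publish one artifact per Scala binary version and encode that version in the artifactId (`spark-mllib_2.11`, `spark-mllib_2.12`). Mixing suffixes in one project typically leads to missing-class or missing-method errors. The hypothetical `ArtifactSuffix` helper below sketches the convention: it derives the suffix from a full Scala version and flags artifactIds whose suffix disagrees.

```scala
// Sketch of the "_<scala binary version>" artifactId convention.
object ArtifactSuffix {
  // "2.11.12" -> "2.11"
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // ("spark-mllib", "2.11.0") -> "spark-mllib_2.11"
  def artifactId(base: String, scalaVersion: String): String =
    s"${base}_${binaryVersion(scalaVersion)}"

  // Scala-suffixed artifactIds whose suffix differs from the project's Scala version;
  // plain Java artifacts (no "_x.y" suffix) are ignored.
  def mismatched(artifactIds: Seq[String], scalaVersion: String): Seq[String] = {
    val suffix = "_" + binaryVersion(scalaVersion)
    artifactIds.filter(id => id.matches(""".*_\d+\.\d+""") && !id.endsWith(suffix))
  }
}
```

For example, `mismatched(Seq("spark-mllib_2.11", "spark-core_2.12", "junit"), "2.11.0")` flags only `spark-core_2.12`.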

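    Here the Scala binary version is set to 2.11, matching the SDK we just downloaded. With 2.12, for some reason, even the program itself would not build although the dependencies had been imported. It complained that it could not find the packages, yet I could see those files among the downloaded dependencies. Truly surprising! When I finally got past that, running the program threw bizarre exceptions. Running a program should not be this hard. Even a Scala hello-world had become this difficult, and the errors caused by version mismatches are bizarre indeed. For Spark I went by the Maven repository and picked the latest release at the time, 2.4.3. This is the configuration I can currently run, so for now it works. Stranger still, sometimes the first run succeeded, a second, different program then failed, and rerunning the first program failed the same way. Clearing IDEA's caches and restarting many times did not help, and another computer behaved the same way. Is that not enough to make anyone despair?! The working combination I have found so far is Java 8 + Scala 2.11.0 + spark-mllib_2.11 + Spark 2.4.3, which solved the problem. Note that the classpath in the run output below contains two Scala libraries, scala-library-2.11.0.jar from the IDEA SDK and scala-library-2.11.12.jar pulled in through the Spark dependencies (likewise scala-reflect 2.11.0 and 2.11.8). Aligning the SDK with the Scala patch version Spark depends on may avoid some of the intermittent failures.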
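The classpath IDEA prints at the start of each run contains both `scala-library-2.11.0.jar` and `scala-library-2.11.12.jar`, plus two `scala-reflect` versions. One quick way to spot such duplicates is to scan the classpath of the running JVM. The sketch below is an illustration (`DuplicateJars` is a hypothetical name). It assumes Maven-style file names of the form `name-version.jar` and the `:` separator used on Linux.

```scala
// Sketch: list artifacts that appear on the classpath with more than one version.
object DuplicateJars {
  // ".../scala-library-2.11.0.jar" -> ("scala-library", "2.11.0"); the version must start with a digit
  private val JarPattern = """.*/([^/]+?)-(\d[^/]*)\.jar""".r

  def duplicates(classpath: String): Map[String, Seq[String]] =
    classpath.split(':').toSeq
      .collect { case JarPattern(name, version) => name -> version } // skips rt.jar, target/classes, ...
      .groupBy(_._1)
      .map { case (name, pairs) => name -> pairs.map(_._2).distinct }
      .filter { case (_, versions) => versions.size > 1 }

  def main(args: Array[String]): Unit = {
    // java.class.path holds the classpath of the current JVM
    val cp = System.getProperty("java.class.path")
    duplicates(cp).foreach { case (name, vs) => println(s"$name: ${vs.mkString(", ")}") }
  }
}
```

Running it as a second main class in the same IDEA project prints the duplicated artifacts, which shows which Scala versions are competing on the classpath.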

  The first program:

package com.kmeans

import org.apache.spark.{SparkConf, SparkContext}


object MyTest {
  def main(args: Array[String]): Unit = {
    val logFile = "file:///home/zyr/file.txt" // file:// reads from the local file system, not HDFS
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]") // run locally with 2 threads
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache() // at least 2 partitions, cached in memory
    val num = logData.flatMap(x => x.split(" ")).filter(_.contains("a")).count() // number of words containing "a"
    println("Words with a : %s".format(num))
    sc.stop()
  }
}
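To sanity-check the expected result without starting Spark, the same flatMap/filter/count pipeline can be run on an ordinary Scala collection holding the three lines of the input file. This is a sketch, and the `WordsWithA` object is illustrative.

```scala
// Sketch: MyTest's pipeline on plain Scala collections (no Spark needed).
object WordsWithA {
  def countWordsWithA(lines: Seq[String]): Long =
    lines.flatMap(_.split(" ")).count(_.contains("a")).toLong

  def main(args: Array[String]): Unit = {
    // The three lines of /home/zyr/file.txt
    val lines = Seq("xyr  a b c d f g a d f g", "a a a a a a a a a", "w e r t y yuu ")
    println("Words with a : %s".format(countWordsWithA(lines)))
  }
}
```

Line 1 contributes 2 words, line 2 contributes 9, and line 3 contributes none, which gives the 11 printed by the Spark version. The double space in line 1 makes `split(" ")` produce an empty token. That does not affect the count here, but `split("\\s+")` would avoid it.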

   Input file (/home/zyr/file.txt):

xyr  a b c d f g a d f g
a a a a a a a a a
w e r t y yuu 

  Output:

/usr/lib/jvm/java-8-openjdk-amd64/bin/java -javaagent:/usr/local/idea/lib/idea_rt.jar=44451:/usr/local/idea/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:[... other JDK jars omitted ...]:/home/zyr/IdeaProjects/myspark/target/classes:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.0/scala-reflect-2.11.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.0/scala-library-2.11.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib_2.11/2.4.3/spark-mllib_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.11/1.1.0/scala-parser-combinators_2.11-1.1.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.12/scala-library-2.11.12.jar:/home/zyr/.m2/repository/org/apache/spark/spark-core_2.11/2.4.3/spark-core_2.11-2.4.3.jar:[... other Maven dependency jars omitted ...]:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.8/scala-reflect-2.11.8.jar:[... other Maven dependency jars omitted ...] com.kmeans.MyTest
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/07/10 11:36:47 WARN Utils: Your hostname, zyrpc resolves to a loopback address: 127.0.1.1; using 192.168.31.160 instead (on interface ens33)
19/07/10 11:36:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/07/10 11:36:47 INFO SparkContext: Running Spark version 2.4.3
19/07/10 11:36:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/10 11:36:50 INFO SparkContext: Submitted application: Simple Application
19/07/10 11:36:50 INFO SecurityManager: Changing view acls to: zyr
19/07/10 11:36:50 INFO SecurityManager: Changing modify acls to: zyr
19/07/10 11:36:50 INFO SecurityManager: Changing view acls groups to: 
19/07/10 11:36:50 INFO SecurityManager: Changing modify acls groups to: 
19/07/10 11:36:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zyr); groups with view permissions: Set(); users  with modify permissions: Set(zyr); groups with modify permissions: Set()
19/07/10 11:36:52 INFO Utils: Successfully started service 'sparkDriver' on port 41147.
19/07/10 11:36:52 INFO SparkEnv: Registering MapOutputTracker
19/07/10 11:36:52 INFO SparkEnv: Registering BlockManagerMaster
19/07/10 11:36:52 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/07/10 11:36:52 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/07/10 11:36:52 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-63b48034-1ffc-40fa-bb45-6c117cb0451b
19/07/10 11:36:52 INFO MemoryStore: MemoryStore started with capacity 345.0 MB
19/07/10 11:36:52 INFO SparkEnv: Registering OutputCommitCoordinator
19/07/10 11:36:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/07/10 11:36:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.31.160:4040
19/07/10 11:36:54 INFO Executor: Starting executor ID driver on host localhost
19/07/10 11:36:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34263.
19/07/10 11:36:54 INFO NettyBlockTransferService: Server created on 192.168.31.160:34263
19/07/10 11:36:54 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/07/10 11:36:54 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:54 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.31.160:34263 with 345.0 MB RAM, BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:54 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:54 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.31.160, 34263, None)
19/07/10 11:36:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 344.8 MB)
19/07/10 11:36:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 344.8 MB)
19/07/10 11:36:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.31.160:34263 (size: 20.4 KB, free: 345.0 MB)
19/07/10 11:36:57 INFO SparkContext: Created broadcast 0 from textFile at MyTest.scala:11
19/07/10 11:36:58 INFO FileInputFormat: Total input paths to process : 1
19/07/10 11:36:58 INFO SparkContext: Starting job: count at MyTest.scala:12
19/07/10 11:36:58 INFO DAGScheduler: Got job 0 (count at MyTest.scala:12) with 2 output partitions
19/07/10 11:36:58 INFO DAGScheduler: Final stage: ResultStage 0 (count at MyTest.scala:12)
19/07/10 11:36:58 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:36:58 INFO DAGScheduler: Missing parents: List()
19/07/10 11:36:58 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at filter at MyTest.scala:12), which has no missing parents
19/07/10 11:36:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.7 KB, free 344.8 MB)
19/07/10 11:36:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.1 KB, free 344.8 MB)
19/07/10 11:36:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.31.160:34263 (size: 2.1 KB, free: 345.0 MB)
19/07/10 11:36:58 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
19/07/10 11:36:58 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at filter at MyTest.scala:12) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:36:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/07/10 11:36:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7883 bytes)
19/07/10 11:36:58 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7883 bytes)
19/07/10 11:36:58 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/07/10 11:36:58 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
19/07/10 11:36:59 INFO HadoopRDD: Input split: file:/home/zyr/file.txt:0+29
19/07/10 11:36:59 INFO HadoopRDD: Input split: file:/home/zyr/file.txt:29+29
19/07/10 11:36:59 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 192.0 B, free 344.8 MB)
19/07/10 11:36:59 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 96.0 B, free 344.8 MB)
19/07/10 11:36:59 INFO BlockManagerInfo: Added rdd_1_1 in memory on 192.168.31.160:34263 (size: 96.0 B, free: 345.0 MB)
19/07/10 11:36:59 INFO BlockManagerInfo: Added rdd_1_0 in memory on 192.168.31.160:34263 (size: 192.0 B, free: 345.0 MB)
19/07/10 11:36:59 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 875 bytes result sent to driver
19/07/10 11:36:59 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 875 bytes result sent to driver
19/07/10 11:36:59 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 635 ms on localhost (executor driver) (1/2)
19/07/10 11:36:59 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 593 ms on localhost (executor driver) (2/2)
19/07/10 11:36:59 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
19/07/10 11:36:59 INFO DAGScheduler: ResultStage 0 (count at MyTest.scala:12) finished in 1.203 s
19/07/10 11:36:59 INFO DAGScheduler: Job 0 finished: count at MyTest.scala:12, took 1.438576 s
Words with a : 11
19/07/10 11:36:59 INFO SparkUI: Stopped Spark web UI at http://192.168.31.160:4040
19/07/10 11:36:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/07/10 11:36:59 INFO MemoryStore: MemoryStore cleared
19/07/10 11:36:59 INFO BlockManager: BlockManager stopped
19/07/10 11:36:59 INFO BlockManagerMaster: BlockManagerMaster stopped
19/07/10 11:36:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/07/10 11:36:59 INFO SparkContext: Successfully stopped SparkContext
19/07/10 11:36:59 INFO ShutdownHookManager: Shutdown hook called
19/07/10 11:36:59 INFO ShutdownHookManager: Deleting directory /tmp/spark-e07b1de0-0ac0-4abe-952a-504c2c7282fd

Process finished with exit code 0
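The two `HadoopRDD: Input split` lines in the log (`0+29` and `29+29`) come from `sc.textFile(logFile, 2)`, which cuts the 58-byte file into two splits. The sketch below reproduces that arithmetic in simplified form, based on my reading of Hadoop's old-API `FileInputFormat.getSplits`. The goal size is total size divided by the requested number of splits, the split size is max(minSize, min(goalSize, blockSize)), and the last split may be up to 10% larger. It ignores block locations, compression, and multiple files. The `SplitSketch` name and the 32 MB block size used in the example are assumptions.

```scala
// Simplified sketch of how one uncompressed file is cut into input splits.
object SplitSketch {
  private val SplitSlop = 1.1 // the last split may be up to 10% larger than splitSize

  // Returns (start, length) pairs for a file of totalSize bytes.
  def splits(totalSize: Long, numSplits: Int, blockSize: Long, minSize: Long = 1L): List[(Long, Long)] = {
    val goalSize = totalSize / math.max(numSplits, 1)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))
    val out = List.newBuilder[(Long, Long)]
    var remaining = totalSize
    // Emit full-size splits while more than 1.1 splits' worth of data is left
    while (remaining.toDouble / splitSize > SplitSlop) {
      out += ((totalSize - remaining, splitSize))
      remaining -= splitSize
    }
    // Whatever is left becomes the final (possibly slightly larger) split
    if (remaining != 0) out += ((totalSize - remaining, remaining))
    out.result()
  }
}
```

`SplitSketch.splits(58L, 2, 32L * 1024 * 1024)` gives `List((0,29), (29,29))`, which matches the two lines in the log.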

  The second program:

package com.kmeans

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}


/**
  * K-means clustering in Scala (Spark MLlib): determine which cluster each point in 3-D space belongs to
  * ****************
  * Test data (x,y,z) *
  * ***************
  * 0.0 0.0 0.0
  * 0.1 0.1 0.1
  * 0.2 0.2 0.2
  * 9.0 9.0 9.0
  * 9.1 9.1 9.1
  * 9.2 9.2 9.2
  */
object Kmeans {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
    val context = new SparkContext(conf)
    val dataSourceRDD = context.textFile("file:///home/zyr/kmeanstest.txt").filter(_.trim.nonEmpty).cache() // skip blank lines
    val trainRDD = dataSourceRDD.map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))
    // trainRDD.foreach(trainRow => println(trainRow))
    // trainRDD.foreach(println)
    // Train the model
    // Argument 1: training data (an RDD of Vectors)
    // Argument 2: number of clusters k
    // Argument 3: maximum number of iterations
    val model = KMeans.train(trainRDD, 3, 30)

    // Get the cluster centers of the model
    val clustercenters = model.clusterCenters

    // Print the cluster centers
    clustercenters.foreach(println)

    // Cost: sum of squared distances from each point to its nearest center
    val cross = model.computeCost(trainRDD)
    println("Cost: " + cross)

    // Use the model to predict the cluster of some test points
    val res1 = model.predict(Vectors.dense("0.2 0.2 0.2".split(' ').map(_.toDouble)))
    val res2 = model.predict(Vectors.dense("0.25 0.25 0.25".split(' ').map(_.toDouble)))
    val res3 = model.predict(Vectors.dense("0.1 0.1 0.1".split(' ').map(_.toDouble)))
    val res4 = model.predict(Vectors.dense("9 9 9".split(' ').map(_.toDouble)))
    val res5 = model.predict(Vectors.dense("9.1 9.1 9.1".split(' ').map(_.toDouble)))
    val res6 = model.predict(Vectors.dense("9.06 9.06 9.06".split(' ').map(_.toDouble)))
    // println("Predictions:\r\n" + res1 + "\r\n" + res2 + "\r\n"  + res3 + "\r\n"  + res4 + "\r\n"  + res5 + "\r\n"  + res6)
    /**
      * The three cluster centers
      * [9.1,9.1,9.1]
      * [0.05,0.05,0.05]
      * [0.2,0.2,0.2]
      * Cluster indices predicted for res1 to res6
      * 2
      * 2
      * 1
      * 0
      * 0
      * 0
      * Each input belongs to the cluster whose center it is closest to; centers are initialized randomly, so indices can differ between runs
      */
    // Predict on the original data as a cross-check
    val crossPredictRes = dataSourceRDD.map {
      line =>
        val lineVectors = Vectors.dense(line.trim.split("\\s+").map(_.toDouble))
        val predictRes = model.predict(lineVectors)
        s"$lineVectors==>$predictRes"
    }
    crossPredictRes.collect().foreach(println) // collect() so the driver prints the results
    context.stop()
    /**
      * [9.0,9.0,9.0]==>0
      * [9.1,9.1,9.1]==>0
      * [9.2,9.2,9.2]==>0
      * [0.0,0.0,0.0]==>1
      * [0.1,0.1,0.1]==>1
      * [0.2,0.2,0.2]==>2
      *
      */
  }
}
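What `predict` and `computeCost` compute can be checked by hand. The sketch below is plain Scala with no Spark, and `KMeansSketch` is a hypothetical name. It implements nearest-center assignment, the sum of squared distances, and a basic Lloyd iteration. Spark's `KMeans.train` initializes its centers with a randomized k-means|| step. This sketch instead takes the first k points as initial centers, so the result is deterministic.

```scala
// Sketch of nearest-center prediction, cost, and Lloyd's k-means on Array[Double] points.
object KMeansSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of the same dimension
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Index of the nearest center, i.e. the cluster a point is assigned to
  def predict(centers: Seq[Point], p: Point): Int =
    centers.indices.minBy(i => sqDist(centers(i), p))

  // Sum over all points of the squared distance to the nearest center
  def cost(centers: Seq[Point], points: Seq[Point]): Double =
    points.map(p => centers.map(c => sqDist(c, p)).min).sum

  // Component-wise mean of a non-empty group of points
  def mean(ps: Seq[Point]): Point =
    ps.head.indices.map(d => ps.map(_(d)).sum / ps.size).toArray

  // Lloyd's algorithm: assign each point to its nearest center, move each center to the
  // mean of its points, and repeat until nothing changes or maxIter is reached.
  // Initial centers are the first k points (assumes points.size >= k).
  def lloyd(points: Seq[Point], k: Int, maxIter: Int): Seq[Point] = {
    var centers: Seq[Point] = points.take(k)
    var iter = 0
    var changed = true
    while (changed && iter < maxIter) {
      val current = centers
      val clusters = points.groupBy(p => predict(current, p))
      // An empty cluster keeps its previous center
      val updated: Seq[Point] = current.indices.map(i => clusters.get(i).map(mean).getOrElse(current(i)))
      changed = updated.map(_.toSeq) != current.map(_.toSeq)
      centers = updated
      iter += 1
    }
    centers
  }
}
```

With the centers from the comments above ([9.1,9.1,9.1], [0.05,0.05,0.05], [0.2,0.2,0.2]), `predict` returns 2, 2, 1, 0, 0, 0 for the six test points, which matches the recorded results. The cost on the six training points is about 0.075. Started from the first three points, `lloyd` converges to [0.0], [0.15], and [9.1] instead. This shows that the initialization decides how the three points near the origin get split.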

   Input file (/home/zyr/kmeanstest.txt):

0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2

   Output:

/usr/lib/jvm/java-8-openjdk-amd64/bin/java -javaagent:/usr/local/idea/lib/idea_rt.jar=34781:/usr/local/idea/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/icedtea-sound.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/java-atk-wrapper.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar:/home/zyr/IdeaProjects/myspark/target/classes:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.0/scala-reflect-2.11.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.0/scala-library-2.11.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib_2.11/2.4.3/spark-mllib_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.11/1.1.0/scala-parser-combinators_2.11-1.1.0.jar:/home/zyr/.m2/repository/org/scala-lang/scala-library/2.11.12/scala-library-2.11.12.jar:/home/zyr/.m2/repository/org/apache/spark/spark-core_2.11/2.4.3/spark-core_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/thoughtworks/paranamer/paranamer/2.8/paranamer-2.8.jar:/home/zyr/.m2/repository/org/apache/avro/avro/1.8.2/avro-1.8.2.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/home/zyr/.m2/repository/org/apache/commons/commons-compress/1.8.1/commons-compress-1.8.1.jar:/home/zyr/.m2/repository/org/tukaani/xz/1.5/xz-1.5.jar:/home/zyr/.m2/repository/org/apache/avro/avro-mapred/1.8.2/avro-mapred-1.8.2-hadoop2.jar:/home/zyr/.m2/repository/org/apache/avro/avro-ipc/1.8.2/avro-ipc-1.8.2.jar:/home/zyr/.m2/repository/commons-codec/commons-codec/1.9/commons-codec-1.9.jar:/home/zyr/.m2/repository/com/twitter/chill_2.11/0.9.3/chill_2.11-0.9.3.jar:/home/zyr/.m2/repository/com/esotericsoftware/kryo-shaded/4.0.2/kryo-shaded-4.0.2.jar:/home/zyr/.m2/repository/com/esotericsoftware/minlog/1.3.0/minlog-1.3.0.jar:/home/zyr/.m2/repository/org/objenesis/objenesis/2.5.1/objenesis-2.5.1.jar:/home/zyr/.m2/repository/com/twitter/chill-java/0.9.3/chill-java-0.9.3.jar:/home/zyr/.m2/repository/org/apache/xbean/xbean-asm6-shaded/4.8/xbean-asm6-shaded-4.8.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-client/2.6.5/hadoop-client-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-common/2.6.5/hadoop-common-2.6.5.jar:/home/zyr/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/zyr/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/zyr/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/home/zyr/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/home/zyr/.m2/repository/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar:/home/zyr/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/zyr/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/zyr/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/zyr/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-auth/2.6.5/hadoop-auth-2.6.5.jar:/home/zyr/.m2/repository/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar:/home/zyr/.m2/repository/org/apache/httpcomponents/httpcore/4.2.4/httpcore-4.2.4.jar:/home/zyr/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/home/zyr/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/home/zyr/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/home/zyr/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/home/zyr/.m2/repository/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar:/home/zyr/.m2/repository/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.6.5/hadoop-hdfs-2.6.5.jar:/home/zyr/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/home/zyr/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/home/zyr/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.6.5/hadoop-mapreduce-client-app-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.6.5/hadoop-mapreduce-client-common-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.6.5/hadoop-yarn-client-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.6.5/hadoop-yarn-server-common-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.6.5/hadoop-mapreduce-client-shuffle-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.6.5/hadoop-yarn-api-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.6.5/hadoop-mapreduce-client-core-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.6.5/hadoop-yarn-common-2.6.5.jar:/home/zyr/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/home/z
yr/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar:/home/zyr/.m2/repository/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.6.5/hadoop-mapreduce-client-jobclient-2.6.5.jar:/home/zyr/.m2/repository/org/apache/hadoop/hadoop-annotations/2.6.5/hadoop-annotations-2.6.5.jar:/home/zyr/.m2/repository/org/apache/spark/spark-launcher_2.11/2.4.3/spark-launcher_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-kvstore_2.11/2.4.3/spark-kvstore_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.6.7/jackson-core-2.6.7.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.6.7/jackson-annotations-2.6.7.jar:/home/zyr/.m2/repository/org/apache/spark/spark-network-common_2.11/2.4.3/spark-network-common_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-network-shuffle_2.11/2.4.3/spark-network-shuffle_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-unsafe_2.11/2.4.3/spark-unsafe_2.11-2.4.3.jar:/home/zyr/.m2/repository/javax/activation/activation/1.1.1/activation-1.1.1.jar:/home/zyr/.m2/repository/org/apache/curator/curator-recipes/2.6.0/curator-recipes-2.6.0.jar:/home/zyr/.m2/repository/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.jar:/home/zyr/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/zyr/.m2/repository/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6.jar:/home/zyr/.m2/repository/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar:/home/zyr/.m2/repository/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar:/home/zyr/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/zyr/.m2/repository/org/slf4j/slf4j-api/1.
7.16/slf4j-api-1.7.16.jar:/home/zyr/.m2/repository/org/slf4j/jul-to-slf4j/1.7.16/jul-to-slf4j-1.7.16.jar:/home/zyr/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.16/jcl-over-slf4j-1.7.16.jar:/home/zyr/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/zyr/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar:/home/zyr/.m2/repository/com/ning/compress-lzf/1.0.3/compress-lzf-1.0.3.jar:/home/zyr/.m2/repository/org/xerial/snappy/snappy-java/1.1.7.3/snappy-java-1.1.7.3.jar:/home/zyr/.m2/repository/org/lz4/lz4-java/1.4.0/lz4-java-1.4.0.jar:/home/zyr/.m2/repository/com/github/luben/zstd-jni/1.3.2-2/zstd-jni-1.3.2-2.jar:/home/zyr/.m2/repository/org/roaringbitmap/RoaringBitmap/0.7.45/RoaringBitmap-0.7.45.jar:/home/zyr/.m2/repository/org/roaringbitmap/shims/0.7.45/shims-0.7.45.jar:/home/zyr/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/zyr/.m2/repository/org/json4s/json4s-jackson_2.11/3.5.3/json4s-jackson_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-core_2.11/3.5.3/json4s-core_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-ast_2.11/3.5.3/json4s-ast_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/json4s/json4s-scalap_2.11/3.5.3/json4s-scalap_2.11-3.5.3.jar:/home/zyr/.m2/repository/org/scala-lang/modules/scala-xml_2.11/1.0.6/scala-xml_2.11-1.0.6.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-client/2.22.2/jersey-client-2.22.2.jar:/home/zyr/.m2/repository/javax/ws/rs/javax.ws.rs-api/2.0.1/javax.ws.rs-api-2.0.1.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-api/2.4.0-b34/hk2-api-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-utils/2.4.0-b34/hk2-utils-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/external/aopalliance-repackaged/2.4.0-b34/aopalliance-repackaged-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/external/javax.inject/2.4.0-b34/javax.inject-2.4.0-b34.jar:/home/zyr/.m2/repository/org/glassfish/hk2/hk2-locator/2.4.0-b34/hk2-locator-2.4.0-b34.jar:/home/zyr/.m
2/repository/org/javassist/javassist/3.18.1-GA/javassist-3.18.1-GA.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-common/2.22.2/jersey-common-2.22.2.jar:/home/zyr/.m2/repository/javax/annotation/javax.annotation-api/1.2/javax.annotation-api-1.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/bundles/repackaged/jersey-guava/2.22.2/jersey-guava-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/hk2/osgi-resource-locator/1.0.1/osgi-resource-locator-1.0.1.jar:/home/zyr/.m2/repository/org/glassfish/jersey/core/jersey-server/2.22.2/jersey-server-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/media/jersey-media-jaxb/2.22.2/jersey-media-jaxb-2.22.2.jar:/home/zyr/.m2/repository/javax/validation/validation-api/1.1.0.Final/validation-api-1.1.0.Final.jar:/home/zyr/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet/2.22.2/jersey-container-servlet-2.22.2.jar:/home/zyr/.m2/repository/org/glassfish/jersey/containers/jersey-container-servlet-core/2.22.2/jersey-container-servlet-core-2.22.2.jar:/home/zyr/.m2/repository/io/netty/netty-all/4.1.17.Final/netty-all-4.1.17.Final.jar:/home/zyr/.m2/repository/io/netty/netty/3.9.9.Final/netty-3.9.9.Final.jar:/home/zyr/.m2/repository/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-core/3.1.5/metrics-core-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-jvm/3.1.5/metrics-jvm-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-json/3.1.5/metrics-json-3.1.5.jar:/home/zyr/.m2/repository/io/dropwizard/metrics/metrics-graphite/3.1.5/metrics-graphite-3.1.5.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.6.7.1/jackson-databind-2.6.7.1.jar:/home/zyr/.m2/repository/com/fasterxml/jackson/module/jackson-module-scala_2.11/2.6.7.1/jackson-module-scala_2.11-2.6.7.1.jar:/home/zyr/.m2/repository/org/scala-lang/scala-reflect/2.11.8/scala-reflect-2.11.8.jar:/home/zyr/.m2/repository/com/fasterxml/jac
kson/module/jackson-module-paranamer/2.7.9/jackson-module-paranamer-2.7.9.jar:/home/zyr/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar:/home/zyr/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/home/zyr/.m2/repository/net/razorvine/pyrolite/4.13/pyrolite-4.13.jar:/home/zyr/.m2/repository/net/sf/py4j/py4j/0.10.7/py4j-0.10.7.jar:/home/zyr/.m2/repository/org/apache/commons/commons-crypto/1.0.0/commons-crypto-1.0.0.jar:/home/zyr/.m2/repository/org/apache/spark/spark-streaming_2.11/2.4.3/spark-streaming_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-sql_2.11/2.4.3/spark-sql_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/univocity/univocity-parsers/2.7.3/univocity-parsers-2.7.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-sketch_2.11/2.4.3/spark-sketch_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/apache/spark/spark-catalyst_2.11/2.4.3/spark-catalyst_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/codehaus/janino/janino/3.0.9/janino-3.0.9.jar:/home/zyr/.m2/repository/org/codehaus/janino/commons-compiler/3.0.9/commons-compiler-3.0.9.jar:/home/zyr/.m2/repository/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar:/home/zyr/.m2/repository/org/apache/orc/orc-core/1.5.5/orc-core-1.5.5-nohive.jar:/home/zyr/.m2/repository/org/apache/orc/orc-shims/1.5.5/orc-shims-1.5.5.jar:/home/zyr/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/home/zyr/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/zyr/.m2/repository/io/airlift/aircompressor/0.10/aircompressor-0.10.jar:/home/zyr/.m2/repository/org/apache/orc/orc-mapreduce/1.5.5/orc-mapreduce-1.5.5-nohive.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-column/1.10.1/parquet-column-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-common/1.10.1/parquet-common-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-encoding/1.10.1/parquet-encoding-1.10.1.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-hadoop/1.10.1/parquet-hadoop-1.10.1.ja
r:/home/zyr/.m2/repository/org/apache/parquet/parquet-format/2.4.0/parquet-format-2.4.0.jar:/home/zyr/.m2/repository/org/apache/parquet/parquet-jackson/1.10.1/parquet-jackson-1.10.1.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-vector/0.10.0/arrow-vector-0.10.0.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-format/0.10.0/arrow-format-0.10.0.jar:/home/zyr/.m2/repository/org/apache/arrow/arrow-memory/0.10.0/arrow-memory-0.10.0.jar:/home/zyr/.m2/repository/joda-time/joda-time/2.9.9/joda-time-2.9.9.jar:/home/zyr/.m2/repository/com/carrotsearch/hppc/0.7.2/hppc-0.7.2.jar:/home/zyr/.m2/repository/com/vlkan/flatbuffers/1.2.0-3f79e055/flatbuffers-1.2.0-3f79e055.jar:/home/zyr/.m2/repository/org/apache/spark/spark-graphx_2.11/2.4.3/spark-graphx_2.11-2.4.3.jar:/home/zyr/.m2/repository/com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar:/home/zyr/.m2/repository/net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar:/home/zyr/.m2/repository/org/apache/spark/spark-mllib-local_2.11/2.4.3/spark-mllib-local_2.11-2.4.3.jar:/home/zyr/.m2/repository/org/scalanlp/breeze_2.11/0.13.2/breeze_2.11-0.13.2.jar:/home/zyr/.m2/repository/org/scalanlp/breeze-macros_2.11/0.13.2/breeze-macros_2.11-0.13.2.jar:/home/zyr/.m2/repository/net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar:/home/zyr/.m2/repository/com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar:/home/zyr/.m2/repository/org/spire-math/spire_2.11/0.13.0/spire_2.11-0.13.0.jar:/home/zyr/.m2/repository/org/spire-math/spire-macros_2.11/0.13.0/spire-macros_2.11-0.13.0.jar:/home/zyr/.m2/repository/org/typelevel/machinist_2.11/0.6.1/machinist_2.11-0.6.1.jar:/home/zyr/.m2/repository/com/chuusai/shapeless_2.11/2.3.2/shapeless_2.11-2.3.2.jar:/home/zyr/.m2/repository/org/typelevel/macro-compat_2.11/1.1.1/macro-compat_2.11-1.1.1.jar:/home/zyr/.m2/repository/org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar:/home/zyr/.m2/repository/org/apache/spark/spark-tags_2.11/2.4.3/spark-tags_2.11-2.4.3.jar:/home/zyr/.m2/reposi
tory/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar com.kmeans.Kmeans
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/07/10 11:40:11 WARN Utils: Your hostname, zyrpc resolves to a loopback address: 127.0.1.1; using 192.168.31.160 instead (on interface ens33)
19/07/10 11:40:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/07/10 11:40:11 INFO SparkContext: Running Spark version 2.4.3
19/07/10 11:40:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/07/10 11:40:14 INFO SparkContext: Submitted application: Simple Application
19/07/10 11:40:15 INFO SecurityManager: Changing view acls to: zyr
19/07/10 11:40:15 INFO SecurityManager: Changing modify acls to: zyr
19/07/10 11:40:15 INFO SecurityManager: Changing view acls groups to:
19/07/10 11:40:15 INFO SecurityManager: Changing modify acls groups to:
19/07/10 11:40:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zyr); groups with view permissions: Set(); users  with modify permissions: Set(zyr); groups with modify permissions: Set()
19/07/10 11:40:17 INFO Utils: Successfully started service 'sparkDriver' on port 45437.
19/07/10 11:40:17 INFO SparkEnv: Registering MapOutputTracker
19/07/10 11:40:18 INFO SparkEnv: Registering BlockManagerMaster
19/07/10 11:40:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/07/10 11:40:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/07/10 11:40:18 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2d502b3d-b275-49f2-9660-e1310680f61d
19/07/10 11:40:18 INFO MemoryStore: MemoryStore started with capacity 345.0 MB
19/07/10 11:40:18 INFO SparkEnv: Registering OutputCommitCoordinator
19/07/10 11:40:20 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/07/10 11:40:20 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.31.160:4040
19/07/10 11:40:21 INFO Executor: Starting executor ID driver on host localhost
19/07/10 11:40:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44595.
19/07/10 11:40:22 INFO NettyBlockTransferService: Server created on 192.168.31.160:44595
19/07/10 11:40:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/07/10 11:40:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.31.160:44595 with 345.0 MB RAM, BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.31.160, 44595, None)
19/07/10 11:40:25 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 344.8 MB)
19/07/10 11:40:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 344.8 MB)
19/07/10 11:40:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.31.160:44595 (size: 20.4 KB, free: 345.0 MB)
19/07/10 11:40:26 INFO SparkContext: Created broadcast 0 from textFile at Kmeans.scala:25
19/07/10 11:40:26 WARN KMeans: The input data is not directly cached, which may hurt performance if its parent RDDs are also uncached.
19/07/10 11:40:26 INFO FileInputFormat: Total input paths to process : 1
19/07/10 11:40:26 INFO SparkContext: Starting job: takeSample at KMeans.scala:386
19/07/10 11:40:26 INFO DAGScheduler: Got job 0 (takeSample at KMeans.scala:386) with 2 output partitions
19/07/10 11:40:26 INFO DAGScheduler: Final stage: ResultStage 0 (takeSample at KMeans.scala:386)
19/07/10 11:40:26 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:26 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:26 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at map at KMeans.scala:248), which has no missing parents
19/07/10 11:40:27 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.3 KB, free 344.8 MB)
19/07/10 11:40:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 344.8 MB)
19/07/10 11:40:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.31.160:44595 (size: 2.5 KB, free: 345.0 MB)
19/07/10 11:40:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:27 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at map at KMeans.scala:248) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/07/10 11:40:27 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8200 bytes)
19/07/10 11:40:27 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 8200 bytes)
19/07/10 11:40:27 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
19/07/10 11:40:27 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/07/10 11:40:28 INFO HadoopRDD: Input split: file:/home/zyr/kmeanstest.txt:36+36
19/07/10 11:40:28 INFO HadoopRDD: Input split: file:/home/zyr/kmeanstest.txt:0+36
19/07/10 11:40:28 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 288.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 152.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_1_0 in memory on 192.168.31.160:44595 (size: 288.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_1_1 in memory on 192.168.31.160:44595 (size: 152.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:28 INFO MemoryStore: Block rdd_3_0 stored as values in memory (estimated size 48.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_3_0 in memory on 192.168.31.160:44595 (size: 48.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO MemoryStore: Block rdd_3_1 stored as values in memory (estimated size 32.0 B, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added rdd_3_1 in memory on 192.168.31.160:44595 (size: 32.0 B, free: 345.0 MB)
19/07/10 11:40:28 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 875 bytes result sent to driver
19/07/10 11:40:28 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 875 bytes result sent to driver
19/07/10 11:40:28 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 837 ms on localhost (executor driver) (1/2)
19/07/10 11:40:28 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 942 ms on localhost (executor driver) (2/2)
19/07/10 11:40:28 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/07/10 11:40:28 INFO DAGScheduler: ResultStage 0 (takeSample at KMeans.scala:386) finished in 1.635 s
19/07/10 11:40:28 INFO DAGScheduler: Job 0 finished: takeSample at KMeans.scala:386, took 1.961204 s
19/07/10 11:40:28 INFO SparkContext: Starting job: takeSample at KMeans.scala:386
19/07/10 11:40:28 INFO DAGScheduler: Got job 1 (takeSample at KMeans.scala:386) with 2 output partitions
19/07/10 11:40:28 INFO DAGScheduler: Final stage: ResultStage 1 (takeSample at KMeans.scala:386)
19/07/10 11:40:28 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:28 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:28 INFO DAGScheduler: Submitting ResultStage 1 (PartitionwiseSampledRDD[7] at takeSample at KMeans.scala:386), which has no missing parents
19/07/10 11:40:28 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.1 KB, free 344.8 MB)
19/07/10 11:40:28 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.9 KB, free 344.8 MB)
19/07/10 11:40:28 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.31.160:44595 (size: 2.9 KB, free: 345.0 MB)
19/07/10 11:40:28 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:28 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (PartitionwiseSampledRDD[7] at takeSample at KMeans.scala:386) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:28 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
19/07/10 11:40:28 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 8309 bytes)
19/07/10 11:40:28 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, PROCESS_LOCAL, 8309 bytes)
19/07/10 11:40:28 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
19/07/10 11:40:28 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:28 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:28 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1283 bytes result sent to driver
19/07/10 11:40:28 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1175 bytes result sent to driver
19/07/10 11:40:28 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 92 ms on localhost (executor driver) (1/2)
19/07/10 11:40:28 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 96 ms on localhost (executor driver) (2/2)
19/07/10 11:40:28 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
19/07/10 11:40:28 INFO DAGScheduler: ResultStage 1 (takeSample at KMeans.scala:386) finished in 0.132 s
19/07/10 11:40:28 INFO DAGScheduler: Job 1 finished: takeSample at KMeans.scala:386, took 0.153980 s
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 144.0 B, free 344.8 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 344.0 B, free 344.8 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.31.160:44595 (size: 344.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 3 from broadcast at KMeans.scala:400
19/07/10 11:40:29 INFO SparkContext: Starting job: sum at KMeans.scala:406
19/07/10 11:40:29 INFO DAGScheduler: Got job 2 (sum at KMeans.scala:406) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 2 (sum at KMeans.scala:406)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at map at KMeans.scala:403), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 5.4 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.0 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.31.160:44595 (size: 3.0 KB, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:29 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at map at KMeans.scala:403) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:29 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
19/07/10 11:40:29 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, localhost, executor driver, partition 0, PROCESS_LOCAL, 8232 bytes)
19/07/10 11:40:29 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 8232 bytes)
19/07/10 11:40:29 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
19/07/10 11:40:29 INFO Executor: Running task 1.0 in stage 2.0 (TID 5)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
19/07/10 11:40:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
19/07/10 11:40:29 INFO MemoryStore: Block rdd_9_1 stored as values in memory (estimated size 32.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_9_1 in memory on 192.168.31.160:44595 (size: 32.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO Executor: Finished task 1.0 in stage 2.0 (TID 5). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 149 ms on localhost (executor driver) (1/2)
19/07/10 11:40:29 INFO MemoryStore: Block rdd_9_0 stored as values in memory (estimated size 48.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_9_0 in memory on 192.168.31.160:44595 (size: 48.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 168 ms on localhost (executor driver) (2/2)
19/07/10 11:40:29 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
19/07/10 11:40:29 INFO DAGScheduler: ResultStage 2 (sum at KMeans.scala:406) finished in 0.199 s
19/07/10 11:40:29 INFO DAGScheduler: Job 2 finished: sum at KMeans.scala:406, took 0.221468 s
19/07/10 11:40:29 INFO MapPartitionsRDD: Removing RDD 6 from persistence list
19/07/10 11:40:29 INFO BlockManager: Removing RDD 6
19/07/10 11:40:29 INFO SparkContext: Starting job: collect at KMeans.scala:414
19/07/10 11:40:29 INFO DAGScheduler: Got job 3 (collect at KMeans.scala:414) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 3 (collect at KMeans.scala:414)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[11] at mapPartitionsWithIndex at KMeans.scala:411), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 6.1 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.3 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.31.160:44595 (size: 3.3 KB, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:29 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (MapPartitionsRDD[11] at mapPartitionsWithIndex at KMeans.scala:411) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:29 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
19/07/10 11:40:29 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, localhost, executor driver, partition 0, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, localhost, executor driver, partition 1, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO Executor: Running task 0.0 in stage 3.0 (TID 6)
19/07/10 11:40:29 INFO Executor: Running task 1.0 in stage 3.0 (TID 7)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_0 locally
19/07/10 11:40:29 INFO Executor: Finished task 0.0 in stage 3.0 (TID 6). 1078 bytes result sent to driver
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 21 ms on localhost (executor driver) (1/2)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_1 locally
19/07/10 11:40:29 INFO Executor: Finished task 1.0 in stage 3.0 (TID 7). 1132 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 31 ms on localhost (executor driver) (2/2)
19/07/10 11:40:29 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
19/07/10 11:40:29 INFO DAGScheduler: ResultStage 3 (collect at KMeans.scala:414) finished in 0.061 s
19/07/10 11:40:29 INFO DAGScheduler: Job 3 finished: collect at KMeans.scala:414, took 0.084564 s
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 320.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 426.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.31.160:44595 (size: 426.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 6 from broadcast at KMeans.scala:400
19/07/10 11:40:29 INFO SparkContext: Starting job: sum at KMeans.scala:406
19/07/10 11:40:29 INFO DAGScheduler: Got job 4 (sum at KMeans.scala:406) with 2 output partitions
19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 4 (sum at KMeans.scala:406)
19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[13] at map at KMeans.scala:403), which has no missing parents
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 5.7 KB, free 344.7 MB)
19/07/10 11:40:29 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.1 KB, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.31.160:44595 (size: 3.1 KB, free: 345.0 MB)
19/07/10 11:40:29 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:29 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (MapPartitionsRDD[13] at map at KMeans.scala:403) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:29 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
19/07/10 11:40:29 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 9, localhost, executor driver, partition 1, PROCESS_LOCAL, 8264 bytes)
19/07/10 11:40:29 INFO Executor: Running task 0.0 in stage 4.0 (TID 8)
19/07/10 11:40:29 INFO Executor: Running task 1.0 in stage 4.0 (TID 9)
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_1 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:29 INFO BlockManager: Found block rdd_9_0 locally
19/07/10 11:40:29 INFO MemoryStore: Block rdd_13_1 stored as values in memory (estimated size 32.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_13_1 in memory on 192.168.31.160:44595 (size: 32.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO MemoryStore: Block rdd_13_0 stored as values in memory (estimated size 48.0 B, free 344.7 MB)
19/07/10 11:40:29 INFO Executor: Finished task 1.0 in stage 4.0 (TID 9). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID 9) in 64 ms on localhost (executor driver) (1/2)
19/07/10 11:40:29 INFO BlockManagerInfo: Added rdd_13_0 in memory on 192.168.31.160:44595 (size: 48.0 B, free: 345.0 MB)
19/07/10 11:40:29 INFO Executor: Finished task 0.0 in stage 4.0 (TID 8). 834 bytes result sent to driver
19/07/10 11:40:29 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 8) in 75 ms on localhost (executor driver) (2/2)
19/07/10 11:40:29 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
19/07/10 11:40:29 INFO DAGScheduler: ResultStage 4 (sum at KMeans.scala:406) finished in 0.108 s
19/07/10 11:40:29 INFO DAGScheduler: Job 4 finished: sum at KMeans.scala:406, took 0.129273 s
19/07/10 11:40:29 INFO MapPartitionsRDD: Removing RDD 9 from persistence list
209 19/07/10 11:40:29 INFO BlockManager: Removing RDD 9
210 19/07/10 11:40:29 INFO SparkContext: Starting job: collect at KMeans.scala:414
211 19/07/10 11:40:29 INFO DAGScheduler: Got job 5 (collect at KMeans.scala:414) with 2 output partitions
212 19/07/10 11:40:29 INFO DAGScheduler: Final stage: ResultStage 5 (collect at KMeans.scala:414)
213 19/07/10 11:40:29 INFO DAGScheduler: Parents of final stage: List()
214 19/07/10 11:40:29 INFO DAGScheduler: Missing parents: List()
215 19/07/10 11:40:29 INFO DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[15] at mapPartitionsWithIndex at KMeans.scala:411), which has no missing parents
216 19/07/10 11:40:29 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 6.4 KB, free 344.7 MB)
217 19/07/10 11:40:29 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 3.4 KB, free 344.7 MB)
218 19/07/10 11:40:29 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 192.168.31.160:44595 (size: 3.4 KB, free: 345.0 MB)
219 19/07/10 11:40:30 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1161
220 19/07/10 11:40:30 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[15] at mapPartitionsWithIndex at KMeans.scala:411) (first 15 tasks are for partitions Vector(0, 1))
221 19/07/10 11:40:30 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
222 19/07/10 11:40:30 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 10, localhost, executor driver, partition 0, PROCESS_LOCAL, 8296 bytes)
223 19/07/10 11:40:30 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 11, localhost, executor driver, partition 1, PROCESS_LOCAL, 8296 bytes)
224 19/07/10 11:40:30 INFO Executor: Running task 0.0 in stage 5.0 (TID 10)
225 19/07/10 11:40:30 INFO Executor: Running task 1.0 in stage 5.0 (TID 11)
226 19/07/10 11:40:30 INFO BlockManager: Found block rdd_1_1 locally
227 19/07/10 11:40:30 INFO BlockManager: Found block rdd_3_1 locally
228 19/07/10 11:40:30 INFO BlockManager: Found block rdd_1_0 locally
229 19/07/10 11:40:30 INFO BlockManager: Found block rdd_3_0 locally
230 19/07/10 11:40:30 INFO BlockManager: Found block rdd_13_0 locally
231 19/07/10 11:40:30 INFO Executor: Finished task 0.0 in stage 5.0 (TID 10). 1132 bytes result sent to driver
232 19/07/10 11:40:30 INFO BlockManager: Found block rdd_13_1 locally
233 19/07/10 11:40:30 INFO Executor: Finished task 1.0 in stage 5.0 (TID 11). 826 bytes result sent to driver
234 19/07/10 11:40:30 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 10) in 56 ms on localhost (executor driver) (1/2)
235 19/07/10 11:40:30 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 11) in 58 ms on localhost (executor driver) (2/2)
236 19/07/10 11:40:30 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 
237 19/07/10 11:40:30 INFO DAGScheduler: ResultStage 5 (collect at KMeans.scala:414) finished in 0.147 s
238 19/07/10 11:40:30 INFO DAGScheduler: Job 5 finished: collect at KMeans.scala:414, took 0.178237 s
239 19/07/10 11:40:30 INFO MapPartitionsRDD: Removing RDD 13 from persistence list
240 19/07/10 11:40:30 INFO BlockManager: Removing RDD 13
241 19/07/10 11:40:30 INFO TorrentBroadcast: Destroying Broadcast(3) (from destroy at KMeans.scala:421)
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 27
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 42
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 51
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 124
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 32
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 72
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 47
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 84
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 36
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 73
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 46
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 25
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 130
19/07/10 11:40:30 INFO TorrentBroadcast: Destroying Broadcast(6) (from destroy at KMeans.scala:421)
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.31.160:44595 in memory (size: 344.0 B, free: 345.0 MB)
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.31.160:44595 in memory (size: 2.9 KB, free: 345.0 MB)
19/07/10 11:40:30 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 568.0 B, free 344.7 MB)
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_6_piece0 on 192.168.31.160:44595 in memory (size: 426.0 B, free: 345.0 MB)
19/07/10 11:40:30 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 529.0 B, free 344.7 MB)
19/07/10 11:40:30 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 192.168.31.160:44595 (size: 529.0 B, free: 345.0 MB)
19/07/10 11:40:30 INFO SparkContext: Created broadcast 9 from broadcast at KMeans.scala:431
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 116
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 94
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 90
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 96
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 89
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 81
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 121
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 149
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 145
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 50
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_7_piece0 on 192.168.31.160:44595 in memory (size: 3.1 KB, free: 345.0 MB)
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 135
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 108
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_8_piece0 on 192.168.31.160:44595 in memory (size: 3.4 KB, free: 345.0 MB)
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 34
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 106
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 70
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 44
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 57
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 105
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 97
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 31
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 33
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 113
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 140
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 100
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 43
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 68
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 133
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 138
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 39
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 129
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 49
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 37
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 85
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 132
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 53
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 82
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 69
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 104
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 52
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 98
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 141
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 76
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 77
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 102
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 134
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 79
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 61
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 59
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 118
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 74
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 54
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 86
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 136
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 110
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 45
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 30
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 60
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 64
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 137
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 95
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 87
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 38
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 29
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 56
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 125
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 131
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 41
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 55
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 128
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 127
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 88
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 91
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 123
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 103
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 115
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 139
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 142
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 147
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_5_piece0 on 192.168.31.160:44595 in memory (size: 3.3 KB, free: 345.0 MB)
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 26
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 48
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 78
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 93
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 40
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 35
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 63
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 75
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 112
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 146
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 58
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 62
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 83
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 101
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 148
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 144
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 111
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 117
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 71
19/07/10 11:40:30 INFO BlockManagerInfo: Removed broadcast_4_piece0 on 192.168.31.160:44595 in memory (size: 3.0 KB, free: 345.0 MB)
19/07/10 11:40:30 INFO SparkContext: Starting job: countByValue at KMeans.scala:434
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 28
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 122
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 143
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 65
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 126
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 92
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 114
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 120
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 80
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 107
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 109
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 99
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 66
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 67
19/07/10 11:40:30 INFO ContextCleaner: Cleaned accumulator 119
19/07/10 11:40:31 INFO DAGScheduler: Registering RDD 18 (countByValue at KMeans.scala:434)
19/07/10 11:40:31 INFO DAGScheduler: Got job 6 (countByValue at KMeans.scala:434) with 2 output partitions
19/07/10 11:40:31 INFO DAGScheduler: Final stage: ResultStage 7 (countByValue at KMeans.scala:434)
19/07/10 11:40:31 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
19/07/10 11:40:31 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 6)
19/07/10 11:40:31 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[18] at countByValue at KMeans.scala:434), which has no missing parents
19/07/10 11:40:31 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 6.7 KB, free 344.8 MB)
19/07/10 11:40:31 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 3.7 KB, free 344.8 MB)
19/07/10 11:40:31 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on 192.168.31.160:44595 (size: 3.7 KB, free: 345.0 MB)
19/07/10 11:40:31 INFO SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:31 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[18] at countByValue at KMeans.scala:434) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:31 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
19/07/10 11:40:31 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 12, localhost, executor driver, partition 0, PROCESS_LOCAL, 8189 bytes)
19/07/10 11:40:31 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 13, localhost, executor driver, partition 1, PROCESS_LOCAL, 8189 bytes)
19/07/10 11:40:31 INFO Executor: Running task 0.0 in stage 6.0 (TID 12)
19/07/10 11:40:31 INFO Executor: Running task 1.0 in stage 6.0 (TID 13)
19/07/10 11:40:31 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:31 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:31 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:31 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:31 INFO Executor: Finished task 1.0 in stage 6.0 (TID 13). 1156 bytes result sent to driver
19/07/10 11:40:31 INFO Executor: Finished task 0.0 in stage 6.0 (TID 12). 1113 bytes result sent to driver
19/07/10 11:40:31 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 13) in 294 ms on localhost (executor driver) (1/2)
19/07/10 11:40:31 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 12) in 300 ms on localhost (executor driver) (2/2)
19/07/10 11:40:31 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
19/07/10 11:40:31 INFO DAGScheduler: ShuffleMapStage 6 (countByValue at KMeans.scala:434) finished in 0.367 s
19/07/10 11:40:31 INFO DAGScheduler: looking for newly runnable stages
19/07/10 11:40:31 INFO DAGScheduler: running: Set()
19/07/10 11:40:31 INFO DAGScheduler: waiting: Set(ResultStage 7)
19/07/10 11:40:31 INFO DAGScheduler: failed: Set()
19/07/10 11:40:32 INFO DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[19] at countByValue at KMeans.scala:434), which has no missing parents
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 3.3 KB, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 2.0 KB, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.31.160:44595 (size: 2.0 KB, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:32 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[19] at countByValue at KMeans.scala:434) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:32 INFO TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
19/07/10 11:40:32 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 14, localhost, executor driver, partition 0, ANY, 7662 bytes)
19/07/10 11:40:32 INFO TaskSetManager: Starting task 1.0 in stage 7.0 (TID 15, localhost, executor driver, partition 1, ANY, 7662 bytes)
19/07/10 11:40:32 INFO Executor: Running task 0.0 in stage 7.0 (TID 14)
19/07/10 11:40:32 INFO Executor: Running task 1.0 in stage 7.0 (TID 15)
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 19 ms
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 15 ms
19/07/10 11:40:32 INFO Executor: Finished task 1.0 in stage 7.0 (TID 15). 1415 bytes result sent to driver
19/07/10 11:40:32 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 15) in 290 ms on localhost (executor driver) (1/2)
19/07/10 11:40:32 INFO Executor: Finished task 0.0 in stage 7.0 (TID 14). 1372 bytes result sent to driver
19/07/10 11:40:32 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 14) in 308 ms on localhost (executor driver) (2/2)
19/07/10 11:40:32 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool
19/07/10 11:40:32 INFO DAGScheduler: ResultStage 7 (countByValue at KMeans.scala:434) finished in 0.347 s
19/07/10 11:40:32 INFO DAGScheduler: Job 6 finished: countByValue at KMeans.scala:434, took 1.822957 s
19/07/10 11:40:32 INFO TorrentBroadcast: Destroying Broadcast(9) (from destroy at KMeans.scala:436)
19/07/10 11:40:32 INFO BlockManagerInfo: Removed broadcast_9_piece0 on 192.168.31.160:44595 in memory (size: 529.0 B, free: 345.0 MB)
19/07/10 11:40:32 INFO LocalKMeans: Local KMeans++ converged in 2 iterations.
19/07/10 11:40:32 INFO KMeans: Initialization with k-means|| took 5.977 seconds.
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 296.0 B, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 334.0 B, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on 192.168.31.160:44595 (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 12 from broadcast at KMeans.scala:299
19/07/10 11:40:32 INFO SparkContext: Starting job: collectAsMap at KMeans.scala:320
19/07/10 11:40:32 INFO DAGScheduler: Registering RDD 20 (mapPartitions at KMeans.scala:302)
19/07/10 11:40:32 INFO DAGScheduler: Got job 7 (collectAsMap at KMeans.scala:320) with 2 output partitions
19/07/10 11:40:32 INFO DAGScheduler: Final stage: ResultStage 9 (collectAsMap at KMeans.scala:320)
19/07/10 11:40:32 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 8)
19/07/10 11:40:32 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 8)
19/07/10 11:40:32 INFO DAGScheduler: Submitting ShuffleMapStage 8 (MapPartitionsRDD[20] at mapPartitions at KMeans.scala:302), which has no missing parents
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 6.3 KB, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 3.5 KB, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on 192.168.31.160:44595 (size: 3.5 KB, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:32 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 8 (MapPartitionsRDD[20] at mapPartitions at KMeans.scala:302) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:32 INFO TaskSchedulerImpl: Adding task set 8.0 with 2 tasks
19/07/10 11:40:32 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 16, localhost, executor driver, partition 0, PROCESS_LOCAL, 8189 bytes)
19/07/10 11:40:32 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 17, localhost, executor driver, partition 1, PROCESS_LOCAL, 8189 bytes)
19/07/10 11:40:32 INFO Executor: Running task 0.0 in stage 8.0 (TID 16)
19/07/10 11:40:32 INFO Executor: Running task 1.0 in stage 8.0 (TID 17)
19/07/10 11:40:32 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:32 INFO BlockManager: Found block rdd_3_0 locally
19/07/10 11:40:32 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:32 INFO BlockManager: Found block rdd_3_1 locally
19/07/10 11:40:32 INFO Executor: Finished task 1.0 in stage 8.0 (TID 17). 1226 bytes result sent to driver
19/07/10 11:40:32 INFO Executor: Finished task 0.0 in stage 8.0 (TID 16). 1226 bytes result sent to driver
19/07/10 11:40:32 INFO TaskSetManager: Finished task 1.0 in stage 8.0 (TID 17) in 73 ms on localhost (executor driver) (1/2)
19/07/10 11:40:32 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 16) in 82 ms on localhost (executor driver) (2/2)
19/07/10 11:40:32 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
19/07/10 11:40:32 INFO DAGScheduler: ShuffleMapStage 8 (mapPartitions at KMeans.scala:302) finished in 0.135 s
19/07/10 11:40:32 INFO DAGScheduler: looking for newly runnable stages
19/07/10 11:40:32 INFO DAGScheduler: running: Set()
19/07/10 11:40:32 INFO DAGScheduler: waiting: Set(ResultStage 9)
19/07/10 11:40:32 INFO DAGScheduler: failed: Set()
19/07/10 11:40:32 INFO DAGScheduler: Submitting ResultStage 9 (ShuffledRDD[21] at reduceByKey at KMeans.scala:317), which has no missing parents
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 2.8 KB, free 344.7 MB)
19/07/10 11:40:32 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 1742.0 B, free 344.7 MB)
19/07/10 11:40:32 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on 192.168.31.160:44595 (size: 1742.0 B, free: 345.0 MB)
19/07/10 11:40:32 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:32 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 9 (ShuffledRDD[21] at reduceByKey at KMeans.scala:317) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:32 INFO TaskSchedulerImpl: Adding task set 9.0 with 2 tasks
19/07/10 11:40:32 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 18, localhost, executor driver, partition 0, ANY, 7662 bytes)
19/07/10 11:40:32 INFO TaskSetManager: Starting task 1.0 in stage 9.0 (TID 19, localhost, executor driver, partition 1, ANY, 7662 bytes)
19/07/10 11:40:32 INFO Executor: Running task 0.0 in stage 9.0 (TID 18)
19/07/10 11:40:32 INFO Executor: Running task 1.0 in stage 9.0 (TID 19)
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
19/07/10 11:40:32 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
19/07/10 11:40:32 INFO Executor: Finished task 0.0 in stage 9.0 (TID 18). 1531 bytes result sent to driver
19/07/10 11:40:32 INFO Executor: Finished task 1.0 in stage 9.0 (TID 19). 1498 bytes result sent to driver
19/07/10 11:40:32 INFO TaskSetManager: Finished task 0.0 in stage 9.0 (TID 18) in 73 ms on localhost (executor driver) (1/2)
19/07/10 11:40:33 INFO TaskSetManager: Finished task 1.0 in stage 9.0 (TID 19) in 79 ms on localhost (executor driver) (2/2)
19/07/10 11:40:33 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
19/07/10 11:40:33 INFO DAGScheduler: ResultStage 9 (collectAsMap at KMeans.scala:320) finished in 0.182 s
19/07/10 11:40:33 INFO DAGScheduler: Job 7 finished: collectAsMap at KMeans.scala:320, took 0.369581 s
19/07/10 11:40:33 INFO TorrentBroadcast: Destroying Broadcast(12) (from destroy at KMeans.scala:330)
19/07/10 11:40:33 INFO KMeans: Iterations took 0.564 seconds.
19/07/10 11:40:33 INFO KMeans: KMeans converged in 1 iterations.
19/07/10 11:40:33 INFO KMeans: The cost is 0.07500000000004324.
19/07/10 11:40:33 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 192.168.31.160:44595 in memory (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:33 INFO MapPartitionsRDD: Removing RDD 3 from persistence list
19/07/10 11:40:33 INFO BlockManager: Removing RDD 3
19/07/10 11:40:33 WARN KMeans: The input data was not directly cached, which may hurt performance if its parent RDDs are also uncached.
[0.1,0.1,0.1]
[9.05,9.05,9.05]
[9.2,9.2,9.2]
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 296.0 B, free 344.7 MB)
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 334.0 B, free 344.7 MB)
19/07/10 11:40:33 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on 192.168.31.160:44595 (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Created broadcast 15 from broadcast at KMeansModel.scala:102
19/07/10 11:40:33 INFO SparkContext: Starting job: sum at KMeansModel.scala:105
19/07/10 11:40:33 INFO DAGScheduler: Got job 8 (sum at KMeansModel.scala:105) with 2 output partitions
19/07/10 11:40:33 INFO DAGScheduler: Final stage: ResultStage 10 (sum at KMeansModel.scala:105)
19/07/10 11:40:33 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:33 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:33 INFO DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[22] at map at KMeansModel.scala:103), which has no missing parents
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 5.3 KB, free 344.7 MB)
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 3.0 KB, free 344.7 MB)
19/07/10 11:40:33 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on 192.168.31.160:44595 (size: 3.0 KB, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:33 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 10 (MapPartitionsRDD[22] at map at KMeansModel.scala:103) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:33 INFO TaskSchedulerImpl: Adding task set 10.0 with 2 tasks
19/07/10 11:40:33 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 20, localhost, executor driver, partition 0, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO TaskSetManager: Starting task 1.0 in stage 10.0 (TID 21, localhost, executor driver, partition 1, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO Executor: Running task 0.0 in stage 10.0 (TID 20)
19/07/10 11:40:33 INFO Executor: Running task 1.0 in stage 10.0 (TID 21)
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_0 locally
19/07/10 11:40:33 INFO Executor: Finished task 0.0 in stage 10.0 (TID 20). 834 bytes result sent to driver
19/07/10 11:40:33 INFO Executor: Finished task 1.0 in stage 10.0 (TID 21). 834 bytes result sent to driver
19/07/10 11:40:33 INFO TaskSetManager: Finished task 0.0 in stage 10.0 (TID 20) in 30 ms on localhost (executor driver) (1/2)
19/07/10 11:40:33 INFO TaskSetManager: Finished task 1.0 in stage 10.0 (TID 21) in 31 ms on localhost (executor driver) (2/2)
19/07/10 11:40:33 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
19/07/10 11:40:33 INFO DAGScheduler: ResultStage 10 (sum at KMeansModel.scala:105) finished in 0.066 s
19/07/10 11:40:33 INFO DAGScheduler: Job 8 finished: sum at KMeansModel.scala:105, took 0.074275 s
19/07/10 11:40:33 INFO TorrentBroadcast: Destroying Broadcast(15) (from destroy at KMeansModel.scala:106)
Error: 0.07500000000004324
19/07/10 11:40:33 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 192.168.31.160:44595 in memory (size: 334.0 B, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Starting job: foreach at Kmeans.scala:74
19/07/10 11:40:33 INFO DAGScheduler: Got job 9 (foreach at Kmeans.scala:74) with 2 output partitions
19/07/10 11:40:33 INFO DAGScheduler: Final stage: ResultStage 11 (foreach at Kmeans.scala:74)
19/07/10 11:40:33 INFO DAGScheduler: Parents of final stage: List()
19/07/10 11:40:33 INFO DAGScheduler: Missing parents: List()
19/07/10 11:40:33 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[23] at map at Kmeans.scala:68), which has no missing parents
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 4.6 KB, free 344.7 MB)
19/07/10 11:40:33 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 2.7 KB, free 344.7 MB)
19/07/10 11:40:33 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 192.168.31.160:44595 (size: 2.7 KB, free: 345.0 MB)
19/07/10 11:40:33 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1161
19/07/10 11:40:33 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 11 (MapPartitionsRDD[23] at map at Kmeans.scala:68) (first 15 tasks are for partitions Vector(0, 1))
19/07/10 11:40:33 INFO TaskSchedulerImpl: Adding task set 11.0 with 2 tasks
19/07/10 11:40:33 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 22, localhost, executor driver, partition 0, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO TaskSetManager: Starting task 1.0 in stage 11.0 (TID 23, localhost, executor driver, partition 1, PROCESS_LOCAL, 7889 bytes)
19/07/10 11:40:33 INFO Executor: Running task 0.0 in stage 11.0 (TID 22)
19/07/10 11:40:33 INFO Executor: Running task 1.0 in stage 11.0 (TID 23)
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_1 locally
19/07/10 11:40:33 INFO BlockManager: Found block rdd_1_0 locally
[0.0,0.0,0.0]==>0
[0.1,0.1,0.1]==>0
[0.2,0.2,0.2]==>0
[9.0,9.0,9.0]==>1
558 19/07/10 11:40:33 INFO Executor: Finished task 0.0 in stage 11.0 (TID 22). 837 bytes result sent to driver
559 [9.1,9.1,9.1]==>1
560 [9.2,9.2,9.2]==>2
561 19/07/10 11:40:33 INFO Executor: Finished task 1.0 in stage 11.0 (TID 23). 794 bytes result sent to driver
562 19/07/10 11:40:33 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 22) in 35 ms on localhost (executor driver) (1/2)
563 19/07/10 11:40:33 INFO TaskSetManager: Finished task 1.0 in stage 11.0 (TID 23) in 37 ms on localhost (executor driver) (2/2)
564 19/07/10 11:40:33 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 
565 19/07/10 11:40:33 INFO DAGScheduler: ResultStage 11 (foreach at Kmeans.scala:74) finished in 0.074 s
566 19/07/10 11:40:33 INFO DAGScheduler: Job 9 finished: foreach at Kmeans.scala:74, took 0.090780 s
567 19/07/10 11:40:33 INFO SparkContext: Invoking stop() from shutdown hook
568 19/07/10 11:40:33 INFO SparkUI: Stopped Spark web UI at http://192.168.31.160:4040
569 19/07/10 11:40:33 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
570 19/07/10 11:40:34 INFO MemoryStore: MemoryStore cleared
571 19/07/10 11:40:34 INFO BlockManager: BlockManager stopped
572 19/07/10 11:40:34 INFO BlockManagerMaster: BlockManagerMaster stopped
573 19/07/10 11:40:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
574 19/07/10 11:40:34 INFO SparkContext: Successfully stopped SparkContext
575 19/07/10 11:40:34 INFO ShutdownHookManager: Shutdown hook called
576 19/07/10 11:40:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-21be76b6-98d0-46ec-a1d6-640cb5556eff
577 
578 Process finished with exit code 0
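As a sanity check on this output (not part of the author's program), the error value 0.07500000000004324 can be reproduced by hand. The log shows it coming from a `sum` inside `KMeansModel.scala`, which suggests it is the within-set sum of squared errors (WSSSE): the squared distance from each point to the center of its cluster, summed over all points. The minimal Scala sketch below assumes that each final cluster center is the mean of the points assigned to it, and takes the points and cluster ids from the `==>` lines of the log. The object and method names (`WssseCheck`, `wssse`, `centroid`, `sqDist`) are hypothetical.

```scala
// Hypothetical helper: recomputes the within-set sum of squared errors
// (WSSSE) from point-to-cluster assignments, assuming each cluster center
// is the mean of its assigned points.
object WssseCheck {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty group of points.
  def centroid(points: Seq[Point]): Point =
    points.transpose.map(_.sum / points.size).toVector

  // For every cluster: sum of squared distances from its points to its centroid;
  // then add the per-cluster totals together.
  def wssse(assigned: Seq[(Point, Int)]): Double =
    assigned.groupBy(_._2).values.map { group =>
      val pts = group.map(_._1)
      val c = centroid(pts)
      pts.map(p => sqDist(p, c)).sum
    }.sum

  def main(args: Array[String]): Unit = {
    // Points and cluster ids copied from the "==>" lines of the log.
    val assigned = Seq(
      Vector(0.0, 0.0, 0.0) -> 0,
      Vector(0.1, 0.1, 0.1) -> 0,
      Vector(0.2, 0.2, 0.2) -> 0,
      Vector(9.0, 9.0, 9.0) -> 1,
      Vector(9.1, 9.1, 9.1) -> 1,
      Vector(9.2, 9.2, 9.2) -> 2
    )
    println(f"WSSSE = ${wssse(assigned)}%.6f")
  }
}
```

Cluster 0 (center 0.1) contributes 0.03 + 0 + 0.03 = 0.06, cluster 1 (center 9.05) contributes 2 × 0.0075 = 0.015, and the single-point cluster 2 contributes 0, so the sketch should print a value of about 0.075. The tiny remainder in Spark's figure is most likely floating-point rounding.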

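     To sum up: when version mismatches like these keep a program from running, and the problem stays unresolved for a long time or the tools give no useful hint, it is very easy for users to give up on the product altogether.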
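Since the complaint above is mostly about failures that come without a useful hint, one cheap mitigation, which the original program does not use and is offered here only as a suggestion, is to print the Scala and Java versions the program actually runs with and compare the Scala binary version against the `_2.11` suffix of the Spark artifacts (`spark.artifactID.suffix` in the pom.xml). `scala.util.Properties.versionNumberString` and `System.getProperty("java.version")` are standard APIs; the `VersionCheck` object and its methods are hypothetical.

```scala
// Hypothetical startup check: compares the Scala binary version on the
// runtime classpath with the Scala suffix of the Spark artifacts.
object VersionCheck {
  // "2.11.12" -> "2.11": keep only the major.minor part.
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when the running Scala matches the artifact suffix (e.g. "2.11").
  def matches(runningScala: String, artifactSuffix: String): Boolean =
    binaryVersion(runningScala) == artifactSuffix

  def main(args: Array[String]): Unit = {
    val scalaVersion = scala.util.Properties.versionNumberString
    val javaVersion = System.getProperty("java.version")
    val expectedSuffix = "2.11" // assumed to mirror spark.artifactID.suffix in pom.xml
    println(s"Scala $scalaVersion, Java $javaVersion")
    if (!matches(scalaVersion, expectedSuffix))
      Console.err.println(
        s"Warning: running Scala ${binaryVersion(scalaVersion)}, but the Spark artifacts are built for _$expectedSuffix")
  }
}
```

Running this logic at the start of the Spark program would make a mismatch, such as a Scala 2.12 SDK selected in IDEA while the Spark dependencies are `_2.11`, visible up front instead of only through a later runtime error.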

posted @ 2019-07-10 12:30  精心出精品