Fixing "NoSuchMethodError: java.lang.String com.google.common.net.HostAndPort.getHostText()" when submitting a nebula-algorithm Spark job
Up front: the root cause was that I had downloaded the Spark "without hadoop" build — use the Spark "with hadoop" build instead!!!
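A hedged sketch of what that means in practice (the mirror URL and exact package name below are examples I would use, not from the original setup): grab a -bin-hadoopX.Y package rather than -bin-without-hadoop, and you should find Spark's own Guava sitting in its jars/ directory.

# Example only: fetch a Spark build that bundles Hadoop, then check the bundled Guava
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
tar -xzf spark-2.4.8-bin-hadoop2.7.tgz
ls spark-2.4.8-bin-hadoop2.7/jars | grep guava   # Spark 2.4.x typically ships guava-14.0.1.jar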
----------------------------------------------
Steps followed: https://docs.nebula-graph.com.cn/3.1.0/nebula-algorithm/#_2
Error:
2022-05-31 17:40:16,459 WARN [main] util.Utils (Logging.scala:logWarning(66)) - Your hostname, bonelee-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2022-05-31 17:40:16,461 WARN [main] util.Utils (Logging.scala:logWarning(66)) - Set SPARK_LOCAL_IP if you need to bind to another address
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:323)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:323)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:323)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:784)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Submission command:
spark-submit --master "local" \
  --conf spark.app.name="g1" \
  --conf spark.executor.extraLibraryPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
  --conf spark.executor.extraClassPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
  --driver-class-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
  --driver-library-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
  --class com.vesoft.nebula.algorithm.Main nebula-algorithm/target/nebula-algorithm-3.0.0.jar -p /home/bonelee/Desktop/nebula-algorithm/application.conf
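This submission forces guava-14.0.jar onto the driver and executor classpaths, which is exactly what triggers the Preconditions error: Hadoop 3.x's Configuration calls checkArgument(boolean, String, Object), an overload that old Guava 14 does not have, and with the "without hadoop" Spark build, Hadoop itself (and whatever Guava it needs) comes from the local Hadoop 3.2.3 install. A rough way to see the competing Guava jars (paths assume the default tarball layouts; adjust to your install):

# Guava shipped with the local Hadoop 3.2.x install (typically a 27.x jar)
ls $HADOOP_HOME/share/hadoop/common/lib | grep guava
# Guava (if any) bundled with the Spark build itself
ls $SPARK_HOME/jars | grep guava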
Versions
Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_333)
Spark: 2.4.8-bin-without-hadoop
Hadoop: 3.2.3
Nebula: 3.0.0
nebula-algorithm: 3.0.0
application.conf used:
{
  # Spark relation config
  spark: {
    app: {
      name: LPA
      # spark.app.partitionNum
      partitionNum: 100
    }
    master: local
  }

  data: {
    # data source. optional of nebula,csv,json
    source: nebula
    # data sink, means the algorithm result will be write into this sink. optional of nebula,csv,text
    sink: csv
    # if your algorithm needs weight
    hasWeight: false
  }

  # Nebula Graph relation config
  nebula: {
    # algo's data source from Nebula. If data.source is nebula, then this nebula.read config can be valid.
    read: {
      # Nebula metad server address, multiple addresses are split by English comma
      metaAddress: "127.0.0.1:9559"
      # Nebula space
      space: basketballplayer
      # Nebula edge types, multiple labels means that data from multiple edges will union together
      labels: ["serve"]
      # Nebula edge property name for each edge type, this property will be as weight col for algorithm.
      # Make sure the weightCols are corresponding to labels.
      weightCols: ["start_year"]
    }

    # algo result sink into Nebula. If data.sink is nebula, then this nebula.write config can be valid.
    write: {
      # Nebula graphd server address, multiple addresses are split by English comma
      graphAddress: "127.0.0.1:9669"
      # Nebula metad server address, multiple addresses are split by English comma
      metaAddress: "127.0.0.1:9559"
      user: root
      pswd: nebula
      # Nebula space name
      space: nb
      # Nebula tag name, the algorithm result will be write into this tag
      tag: pagerank
      # algorithm result is insert into new tag or update to original tag. type: insert/update
      type: insert
    }
  }

  local: {
    # algo's data source from Nebula. If data.source is csv or json, then this local.read can be valid.
    read: {
      filePath: "file:///tmp/algo_edge.csv"
      # srcId column
      srcId: "src"
      # dstId column
      dstId: "dst"
      # weight column
      weight: "weight"
      # if csv file has header
      header: true
      # csv file's delimiter
      delimiter: ","
    }

    # algo result sink into local file. If data.sink is csv or text, then this local.write can be valid.
    write: {
      resultPath: /tmp/count
    }
  }

  algorithm: {
    # the algorithm that you are going to execute, pick one from [pagerank, louvain, connectedcomponent,
    # labelpropagation, shortestpaths, degreestatic, kcore, stronglyconnectedcomponent, trianglecount,
    # betweenness, graphtriangleCount, clusteringcoefficient, bfs, hanp, closeness, jaccard, node2vec]
    executeAlgo: graphtrianglecount

    # PageRank parameter
    pagerank: {
      maxIter: 10
      resetProb: 0.15  # default 0.15
    }

    # Louvain parameter
    louvain: {
      maxIter: 20
      internalIter: 10
      tol: 0.5
    }

    # connected component parameter.
    connectedcomponent: {
      maxIter: 20
    }

    # LabelPropagation parameter
    labelpropagation: {
      maxIter: 20
    }

    # ShortestPaths parameter
    shortestpaths: {
      # several vertices to compute the shortest path to all vertices.
      landmarks: "1"
    }

    # Vertex degree statistics parameter
    degreestatic: {}

    # KCore parameter
    kcore: {
      maxIter: 10
      degree: 1
    }

    # Trianglecount parameter
    trianglecount: {}

    # graphTriangleCount parameter
    graphtrianglecount: {}

    # Betweenness centrality parameter. maxIter parameter means the max times of iterations.
    betweenness: {
      maxIter: 5
    }

    # Clustering Coefficient parameter. The type parameter has two choice, local or global
    # local type will compute the clustering coefficient for each vertex, and print the average coefficient for graph.
    # global type just compute the graph's clustering coefficient.
    clusteringcoefficient: {
      type: local
    }

    # ClosenessAlgo parameter
    closeness: {}

    # BFS parameter
    bfs: {
      maxIter: 5
      root: "10"
    }

    # HanpAlgo parameter
    hanp: {
      hopAttenuation: 0.1
      maxIter: 10
      preference: 1.0
    }

    # Node2vecAlgo parameter
    node2vec: {
      maxIter: 10,
      lr: 0.025,
      dataNumPartition: 10,
      modelNumPartition: 10,
      dim: 10,
      window: 3,
      walkLength: 5,
      numWalks: 3,
      p: 1.0,
      q: 1.0,
      directed: false,
      degree: 30,
      embSeparate: ",",
      modelPath: "hdfs://127.0.0.1:9000/model"
    }

    # JaccardAlgo parameter
    jaccard: {
      tol: 1.0
    }
  }
}
If the data source above is changed from nebula to csv, the job runs without error!!!
------------------------------------------------------------------------------------------------------------------------------
Error when submitting the Spark job with nebula-algorithm while reading from the nebula data source!!!
Exception in thread "main" java.lang.NoSuchMethodError: 'java.lang.String com.google.common.net.HostAndPort.getHostText()'
    at com.vesoft.nebula.connector.NebulaOptions$$anonfun$getMetaAddress$1.apply(NebulaOptions.scala:186)
    at com.vesoft.nebula.connector.NebulaOptions$$anonfun$getMetaAddress$1.apply(NebulaOptions.scala:183)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at com.vesoft.nebula.connector.NebulaOptions.getMetaAddress(NebulaOptions.scala:183)
    at com.vesoft.nebula.connector.reader.NebulaSourceReader.getSchema(NebulaSourceReader.scala:45)
    at com.vesoft.nebula.connector.reader.NebulaSourceReader.readSchema(NebulaSourceReader.scala:30)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:175)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
    at com.vesoft.nebula.connector.connector.package$NebulaDataFrameReader.loadEdgesToDF(package.scala:172)
    at com.vesoft.nebula.algorithm.reader.NebulaReader$$anonfun$read$1.apply$mcVI$sp(DataReader.scala:52)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at com.vesoft.nebula.algorithm.reader.NebulaReader.read(DataReader.scala:38)
    at com.vesoft.nebula.algorithm.Main$.createDataSource(Main.scala:118)
    at com.vesoft.nebula.algorithm.Main$.main(Main.scala:84)
    at com.vesoft.nebula.algorithm.Main.main(Main.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
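The background, as far as I understand it: HostAndPort.getHostText() exists in older Guava releases (up to roughly Guava 21) and was removed in later ones, and the Nebula Spark connector was built against that old API, whereas Hadoop 3.x wants a newer Guava. A quick, hedged way to check which API a given Guava jar actually exposes (the jar path is just whatever copy you have locally):

# Guava 14 lists getHostText(); a Guava 27 jar would not
javap -cp /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar com.google.common.net.HostAndPort | grep getHostText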
Submission command:
/home/bonelee/spark-2.4.8-bin-without-hadoop/bin/spark-submit --master "local" \
  --class com.vesoft.nebula.algorithm.Main nebula-algorithm/target/nebula-algorithm-3.0.0.jar -p /home/bonelee/Desktop/nebula-algorithm/application.conf
#--driver-class-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
#--driver-library-path /home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
#--conf spark.executor.extraLibraryPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
#--conf spark.executor.extraClassPath=/home/bonelee/Desktop/nebula-algorithm/guava-14.0.jar \
Even the workaround described in https://www.cnblogs.com/1394htw/p/15151913.html — forcing guava-14.0.jar onto the classpath — still fails. Guava 14 does provide HostAndPort.getHostText(), but it lacks the Preconditions.checkArgument(boolean, String, Object) overload that Hadoop 3.x calls, so with a local Hadoop 3.2.3 on the classpath, swapping Guava versions just trades one NoSuchMethodError for the other.
Frustrating — I burned a whole afternoon on this without getting it to work...
The way to sidestep the problem is to change the data source in application.conf:
data: {
  # data source. optional of nebula,csv,json
  source: nebula
  # data sink, means the algorithm result will be write into this sink. optional of nebula,csv,text
  sink: csv
  # if your algorithm needs weight
  hasWeight: false
}
Change the nebula above to csv, and also configure the local block:
local: {
  # algo's data source from Nebula. If data.source is csv or json, then this local.read can be valid.
  read: {
    filePath: "file:///tmp/algo_edge.csv"
    srcId: "src"
    # dstId column
    dstId: "dst"
    # weight column
    weight: "weight"
    # if csv file has header
    header: true
    # csv file's delimiter
    delimiter: ","
  }

  # algo result sink into local file. If data.sink is csv or text, then this local.write can be valid.
  write: {
    resultPath: /tmp/count
  }
}
Contents of algo_edge.csv:
src,dst,weight
1,1,5.0
1,2,1.0
1,3,5.0
1,4,1.0
2,1,5.0
2,2,1.0
2,3,5.0
2,4,1.0
3,1,1.0
3,2,5.0
3,3,1.0
3,4,5.0
4,1,1.0
4,2,5.0
4,3,1.0
4,4,5.0
With this setup the job produces output normally — and the count of 4 is what you would expect, since the four vertices are fully connected, giving C(4,3) = 4 triangles:
bonelee@bonelee-VirtualBox:~/Desktop/nebula-algorithm-2.6$ cat /tmp/count/part-00000-7ae75bc1-80e7-4469-9dda-268a8036db09-c000.csv
count
4
----------------------------------------------------
Update 2022-06-13: the real fix — and it ended up working on Windows with Spark 2.4.7, of all places...
Key steps:
1. Modify the config files on the Linux side
In etc/, change 127.0.0.1 in nebula-graphd.conf, nebula-storaged.conf and nebula-metad.conf to an externally reachable address; mine is 8.35.24.181:
########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=8.35.24.181:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=8.35.24.181
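The storage host also has to be registered with the meta service before it shows up as ONLINE. A rough sketch of the statement (Nebula 3.x nGQL; 9779 is assumed to be the default storaged port here):

ADD HOSTS "8.35.24.181":9779;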
After adding the host, SHOW HOSTS should report the new address as ONLINE (the stale 127.0.0.1 entry is left behind as OFFLINE):
(root@nebula) [algo_edge]> show hosts
+---------------+------+-----------+-----------+--------------+----------------------+-----------------------------------+---------+
| Host          | Port | HTTP port | Status    | Leader count | Leader distribution  | Partition distribution            | Version |
+---------------+------+-----------+-----------+--------------+----------------------+-----------------------------------+---------+
| "127.0.0.1"   | 9779 | 19669     | "OFFLINE" | 0            | "No valid partition" | "basketballplayer:10, cadets:100" | "3.1.0" |
| "8.35.24.181" | 9779 | 19669     | "ONLINE"  | 10           | "algo_edge:10"       | "algo_edge:10"                    | "3.1.0" |
+---------------+------+-----------+-----------+--------------+----------------------+-----------------------------------+---------+
Got 2 rows (time spent 3122/3893 us)

Mon, 13 Jun 2022 18:41:25 CST
2. Insert data into the graph database:
drop space algo_edge;
create space algo_edge(partition_num=10, replica_factor=1, vid_type=INT64);
:sleep 20
use algo_edge;
create tag player(id int);
create edge serve(serve_year int);
:sleep 20
insert vertex player(id) values 1:(1);
insert vertex player(id) values 2:(2);
insert vertex player(id) values 3:(3);
insert vertex player(id) values 4:(4);
insert edge serve(serve_year) values 1->1:(5);
insert edge serve(serve_year) values 1->2:(1);
insert edge serve(serve_year) values 1->3:(5);
insert edge serve(serve_year) values 1->4:(1);
insert edge serve(serve_year) values 2->1:(5);
insert edge serve(serve_year) values 2->2:(1);
insert edge serve(serve_year) values 2->3:(5);
insert edge serve(serve_year) values 2->4:(1);
insert edge serve(serve_year) values 3->1:(1);
insert edge serve(serve_year) values 3->2:(5);
insert edge serve(serve_year) values 3->3:(1);
insert edge serve(serve_year) values 3->4:(5);
insert edge serve(serve_year) values 4->1:(1);
insert edge serve(serve_year) values 4->2:(5);
insert edge serve(serve_year) values 4->3:(1);
insert edge serve(serve_year) values 4->4:(5);
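As an optional sanity check before kicking off Spark (my own habit, not part of the original steps; the syntax below assumes nGQL 3.x, where GO requires a YIELD clause), you can confirm the edges landed:

use algo_edge;
go from 1 over serve yield serve.serve_year;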
3. Modify app.conf, especially the Nebula read/connection section (since the sink is still csv, the nebula.write block below is not actually used):
{
  # Spark relation config
  spark: {
    app: {
      name: LPA
      # spark.app.partitionNum
      partitionNum: 100
    }
    master: local
  }

  data: {
    # data source. optional of nebula,csv,json
    source: nebula
    # data sink, means the algorithm result will be write into this sink. optional of nebula,csv,text
    sink: csv
    # if your algorithm needs weight
    hasWeight: false
  }

  # Nebula Graph relation config
  nebula: {
    # algo's data source from Nebula. If data.source is nebula, then this nebula.read config can be valid.
    read: {
      # Nebula metad server address, multiple addresses are split by English comma
      metaAddress: "8.35.24.181:9559"
      # Nebula space
      space: algo_edge
      # Nebula edge types, multiple labels means that data from multiple edges will union together
      labels: ["serve"]
      # Nebula edge property name for each edge type, this property will be as weight col for algorithm.
      # Make sure the weightCols are corresponding to labels.
      weightCols: ["serve_year"]
    }

    # algo result sink into Nebula. If data.sink is nebula, then this nebula.write config can be valid.
    write: {
      # Nebula graphd server address, multiple addresses are split by English comma
      graphAddress: "127.0.0.1:9669"
      # Nebula metad server address, multiple addresses are split by English comma
      metaAddress: "127.0.0.1:9559"
      user: root
      pswd: nebula
      # Nebula space name
      space: nb
      # Nebula tag name, the algorithm result will be write into this tag
      tag: pagerank
      # algorithm result is insert into new tag or update to original tag. type: insert/update
      type: insert
    }
  }

  local: {
    # algo's data source from Nebula. If data.source is csv or json, then this local.read can be valid.
    read: {
      filePath: "file:///D:/tmp/algo_edge.csv"
      srcId: "src"
      # dstId column
      dstId: "dst"
      # weight column
      weight: "weight"
      # if csv file has header
      header: true
      # csv file's delimiter
      delimiter: ","
    }

    # algo result sink into local file. If data.sink is csv or text, then this local.write can be valid.
    write: {
      resultPath: "file:///D:/tmp/result0613002"
    }
  }

  algorithm: {
    # the algorithm that you are going to execute, pick one from [pagerank, louvain, connectedcomponent,
    # labelpropagation, shortestpaths, degreestatic, kcore, stronglyconnectedcomponent, trianglecount,
    # betweenness, graphtriangleCount, clusteringcoefficient, bfs, hanp, closeness, jaccard, node2vec]
    executeAlgo: graphtrianglecount

    # PageRank parameter
    pagerank: {
      maxIter: 10
      resetProb: 0.15  # default 0.15
    }

    # Louvain parameter
    louvain: {
      maxIter: 20
      internalIter: 10
      tol: 0.5
    }

    # connected component parameter.
    connectedcomponent: {
      maxIter: 20
    }

    # LabelPropagation parameter
    labelpropagation: {
      maxIter: 20
    }

    # ShortestPaths parameter
    shortestpaths: {
      # several vertices to compute the shortest path to all vertices.
      landmarks: "1"
    }

    # Vertex degree statistics parameter
    degreestatic: {}

    # KCore parameter
    kcore: {
      maxIter: 10
      degree: 1
    }

    # Trianglecount parameter
    trianglecount: {}

    # graphTriangleCount parameter
    graphtrianglecount: {}

    # Betweenness centrality parameter. maxIter parameter means the max times of iterations.
    betweenness: {
      maxIter: 5
    }

    # Clustering Coefficient parameter. The type parameter has two choice, local or global
    # local type will compute the clustering coefficient for each vertex, and print the average coefficient for graph.
    # global type just compute the graph's clustering coefficient.
    clusteringcoefficient: {
      type: local
    }

    # ClosenessAlgo parameter
    closeness: {}

    # BFS parameter
    bfs: {
      maxIter: 5
      root: "10"
    }

    # HanpAlgo parameter
    hanp: {
      hopAttenuation: 0.1
      maxIter: 10
      preference: 1.0
    }

    # Node2vecAlgo parameter
    node2vec: {
      maxIter: 10,
      lr: 0.025,
      dataNumPartition: 10,
      modelNumPartition: 10,
      dim: 10,
      window: 3,
      walkLength: 5,
      numWalks: 3,
      p: 1.0,
      q: 1.0,
      directed: false,
      degree: 30,
      embSeparate: ",",
      modelPath: "hdfs://127.0.0.1:9000/model"
    }

    # JaccardAlgo parameter
    jaccard: {
      tol: 1.0
    }
  }
}
4. Then run Spark; it connects to the Nebula data source and runs successfully!!!
spark-submit --master "local" --class com.vesoft.nebula.algorithm.Main nebula-algorithm-3.0.0.jar -p c.conf
Note: the Spark I installed here is the build that bundles Hadoop!
D:\app\spark-2.4.7-bin-hadoop2.7\bin\spark-submit
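That bundled-Hadoop build is the whole fix: spark-2.4.7-bin-hadoop2.7 carries its own Hadoop 2.7 client plus the old Guava that both Hadoop 2.7 and the Nebula connector expect, so the conflicting Guava from Hadoop 3.2.3 never enters the picture. A quick hedged check (the exact jar name is an assumption; Spark 2.4.x typically bundles guava-14.0.1.jar):

dir D:\app\spark-2.4.7-bin-hadoop2.7\jars | findstr guava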