Reading a CSV file with Spark (Scala)
Save the following content as small_zipcode.csv:
id,zipcode,type,city,state,population
1,704,STANDARD,,PR,30100
2,704,,PASEO COSTA DEL SUR,PR,
3,709,,BDA SAN LUIS,PR,3700
4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000
5,76177,STANDARD,,TX,
,,,,,
7,76179,STANDARD,,TX,
Open the interactive spark-shell command line and run:
val filePath="small_zipcode.csv"
val df=spark.read.options(
  Map("inferSchema"->"true","delimiter"->",","header"->"true")).csv(filePath)

scala> df.show
+----+-------+--------+-------------------+-----+----------+
|  id|zipcode|    type|               city|state|population|
+----+-------+--------+-------------------+-----+----------+
|   1|    704|STANDARD|               null|   PR|     30100|
|   2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|   3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|   4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|   5|  76177|STANDARD|               null|   TX|      null|
|null|   null|    null|               null| null|      null|
|   7|  76179|STANDARD|               null|   TX|      null|
+----+-------+--------+-------------------+-----+----------+

scala> df.na.drop("all").show()
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
+---+-------+--------+-------------------+-----+----------+
|  1|    704|STANDARD|               null|   PR|     30100|
|  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|  5|  76177|STANDARD|               null|   TX|      null|
|  7|  76179|STANDARD|               null|   TX|      null|
+---+-------+--------+-------------------+-----+----------+

scala> df.na.drop().show()
+---+-------+------+-----------------+-----+----------+
| id|zipcode|  type|             city|state|population|
+---+-------+------+-----------------+-----+----------+
|  4|  76166|UNIQUE|CINGULAR WIRELESS|   TX|     84000|
+---+-------+------+-----------------+-----+----------+
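The two calls above differ only in the "how" argument: drop("all") removes a row only when every column is null, while drop() behaves like drop("any") and removes a row as soon as one column is null. The sketch below repeats this on the same seven rows and also restricts the check to a subset of columns. It is only an illustration: the rows are built in memory rather than read from the file, and the local-mode session settings and the app name "na-drop-sketch" are assumptions, not part of the original session.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local-mode session. Inside spark-shell, getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("na-drop-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Same rows as small_zipcode.csv; None stands for an empty CSV field.
val df = Seq[(Option[Int], Option[Int], Option[String], Option[String], Option[String], Option[Int])](
  (Some(1), Some(704),   Some("STANDARD"), None,                        Some("PR"), Some(30100)),
  (Some(2), Some(704),   None,             Some("PASEO COSTA DEL SUR"), Some("PR"), None),
  (Some(3), Some(709),   None,             Some("BDA SAN LUIS"),        Some("PR"), Some(3700)),
  (Some(4), Some(76166), Some("UNIQUE"),   Some("CINGULAR WIRELESS"),   Some("TX"), Some(84000)),
  (Some(5), Some(76177), Some("STANDARD"), None,                        Some("TX"), None),
  (None,    None,        None,             None,                        None,       None),
  (Some(7), Some(76179), Some("STANDARD"), None,                        Some("TX"), None)
).toDF("id", "zipcode", "type", "city", "state", "population")

// Drop rows in which every column is null (only the empty row goes away).
val allDropped = df.na.drop("all")

// Drop rows in which any column is null; equivalent to df.na.drop().
val anyDropped = df.na.drop("any")

// Only look at "population": keep rows where it is non-null.
val hasPopulation = df.na.drop(Seq("population"))

// Drop rows where both "city" and "population" are null.
val cityOrPopulation = df.na.drop("all", Seq("city", "population"))

println(allDropped.count())       // 6
println(anyDropped.count())       // 1
println(hasPopulation.count())    // 3
println(cityOrPopulation.count()) // 4
```

Passing a column list is usually the more practical form, since a single optional column such as city would otherwise cause almost every row to be dropped under "any".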
Reference: many more Spark usage examples: https://sparkbyexamples.com/spark/spark-dataframe-drop-rows-with-null-values/