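No.10 Identifying and Handling Missing Values
Main topics:
- What a missing value is
- Identifying missing values
- Exploring missing-data patterns
- Handling missing values
1. What is a missing value
1.1 Viewing R's built-in datasets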
    data()
    mydata <- mtcars
    # Create a data frame that contains missing values
    # Set rows 1-5 of column 1 of mydata to NA
    mydata[1:5, 1] <- NA
    mydata

Result:

    > data()
    > mydata <- mtcars
    > mydata[1:5, 1] <- NA
    > mydata
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4             NA   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag         NA   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710            NA   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive        NA   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout     NA   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Manually editing the data frame

Four more cells are then set to NA by hand: disp for Cadillac Fleetwood, wt for Toyota Corolla and AMC Javelin, and hp for Camaro Z28. The data frame now looks like this:

                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4             NA   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag         NA   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710            NA   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive        NA   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout     NA   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8    NA 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22    NA 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15    NA 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0  NA 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
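The manual edits are hard to repeat exactly, so it can help to make them in code instead. The following is a minimal sketch, not the author's method: it assumes the four extra cells are the ones marked NA in the table above and addresses them by row name and column name.

```r
# Rebuild the example data frame entirely in code (a reproducible
# alternative to editing cells by hand).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA                                  # rows 1-5 of column 1
mydata["Cadillac Fleetwood", "disp"] <- NA                # row 15, column 3
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA    # rows 20 and 23, column 6
mydata["Camaro Z28", "hp"] <- NA                          # row 24, column 4

# Total number of missing cells in the data frame
n_missing <- sum(is.na(mydata))
print(n_missing)   # 9
```

Running these lines at the start of a session gives the same data frame that the later sections work on.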
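1.2 Identifying missing values
1) is.na(): works on objects, including vectors and data frames

    # isnadata is not defined in the original post; it is assumed here to be
    # the logical matrix returned by is.na(mydata), which matches the output below
    isnadata <- is.na(mydata)
    # Proportion of missing values in the first column
    sum(isnadata[, 1]) / nrow(isnadata)
    # The same proportion, computed with mean()
    mean(isnadata[, 1])
    # Proportion of missing values in the first row
    mean(isnadata[1, ])
    # Proportion of missing values in each column
    # In apply(array, margin, ...), margin = 1 means rows and 2 means columns
    apply(isnadata, 2, mean)

Result:

    > # Proportion of missing values in the first column
    > sum(isnadata[, 1]) / nrow(isnadata)
    [1] 0.15625
    > # The same proportion, computed with mean()
    > mean(isnadata[, 1])
    [1] 0.15625
    > # Proportion of missing values in the first row
    > mean(isnadata[1, ])
    [1] 0.09090909
    >
    > # Proportion of missing values in each column
    > # In apply(array, margin, ...), margin = 1 means rows and 2 means columns
    > apply(isnadata, 2, mean)
        mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear    carb
    0.15625 0.00000 0.03125 0.03125 0.00000 0.06250 0.00000 0.00000 0.00000 0.00000 0.00000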
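The per-column and per-row summaries can also be computed with colMeans(), colSums() and rowSums(), which are base R functions that work directly on the logical matrix returned by is.na(). This is a small sketch under the assumption that mydata is the example data frame with nine missing cells; the names na_flags, col_share, col_count and row_count are only illustrative.

```r
# Rebuild the example data frame (9 missing cells in total).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA

na_flags <- is.na(mydata)          # logical matrix: TRUE where a value is missing

col_share <- colMeans(na_flags)    # proportion missing per column
col_count <- colSums(na_flags)     # number missing per column
row_count <- rowSums(na_flags)     # number missing per row

print(col_share)
print(col_count[col_count > 0])    # only the columns that have missing values
print(row_count[row_count > 0])    # only the rows that have missing values
```

colMeans(na_flags) returns the same numbers as apply(na_flags, 2, mean) and states the intent more directly.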
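2) which(): locates the missing values. It takes a logical vector; for a logical matrix, arr.ind = TRUE returns row and column positions.

    # Find the positions of the missing values
    which(isnadata, arr.ind = TRUE)

Result:

    > # Find the positions of the missing values
    > which(isnadata, arr.ind = TRUE)
                       row col
    Mazda RX4            1   1
    Mazda RX4 Wag        2   1
    Datsun 710           3   1
    Hornet 4 Drive       4   1
    Hornet Sportabout    5   1
    Cadillac Fleetwood  15   3
    Camaro Z28          24   4
    Toyota Corolla      20   6
    AMC Javelin         23   6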
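Without arr.ind = TRUE, which() returns linear indices that count down each column in turn. The sketch below shows how the two forms relate, using the base R function arrayInd(); the variable names are illustrative, and mydata is assumed to be the example data frame with nine missing cells.

```r
# Rebuild the example data frame (9 missing cells in total).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA

flags <- is.na(mydata)               # logical matrix, 32 rows x 11 columns

lin <- which(flags)                  # linear indices, counted down each column
pos <- arrayInd(lin, dim(flags))     # convert back to (row, column) pairs

# For a cell (i, j) the linear index is (j - 1) * nrow + i,
# e.g. row 15, column 3 gives (3 - 1) * 32 + 15 = 79.
lin_check <- (pos[, 2] - 1) * nrow(flags) + pos[, 1]

print(lin)
print(pos)
```

The row and column form is usually easier to read, while the linear form is handy for indexing a matrix directly.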
3) complete.cases()

    # Check whether each row is complete (TRUE means no missing values)
    complete.cases(mydata)
    length(complete.cases(mydata))
    # Show the rows that contain missing values
    mydata[!complete.cases(mydata), ]

Result:

    > # Check whether each row is complete (TRUE means no missing values)
    > complete.cases(mydata)
     [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
    [17]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    > length(complete.cases(mydata))
    [1] 32
    > # Show the rows that contain missing values
    > mydata[!complete.cases(mydata), ]
                        mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4            NA   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag        NA   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710           NA   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive       NA   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout    NA   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Cadillac Fleetwood 10.4   8    NA 205 2.93 5.250 17.98  0  0    3    4
    Toyota Corolla     33.9   4  71.1  65 4.22    NA 19.90  1  1    4    1
    AMC Javelin        15.2   8 304.0 150 3.15    NA 17.30  0  0    3    2
    Camaro Z28         13.3   8 350.0  NA 3.73 3.840 15.41  0  0    3    4
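Since complete.cases() returns one logical value per row, counting complete and incomplete rows takes one more step. A brief sketch, assuming mydata is the example data frame with nine missing cells (the names ok, n_complete and n_incomplete are illustrative):

```r
# Rebuild the example data frame (9 missing cells in total).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA

ok <- complete.cases(mydata)   # TRUE for rows without any NA

n_complete   <- sum(ok)        # number of complete rows
n_incomplete <- sum(!ok)       # number of rows with at least one NA
has_any_na   <- anyNA(mydata)  # quick yes/no check for the whole data frame

print(c(complete = n_complete, incomplete = n_incomplete))
print(has_any_na)
```

Here each incomplete row happens to contain exactly one NA, so the number of incomplete rows equals the number of missing cells; in general the two counts differ.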
4) The mice package
Install:

    install.packages("mice")

    library(mice)
    md.pattern(mydata)
    mydata

Result:

    > md.pattern(mydata)
       cyl drat qsec vs am gear carb disp hp wt mpg
    23   1    1    1  1  1    1    1    1  1  1   1 0
    5    1    1    1  1  1    1    1    1  1  1   0 1
    2    1    1    1  1  1    1    1    1  1  0   1 1
    1    1    1    1  1  1    1    1    1  0  1   1 1
    1    1    1    1  1  1    1    1    0  1  1   1 1
         0    0    0  0  0    0    0    1  1  2   5 9
    > mydata
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4             NA   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag         NA   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710            NA   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive        NA   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout     NA   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8    NA 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22    NA 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15    NA 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0  NA 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
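In the md.pattern() output, each row is one missingness pattern, where 1 means observed and 0 means missing. The left margin is the number of rows with that pattern, the right margin is the number of missing variables in the pattern, and the bottom row counts the missing values per variable. The same row counts can be cross-checked in base R. The sketch below does not use mice; it is only an approximation of the idea, with illustrative names, and assumes mydata is the example data frame with nine missing cells.

```r
# Rebuild the example data frame (9 missing cells in total).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA

flags <- is.na(mydata)

# Label each row by the names of its missing variables, joined with "+".
pattern <- apply(flags, 1, function(r) paste(colnames(flags)[r], collapse = "+"))
pattern[pattern == ""] <- "(complete)"     # rows with nothing missing

pattern_counts <- table(pattern)           # how many rows share each pattern
print(pattern_counts)
```

The counts (23 complete rows, 5 rows missing mpg, 2 missing wt, 1 missing hp, 1 missing disp) agree with the left margin of the md.pattern() table.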
2. Handling missing values
2.1 Dropping variables with many missing values

    # Drop the first column
    mydata_1 <- mydata[, -1]  # empty before the comma selects all rows; the minus sign drops the column
    mydata_1

Result:

    > # Drop the first column
    > mydata_1 <- mydata[, -1]  # empty before the comma selects all rows; the minus sign drops the column
    > mydata_1
                        cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4             6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag         6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710            4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive        6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout     8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant               6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360            8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D             4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230              4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280              6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C             6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE            8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL            8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC           8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood    8    NA 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial     8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128              4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic           4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla        4  71.1  65 4.22    NA 19.90  1  1    4    1
    Toyota Corona         4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger      8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin           8 304.0 150 3.15    NA 17.30  0  0    3    2
    Camaro Z28            8 350.0  NA 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird      8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9             4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2         4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa          4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L        8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino          6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora         8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E            4 121.0 109 4.11 2.780 18.60  1  1    4    2
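Dropping a column by position works when the column is already known. When there are many variables, one option is to drop every column whose share of missing values exceeds a cutoff. The sketch below assumes a cutoff of 10%, a hypothetical value chosen only so that mpg (15.6% missing) is dropped and wt (6.25% missing) is kept; mydata is assumed to be the example data frame with nine missing cells.

```r
# Rebuild the example data frame (9 missing cells in total).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA

na_share  <- colMeans(is.na(mydata))   # proportion missing per column
threshold <- 0.10                      # hypothetical cutoff, adjust to the data

# Keep only the columns at or below the cutoff; drop = FALSE keeps the
# result a data frame even if a single column remains.
mydata_1 <- mydata[, na_share <= threshold, drop = FALSE]

dropped <- names(na_share)[na_share > threshold]
print(dropped)            # "mpg"
print(names(mydata_1))
```

With this cutoff the result matches mydata[, -1], but the rule keeps working if the columns are reordered.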
    # Drop the rows that contain missing values
    mydata_2 <- na.omit(mydata_1)
    mydata_2

Result:

    > # Drop the rows that contain missing values
    > mydata_2 <- na.omit(mydata_1)
    > mydata_2
                        cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4             6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag         6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710            4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive        6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout     8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant               6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360            8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D             4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230              4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280              6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C             6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE            8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL            8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC           8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Lincoln Continental   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial     8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128              4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic           4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corona         4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger      8 318.0 150 2.76 3.520 16.87  0  0    3    2
    Pontiac Firebird      8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9             4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2         4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa          4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L        8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino          6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora         8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E            4 121.0 109 4.11 2.780 18.60  1  1    4    2
Alternatively, use complete.cases():

    complete.cases(mydata_1)
    mydata_2_1 <- mydata_1[complete.cases(mydata_1), ]
    mydata_2_1

Result:

    > complete.cases(mydata_1)
     [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
    [17]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    > mydata_2_1 <- mydata_1[complete.cases(mydata_1), ]
    > mydata_2_1
                        cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4             6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag         6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710            4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive        6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout     8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant               6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360            8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D             4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230              4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280              6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C             6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE            8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL            8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC           8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Lincoln Continental   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial     8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128              4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic           4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corona         4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger      8 318.0 150 2.76 3.520 16.87  0  0    3    2
    Pontiac Firebird      8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9             4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2         4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa          4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L        8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino          6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora         8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E            4 121.0 109 4.11 2.780 18.60  1  1    4    2
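The two approaches keep the same rows. One difference is that na.omit() also records which rows it removed in an "na.action" attribute on the result. The sketch below checks both points; the names by_omit, by_cases and dropped_rows are illustrative, and mydata_1 is assumed to be the example data frame without its first column.

```r
# Rebuild the example data frame and drop the first column.
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA
mydata_1 <- mydata[, -1]

by_omit  <- na.omit(mydata_1)
by_cases <- mydata_1[complete.cases(mydata_1), ]

# Same rows and same values in both results
same_rows   <- identical(rownames(by_omit), rownames(by_cases))
same_values <- all(by_omit == by_cases)

# na.omit() remembers the row numbers it dropped
dropped_rows <- as.integer(attr(by_omit, "na.action"))

print(c(same_rows = same_rows, same_values = same_values))
print(dropped_rows)   # 15 20 23 24
```

The recorded row numbers are useful when the removed observations need to be reported or inspected later.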
2.2 Mean imputation of missing values

    # Find the positions of the missing values
    which(is.na(mydata), arr.ind = TRUE)
    # Without arr.ind, which() returns linear indices counted down each column,
    # e.g. row 15, column 3 is (3 - 1) * 32 + 15 = 79
    which(is.na(mydata))
    # Replace the NAs in column 1 with the mean of its non-missing values
    mydata[is.na(mydata[, 1]), 1] <- mean(mydata[, 1], na.rm = TRUE)
    mydata

Result:

    > # Find the positions of the missing values
    > which(is.na(mydata), arr.ind = TRUE)
                       row col
    Mazda RX4            1   1
    Mazda RX4 Wag        2   1
    Datsun 710           3   1
    Hornet 4 Drive       4   1
    Hornet Sportabout    5   1
    Cadillac Fleetwood  15   3
    Camaro Z28          24   4
    Toyota Corolla      20   6
    AMC Javelin         23   6
    > which(is.na(mydata))
    [1]   1   2   3   4   5  79 120 180 183
    > # Replace the NAs in column 1 with the mean of its non-missing values
    > mydata[is.na(mydata[, 1]), 1] <- mean(mydata[, 1], na.rm = TRUE)
    > mydata
                             mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           19.92593   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       19.92593   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          19.92593   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      19.92593   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   19.92593   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.10000   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.30000   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.40000   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.80000   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.20000   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.80000   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.40000   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.30000   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.20000   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.40000   8    NA 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.40000   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.70000   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.40000   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.40000   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.90000   4  71.1  65 4.22    NA 19.90  1  1    4    1
    Toyota Corona       21.50000   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.50000   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.20000   8 304.0 150 3.15    NA 17.30  0  0    3    2
    Camaro Z28          13.30000   8 350.0  NA 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.20000   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.30000   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.00000   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.40000   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.80000   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.70000   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.00000   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.40000   4 121.0 109 4.11 2.780 18.60  1  1    4    2
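The step above fills only column 1, so the NAs in disp, hp and wt remain. One way to extend the same idea to every column is sketched below. It assumes all columns are numeric, which holds for mtcars, and the helper impute_mean() is a hypothetical name introduced here, not a base R function.

```r
# Rebuild the example data frame (9 missing cells in total).
mydata <- mtcars
mydata[1:5, "mpg"] <- NA
mydata["Cadillac Fleetwood", "disp"] <- NA
mydata[c("Toyota Corolla", "AMC Javelin"), "wt"] <- NA
mydata["Camaro Z28", "hp"] <- NA

# Hypothetical helper: replace the NAs of a numeric vector with the mean
# of its observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

mydata_imp <- mydata
# Assigning into mydata_imp[] keeps the data frame structure and row names.
mydata_imp[] <- lapply(mydata_imp, impute_mean)

print(anyNA(mydata_imp))       # FALSE
print(mydata_imp[1, "mpg"])    # mean of the 27 observed mpg values
```

Mean imputation keeps every row but shrinks the variance of the imputed columns, so it is best treated as a quick baseline.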