R: NULL, NA, and NaN

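  • NaN (“Not a Number”) is the result of an undefined numeric operation such as 0/0 or Inf - Inf.
  • NA (“Not Available”) is generally interpreted as a missing value and has typed forms: NA_integer_, NA_real_, NA_character_ and NA_complex_ (plain NA is logical).
  • NaN and NA are therefore distinct: NaN is a number that was computed but is undefined, NA is a value that is missing, and R needs both.
  • is.na() returns TRUE for both NA and NaN, whereas is.nan() returns TRUE only for NaN (e.g. 0/0) and FALSE for NA.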
  • NULL represents that the value in question simply does not exist, rather than being existent but unknown.
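
The distinctions in the list above can be checked directly at the console. This is a minimal sketch using base R only; the vector name vals is made up for illustration.

```r
# Illustrative vector: an ordinary number, a missing value, an undefined result
vals <- c(1, NA, 0/0)

is.na(vals)   # FALSE TRUE TRUE   (NA and NaN both count as missing)
is.nan(vals)  # FALSE FALSE TRUE  (only NaN)

# NA has typed variants; NaN is always a double
typeof(NA)           # "logical"
typeof(NA_integer_)  # "integer"
typeof(NA_real_)     # "double"
typeof(NaN)          # "double"

# NULL is an object of length zero and disappears when combined
is.null(NULL)          # TRUE
length(NULL)           # 0
length(c(1, NULL, 3))  # 2
```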

is.na(x) # returns TRUE if x is missing
y <- c(1, 2, 3, NA)
is.na(y) # returns the logical vector FALSE FALSE FALSE TRUE

 

x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2

 

The function na.omit() returns the object with listwise deletion of missing values: for a data frame, every row that contains at least one NA is dropped.

# create new dataset without missing data
newdata <- na.omit(mydata)
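
Since mydata is not defined in these notes, here is a small self-contained sketch of the same call. The data frame and its column names (id, score, group) are hypothetical and exist only to show which rows get dropped.

```r
# Hypothetical stand-in for mydata
mydata <- data.frame(id    = 1:4,
                     score = c(90, NA, 75, 60),
                     group = c("a", "b", NA, "a"))

newdata <- na.omit(mydata)

nrow(mydata)   # 4
nrow(newdata)  # 2: rows 2 and 3 each contain an NA and are dropped
newdata$id     # 1 4

# na.omit() records which rows it removed in the "na.action" attribute
attr(newdata, "na.action")
```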

 

The na.rm=TRUE argument and na.omit() are not supposed to give the same result. Consider this example:

exdf<-data.frame(a=c(1,NA,5),b=c(3,2,2))
#   a b
#1  1 3
#2 NA 2
#3  5 2
colMeans(exdf,na.rm=TRUE) ## remove only "NA"
#       a        b 
#3.000000 2.333333
colMeans(na.omit(exdf)) ## remove "NA 2"
#  a   b 
#3.0 2.5

Why is this? In the first case, NAs are dropped column by column, so the mean of column b is calculated as (3+2+2)/3. In the second case, na.omit() removes the second row in its entirety, including its non-NA value of b (which the first case did count), so the mean of b is just (3+2)/2.
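
The arithmetic can be reproduced by hand on column b alone. This is only a sketch: the helper variable keep is introduced here for illustration, and complete.cases() is used just to show which rows na.omit() retains.

```r
exdf <- data.frame(a = c(1, NA, 5), b = c(3, 2, 2))

# Column-wise view: b has no NA, so all three values are used
mean(exdf$b)        # (3 + 2 + 2) / 3

# Row-wise view: row 2 is dropped because a is NA there,
# so b loses its second value as well
keep <- !is.na(exdf$a)
mean(exdf$b[keep])  # (3 + 2) / 2 = 2.5

# complete.cases() flags the rows that na.omit() keeps
complete.cases(exdf)  # TRUE FALSE TRUE
```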

 

  • REF:
  • http://www.cookbook-r.com/Basics/Working_with_NULL_NA_and_NaN/
  • http://stackoverflow.com/questions/7031127/data-frames-and-is-nan
  • http://www.r-bloggers.com/difference-between-na-and-nan-in-r/
  • http://help.scilab.org/docs/5.5.2/en_US/isnan.html
  • http://www.quantlego.com/howto/special-missing-values-in-r/
posted @ 2015-12-19 20:21  emanlee