R: NULL, NA, and NaN
- NaN (“Not a Number”) is the result of an undefined numeric operation such as 0/0.
- NA (“Not Available”) is generally interpreted as a missing value and has one form per atomic type: NA (logical), NA_integer_, NA_real_, NA_character_, and NA_complex_.
- Therefore, NaN ≠ NA: NaN marks an undefined numeric result, whereas NA marks a missing value, so both are needed.
- is.na() returns TRUE for both NA and NaN; is.nan(), however, returns TRUE only for NaN (e.g. 0/0) and FALSE for NA.
- NULL means that the value in question simply does not exist, as opposed to existing but being unknown (which is what NA expresses).
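As a quick check of the points above, here is a minimal sketch in base R. The variable names na_val and nan_val are only illustrative.

```r
# How the predicate functions classify NA and NaN
na_val  <- NA
nan_val <- 0 / 0           # an undefined numeric result, i.e. NaN

is.na(na_val)              # TRUE
is.na(nan_val)             # TRUE  (is.na() also flags NaN)
is.nan(na_val)             # FALSE
is.nan(nan_val)            # TRUE

# NULL is the absence of a value: it has length zero and vanishes inside c()
is.null(NULL)              # TRUE
length(NULL)               # 0
length(c(1, NULL, NA))     # 2: NULL is dropped, NA still occupies a slot

# The typed NA variants differ only in their storage type
typeof(NA)                 # "logical"
typeof(NA_integer_)        # "integer"
typeof(NA_real_)           # "double"
typeof(NA_character_)      # "character"
```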
is.na(x) # returns TRUE if x is missing
y <- c(1, 2, 3, NA)
is.na(y) # returns a logical vector (FALSE FALSE FALSE TRUE)
x <- c(1, 2, NA, 3)
mean(x) # returns NA
mean(x, na.rm = TRUE) # returns 2
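Because is.na() returns a logical vector, it can also be used to count, locate, or drop missing values. A small sketch using the same y as above:

```r
y <- c(1, 2, 3, NA)

sum(is.na(y))          # 1: number of missing values
which(is.na(y))        # 4: position of the missing value
y[!is.na(y)]           # 1 2 3: the non-missing values

# na.rm is accepted by other summary functions as well
sum(y, na.rm = TRUE)   # 6
max(y, na.rm = TRUE)   # 3
```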
The function na.omit() returns the object with listwise deletion of missing values, i.e. every row that contains at least one NA is removed.
# create new dataset without missing data
newdata <- na.omit(mydata)
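Since mydata is not defined in these notes, the sketch below uses a small made-up data frame in its place (the column names id, score, and group are assumptions for illustration only) to show what listwise deletion does.

```r
# Hypothetical data frame standing in for `mydata`
mydata <- data.frame(id    = 1:4,
                     score = c(10, NA, 7, 3),
                     group = c("a", "b", NA, "b"))

newdata <- na.omit(mydata)   # drops rows 2 and 3, each of which contains an NA
nrow(newdata)                # 2

# complete.cases() flags the rows that have no missing values
complete.cases(mydata)       # TRUE FALSE FALSE TRUE

# na.omit() records which rows it removed in the "na.action" attribute
attr(newdata, "na.action")
```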
na.rm=TRUE and na.omit() are not supposed to give the same result. Consider this example:
exdf <- data.frame(a = c(1, NA, 5), b = c(3, 2, 2))
#    a b
# 1  1 3
# 2 NA 2
# 3  5 2
colMeans(exdf, na.rm = TRUE) ## removes only the NA itself
#        a        b
# 3.000000 2.333333
colMeans(na.omit(exdf)) ## removes the whole row "NA 2"
#   a   b
# 3.0 2.5
Why is this? In the first case, the mean of column b is calculated as (3+2+2)/3. In the second case, na.omit removes the second row in its entirety, including its value of b, which is not NA and was therefore counted in the first case, so the mean of b is just (3+2)/2.
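The two calculations can be reproduced by hand for each column. This is only a sketch; the helper variable keep is introduced here for illustration.

```r
exdf <- data.frame(a = c(1, NA, 5), b = c(3, 2, 2))

# na.rm = TRUE works column by column, so b keeps all three of its values
mean(exdf$b)                  # (3 + 2 + 2) / 3 = 2.333333

# na.omit() drops row 2 entirely, so b loses its second value
keep <- complete.cases(exdf)  # TRUE FALSE TRUE
mean(exdf$b[keep])            # (3 + 2) / 2 = 2.5

# Column a is the same either way: its only NA sits in the dropped row
mean(exdf$a, na.rm = TRUE)    # 3
mean(exdf$a[keep])            # 3
```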
- REF:
- http://www.cookbook-r.com/Basics/Working_with_NULL_NA_and_NaN/
- http://stackoverflow.com/questions/7031127/data-frames-and-is-nan
- http://www.r-bloggers.com/difference-between-na-and-nan-in-r/
- http://help.scilab.org/docs/5.5.2/en_US/isnan.html
- http://www.quantlego.com/howto/special-missing-values-in-r/