A simple comparison of with(), within() and transform()

When you first learn about data frames in R, you will notice that building a new column means typing the data frame's name over and over, as shown below:

library(MASS)
anorexia$wtDiff <- anorexia$Postwt - anorexia$Prewt  # the data frame name is typed repeatedly

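In fact, whenever you see the same piece of code repeated again and again, you should ask whether it can be rewritten, and whether something more concise could replace it. So if this kind of repetition has been annoying you, stumbling upon attach() will feel like a relief! Once you start using it, though, you will find that its side effects are a real headache and often cost you more time in debugging.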

attach(anorexia)
anorexia$wtDiff <- Postwt - Prew   # misspelled variable name (Prew instead of Prewt)
detach(anorexia)

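In the snippet above, the misspelled variable name makes the second line fail. If the code is run as a script, execution stops there, so detach() is never called. After you fix the typo and run the code again, anorexia is attached a second time and now appears twice on the search path. Because detach() runs only once, one copy of anorexia stays on the search path. This can easily lead to objects masking one another later on, and the problem is sometimes hard to spot.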
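A minimal sketch of the double-attach problem described above. It assumes MASS is installed, and the second attach() call stands in for re-running the script after fixing the typo.

```r
library(MASS)

attach(anorexia)   # first run: attached, then the script stops on the typo
attach(anorexia)   # re-run after the fix: a second copy gets attached
n_attached <- sum(search() == "anorexia")   # 2 copies on the search path

detach(anorexia)   # the single detach() at the end of the script
n_left <- sum(search() == "anorexia")       # 1 copy is still attached

# Clean up: keep detaching until no copy of anorexia remains
while ("anorexia" %in% search()) detach(anorexia)
```

Checking search() like this is a quick way to see whether a data frame has been left attached.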

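To avoid these problems, this post focuses on with(), within() and transform(). All three make it easy to work with a data frame, for example to add a new column or to overwrite an existing one.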

# A single change to a data frame: the with() version is relatively concise
anorexia$wtDiff <- with(anorexia, Postwt - Prewt)
anorexia <- within(anorexia, wtDiff2 <- Postwt - Prewt)
anorexia <- transform(anorexia, wtDiff3 = Postwt - Prewt)

# Several changes to a data frame: the with() version gets verbose, so within() and transform() are recommended
fahrenheit_to_celcius <- function(f) (f - 32) / 1.8
  
airquality[c("cTemp", "logOzone", "MonthName")] <- with(airquality, list(
  fahrenheit_to_celcius(Temp),
  log(Ozone),
  month.abb[Month]
))
  
airquality <- within(airquality,
{
  cTemp2     <- fahrenheit_to_celcius(Temp)
  logOzone2  <- log(Ozone)
  MonthName2 <- month.abb[Month]
})
  
airquality <- transform(airquality,
  cTemp3     = fahrenheit_to_celcius(Temp),
  logOzone3  = log(Ozone),
  MonthName3 = month.abb[Month]
)
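One practical difference between the last two forms is worth knowing. within() runs its block line by line, so a later line can use a column created by an earlier one. transform() evaluates all of its arguments against the original data frame. The small data frame and the column names below are made up for illustration.

```r
df <- data.frame(f = c(32, 212))   # hypothetical Fahrenheit readings

# within(): c_temp already exists when k_temp is computed
res_within <- within(df, {
  c_temp <- (f - 32) / 1.8
  k_temp <- c_temp + 273.15
})

# transform(): c_temp is not a column of df yet, so referring to it fails
# (assuming no object named c_temp exists in the calling environment)
res_transform <- tryCatch(
  transform(df, c_temp = (f - 32) / 1.8, k_temp = c_temp + 273.15),
  error = function(e) conditionMessage(e)
)
```

With transform(), a dependent column therefore needs a second transform() call, or you can use within() instead.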

A simple comparison of with() and within()

Purpose: evaluate an R expression in an environment built from the items (variables) of a list or data frame.

with(data, expr, ...)
within(data, expr, ...)

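# data	usually a list or a data frame. For with(), it can also be an environment or an integer as in sys.call.
# expr	one or more expressions to evaluate using the contents of data. If there are several expressions, they must be wrapped in braces {}.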

?sys.call  # help page: "Functions to Access the Function Call Stack"
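A minimal sketch of the data forms mentioned above (a list and an environment) and of wrapping several expressions in braces. The object names are hypothetical.

```r
lst <- list(a = 2, b = 5)
prod_ab <- with(lst, a * b)     # a list works just like a data frame: 10

env <- new.env()
assign("a", 3, envir = env)
plus_one <- with(env, a + 1)    # an environment is used directly: 4

# Several expressions need braces; the value of the last one is returned
sq_sum <- with(lst, {
  s <- a + b
  s^2
})                              # 49
```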

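with() is a generic function that evaluates an R expression (command) in a local environment constructed from data. That environment has the caller's environment as its parent, which is useful for simplifying calls to modeling functions. (Note: if data is already an environment, it is used as-is together with its existing parent.)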

Note that assignments within expr take place in the constructed environment, not in the user's workspace.
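A small sketch showing that an assignment inside expr stays in the temporary environment. The data frame and the name doubled are made up.

```r
df <- data.frame(x = 1:3)

total <- with(df, {
  doubled <- x * 2   # created in the temporary environment built from df
  sum(doubled)       # the value of the last expression is what with() returns
})

total                                # 12
exists("doubled", inherits = FALSE)  # FALSE: nothing leaked into the workspace
names(df)                            # "x": df itself is unchanged
```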

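within() is similar to with(). The difference is that after evaluating the expression, within() examines the environment, makes the corresponding modifications to a copy of data, and returns that modified copy as a new object. within() can be used as an alternative to transform().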

Return values:

with(): the value of the evaluated expression

within(): the modified object (a modified copy of data)
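A minimal sketch of these return values. with() returns whatever expr evaluates to. within() returns a modified copy and leaves the original alone, so its result has to be assigned. The data are made up.

```r
df <- data.frame(x = 1:3)

sq  <- with(df, x^2)             # a plain numeric vector: 1 4 9
df2 <- within(df, x2 <- x^2)     # a new data frame with an extra column

names(df)    # "x"          the original is untouched
names(df2)   # "x"  "x2"

within(df, x3 <- x^3)            # result is printed but not stored anywhere
names(df)    # still "x"
```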

> install.packages("openintro")
> library(openintro)
> data(marioKart)
> names(marioKart)
 [1] "ID"         "duration"   "nBids"      "cond"      
 [5] "startPr"    "shipPr"     "totalPr"    "shipSp"    
 [9] "sellerRate" "stockPhoto" "wheels"     "title"     
> dim(marioKart)
[1] 143  12
> 
> # remove the two outliers
> mk0 <- marioKart[marioKart$totalPr < 100,]
> 
> 
> # create a plot
> with(mk0, {
+            boxplot(totalPr ~ wheels)
+            points(wheels+1.1, totalPr, col=4)
+           })
> 
> 
> # remove a column
> mk2 <- within(mk0, rm(title))
> names(mk2)
 [1] "ID"         "duration"   "nBids"      "cond"      
 [5] "startPr"    "shipPr"     "totalPr"    "shipSp"    
 [9] "sellerRate" "stockPhoto" "wheels"    
> 
> 
> # change individual values
> mk0$totalPr[50]
[1] 59.88
> mk0$startPr[25]
[1] 0.01
> mk3 <- within(mk0, { # Would not typically do...
+                      # this is just an example
+                     totalPr[50] <- 88.59
+                     startPr[25] <- 85.00
+                    })
> mk3$totalPr[50]
[1] 88.59
> mk3$startPr[25]
[1] 85
> 
> 
> # create a column
> mk4 <- within(mk0, endPrice <- totalPr - shipPr)
> all.equal(mk4$totalPr - mk4$shipPr, mk4$endPrice)
[1] TRUE
> names(mk4)
 [1] "ID"         "duration"   "nBids"      "cond"      
 [5] "startPr"    "shipPr"     "totalPr"    "shipSp"    
 [9] "sellerRate" "stockPhoto" "wheels"     "title"     
[13] "endPrice" 

Appendix:

with(data, expr, ...) and within(data, expr, ...) --->> Evaluate an R expression in an environment constructed from data, possibly modifying (a copy of) the original data. (The two functions share this help-page description; only within() returns the modified copy, while with() returns the value of expr.)

transform(`_data`, ...) --->> transform is a generic function, which, at least currently, only does anything useful with data frames. transform.default converts its first argument to a data frame if possible and calls transform.data.frame.

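A small sketch of the transform.default behaviour quoted above: a plain list is converted to a data frame before the new column is added. The list itself is made up.

```r
lst <- list(a = 1:3)
res <- transform(lst, b = a * 2)   # transform.default -> data.frame() -> transform.data.frame

class(res)   # "data.frame"
res$b        # 2 4 6
```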
head(mtcars)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars$mpg[mtcars$cyl == 8  &  mtcars$disp > 350]
# a more concise way to write the same thing
with(mtcars, mpg[cyl == 8  &  disp > 350])
# [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8


# examples from glm:
with(data.frame(u = c(5,10,15,20,30,40,60,80,100),
                lot1 = c(118,58,42,35,27,25,21,19,18),
                lot2 = c(69,35,26,21,18,16,13,12,12)),
    list(summary(glm(lot1 ~ log(u), family = Gamma)),
         summary(glm(lot2 ~ log(u), family = Gamma))))


head(airquality)
#   Ozone Solar.R Wind Temp Month Day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6

aq <- within(airquality, {     # several variables can be changed at once
    lOzone <- log(Ozone)
    Month <- factor(month.abb[Month])
    cTemp <- round((Temp - 32) * 5/9, 1) # convert Fahrenheit to Celsius
    S.cT <- Solar.R / cTemp  # uses the newly created variable
    rm(Day, Temp)
})
head(aq)
#   Ozone Solar.R Wind Month      S.cT cTemp   lOzone
# 1    41     190  7.4   May  9.793814  19.4 3.713572
# 2    36     118  8.0   May  5.315315  22.2 3.583519
# 3    12     149 12.6   May  6.394850  23.3 2.484907
# 4    18     313 11.5   May 18.742515  16.7 2.890372
# 5    NA      NA 14.3   May        NA  13.3       NA
# 6    28      NA 14.9   May        NA  18.9 3.332205

# example from boxplot:
head(ToothGrowth)
#    len supp dose
# 1  4.2   VC  0.5
# 2 11.5   VC  0.5
# 3  7.3   VC  0.5
# 4  5.8   VC  0.5
# 5  6.4   VC  0.5
# 6 10.0   VC  0.5

with(ToothGrowth, {
    boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
            subset = (supp == "VC"), col = "yellow",
            main = "Guinea Pigs' Tooth Growth",
            xlab = "Vitamin C dose mg",
            ylab = "tooth length", ylim = c(0, 35))
    boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
            subset = supp == "OJ", col = "orange")
    legend(2, 9, c("Ascorbic acid", "Orange juice"),
           fill = c("yellow", "orange"))
})


# an alternative form that avoids the subset argument:
with(subset(ToothGrowth, supp == "VC"),
     boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
             col = "yellow", main = "Guinea Pigs' Tooth Growth",
             xlab = "Vitamin C dose mg",
             ylab = "tooth length", ylim = c(0, 35)))
with(subset(ToothGrowth,  supp == "OJ"),
     boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
             col = "orange"))
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

link1: http://rfunction.com/archives/2182
link2: https://www.r-bloggers.com/friday-function-triple-bill-with-vs-within-vs-transform/

posted @ 2018-03-09 15:46  AdaWongCorner