Pipe operators in R: magrittr

A Forward-Pipe Operator for R

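1. Basic usage

x %>% f is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
x %>% f %>% g %>% h is equivalent to h(g(f(x)))
----------------------------------
x %>% f(y, .) is equivalent to f(y, x)
x %>% f(y, z = .) is equivalent to f(y, z = x)

When the data is the first argument of the function, no placeholder is needed. When it is the second or a later argument, use . as the placeholder.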
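The rules above can be tried on concrete values. The following is a minimal sketch; the variable names (x, roots, repeated, steps) are made up for illustration and are not part of magrittr.

```r
library(magrittr)

x <- c(1, 4, 9)

# Data goes into the first argument: no placeholder needed.
roots <- x %>% sqrt()                    # same as sqrt(x)

# Data goes into the second argument: mark the slot with `.`.
repeated <- 2 %>% rep("ab", .)           # same as rep("ab", 2)

# Data goes into a named argument.
steps <- 3 %>% seq(from = 1, to = .)     # same as seq(from = 1, to = 3)
```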

2. Building functions

f <- . %>% cos %>% sin 
is equivalent to
f <- function(.) sin(cos(.))
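A quick way to convince yourself of this equivalence is to build both versions and compare them on the same input. This is only a sketch; g is a hypothetical name for the hand-written counterpart.

```r
library(magrittr)

# Pipeline that starts with `.`: the result is a function, not a value.
f <- . %>% cos %>% sin

# Hand-written counterpart, for comparison only.
g <- function(x) sin(cos(x))

# Both should agree on any numeric input.
same <- all.equal(f(1:5), g(1:5))
```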

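3. Using %>%, %$% and %<>%

%>% is the most commonly used forward pipe.

%<>% pipes a variable into a function and assigns the result back to that same variable.

iris %<>% na.omit() is equivalent to iris <- na.omit(iris)

%$% exposes the names inside the left-hand object (for example, the columns of a data frame), so the right-hand expression can refer to them directly.

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

cor(df$a, df$b) is equivalent to df %$% cor(a, b)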
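Both equivalences can be checked directly. The sketch below uses the built-in airquality data set instead of iris, because it actually contains missing values; the names air, piped_cor and base_cor are illustrative only.

```r
library(magrittr)

# %<>%: pipe the variable through na.omit() and assign the result back.
air <- airquality
air %<>% na.omit()

# %$%: refer to the columns of df by name on the right-hand side.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
piped_cor <- df %$% cor(a, b)
base_cor <- cor(df$a, df$b)
```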

Overview

The magrittr package offers a set of operators which make your code more readable by:

structuring sequences of data operations left-to-right (as opposed to from the inside and out),

avoiding nested function calls,

minimizing the need for local variables and function definitions, and

making it easy to add steps anywhere in the sequence of operations.

The operators pipe their left-hand side values forward into expressions that appear on the right-hand side, i.e. one can replace f(x) with x %>% f(), where %>% is the (main) pipe-operator. When coupling several function calls with the pipe-operator, the benefit will become more apparent. Consider this pseudo example:

the_data <-
  read.csv('/path/to/data/file.csv') %>%
  subset(variable_a > x) %>%
  transform(variable_c = variable_a/variable_b) %>%
  head(100)

Four operations are performed to arrive at the desired data set, and they are written in a natural order: the same as the order of execution. Also, no temporary variables are needed. If yet another operation is required, it is straightforward to add to the sequence of operations wherever it may be needed.

If you are new to magrittr, the best place to start is the pipes chapter in R for Data Science.

Basic piping

x %>% f is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
x %>% f %>% g %>% h is equivalent to h(g(f(x)))

Here, “equivalent” is not technically exact: evaluation is non-standard, and the left-hand side is evaluated before being passed on to the right-hand side expression. However, in most cases this has no practical implication.

The argument placeholder

x %>% f(y, .) is equivalent to f(y, x)
x %>% f(y, z = .) is equivalent to f(y, z = x)

Re-using the placeholder for attributes

It is straightforward to use the placeholder several times in a right-hand side expression. However, when the placeholder appears only in nested expressions, magrittr will still apply the first-argument rule. The reason is that in most cases this results in cleaner code.

x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))

The behavior can be overruled by enclosing the right-hand side in braces:

x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))
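The difference between the two forms is easier to see with a function that reports what it received. In this sketch, describe() is a hypothetical helper written only for the demonstration; it is not part of magrittr.

```r
library(magrittr)

# Hypothetical helper: records whether a first argument was supplied.
describe <- function(data = NULL, rows, cols) {
  list(got_data = !is.null(data), rows = rows, cols = cols)
}

# Placeholder only inside nested calls: mtcars is still passed as the first argument.
with_first <- mtcars %>% describe(rows = nrow(.), cols = ncol(.))

# Braces switch the first-argument rule off: only rows and cols are supplied.
without_first <- mtcars %>% { describe(rows = nrow(.), cols = ncol(.)) }
```

Here with_first$got_data is TRUE and without_first$got_data is FALSE, while rows and cols are the same in both results.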

Building (unary) functions

Any pipeline starting with the . will return a function which can later be used to apply the pipeline to values. Building functions in magrittr is therefore similar to building other values.

f <- . %>% cos %>% sin 
# is equivalent to 
f <- function(.) sin(cos(.))

Pipe with exposition of variables

Many functions accept a data argument, e.g. lm and aggregate, which is very useful in a pipeline where data is first processed and then passed into such a function. There are also functions that do not have a data argument, for which it is useful to expose the variables in the data. This is done with the %$% operator:

iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)
#> [1] 0.3361992

data.frame(z = rnorm(100)) %$%
  ts.plot(z)

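A pipe operator is a special symbol that passes the output of one line of code to the next line as its input, connecting two otherwise independent lines. By chaining pipe operators, many lines of code can be written as a single "flow". Pipes simplify the code, make the logical relationship between steps clearer, and remove the need to store intermediate variables.

The pipe operators in R include %>%, %T>%, %<>% and %$%. Each has a different job, and all of them come from the magrittr package. %>% is the workhorse and is also re-exported by dplyr and the tidyverse, so it is available after loading any one of these three packages (magrittr, dplyr or tidyverse). The other three operators are available only after loading magrittr; they also have fewer use cases than %>% and are used less often.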

1 %>%

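If a line of code takes as its input exactly the output of the line before it, %>% lets you skip the intermediate step.

For example, using R's built-in mtcars data set, suppose we want to generate as many random numbers as it has rows:

# load the data
data <- mtcars
# count the rows
n <- dim(data)[1]
# generate the random numbers
rdn <- runif(n = n, min = 0, max = 100)

From the second line on, each line's input is the output of the line before it, so the code can be rewritten with %>%: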

# load packages
library(tidyverse) # attaches purrr, which provides pluck(); library(magrittr) alone supplies %>% but not pluck()

mtcars %>%
  dim() %>%
  pluck(1) %>%
  runif(min = 0, max = 100) -> rdn

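If the piped value belongs in the first argument position (the most common case), it can simply be omitted, and the remaining arguments are written as usual. In the example above, dim() needs only its first argument, which is exactly the previous step's output (mtcars), so nothing has to be written inside its parentheses. runif() takes three arguments, the first of which is the previous output, so only the other two are written.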
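The same pipeline can also be sketched with magrittr alone, assuming nrow() is an acceptable stand-in for dim() followed by pluck(1). The set.seed() call is there only to make the illustration reproducible.

```r
library(magrittr)

set.seed(123)

# nrow(mtcars) becomes the first argument (n) of runif().
rdn <- mtcars %>%
  nrow() %>%
  runif(min = 0, max = 100)
```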

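This process also needs no names for the intermediate values; only the final result is named. Without the pipe operator, and still without naming intermediate values, the same thing can be written as:

rdn <- runif(n = dim(mtcars)[1], min = 0, max = 100)

This one-liner also avoids naming intermediate variables and looks compact, but its nesting is harder to read. In practice the two styles can be combined: break a deeply nested expression into several simpler nested ones and connect them with %>%.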

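When the piped value is not the first argument, a placeholder is required

For example, suppose we want the minimum of the first column of mtcars to set the lower bound of the random numbers:

# load the data
data <- mtcars[,1]
# compute the minimum
min <- min(data)
# generate the random numbers
rdn <- runif(n = 20, min = min, max = 100)

# with the pipe operator
mtcars %>%
  pluck(1) %>%
  min() %>%
  runif(n = 20, ., max = 100) -> rdn

Because min is the second argument of runif(), the placeholder . has to be put in that position. If you would rather not rely on argument position, name the argument instead by writing min = . in the call.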
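To illustrate that the positional and the named placeholder do the same thing, the sketch below runs both with the same random seed. It uses mtcars[[1]] in place of pluck(1) so that only magrittr is needed; the names lower, by_position and by_name are illustrative.

```r
library(magrittr)

# Minimum of the first column of mtcars.
lower <- mtcars[[1]] %>% min()

# Placeholder by position: `.` lands in the second argument, min.
set.seed(42)
by_position <- lower %>% runif(n = 20, ., max = 100)

# Placeholder by name: the target argument is spelled out.
set.seed(42)
by_name <- lower %>% runif(n = 20, min = ., max = 100)
```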

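When several arguments depend on the previous output, combine the pipe with braces {}

For example, suppose we want the number of rows, the minimum and the maximum of the first column of mtcars to control the random numbers:

# without the pipe operator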

data <- mtcars[,1]
n <- length(data)
min <- min(data)
max <- max(data)
rdn <- runif(n = n, min = min, max = max)

# with the pipe operator
mtcars %>%
  pluck(1) %>%
  {
    n = length(.)
    min = min(.)
    max = max(.)
    runif(n = n, min = min, max = max)
  } -> rdn

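Keep the following points in mind with this style:

Intermediate values still have to be named inside {}, but they are not part of the returned result;
Inside {}, the placeholder . always refers to the value piped in before the opening brace, and it cannot be omitted even where it would be the first argument;
The result of the {} block is the value of its last complete statement; the results of the other statements are not returned.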
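These points can be seen in a small sketch. The input vector and the names lo, hi and spread are made up for illustration.

```r
library(magrittr)

spread <- c(2, 8, 5) %>% {
  lo <- min(.)   # `.` must be written out, even as a first argument
  hi <- max(.)   # `.` still refers to the vector piped into the braces
  hi - lo        # the last statement is the value of the whole block
}
```

Only the value of the final statement, hi - lo, ends up in spread.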

2 %T>%

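%T>% (the tee pipe) receives the previous output but does not pass its own result on. If the next line continues with %>%, the value passed along is still the output from before %T>%.

For example, suppose we want to look at the distribution of the first column of mtcars before deciding on the arguments for runif():

data <- mtcars[, 1]
boxplot(data)
n <- length(data)
runif(n = n, min = 10, max = 35)

Here boxplot() is used only to inspect the distribution of data before choosing the minimum and maximum for the random numbers; its return value is not used. In other words, there is a "pause" in the flow, and that is where %T>% fits: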

# magrittr must be loaded for %T>% (pluck() still comes from purrr)
library(magrittr)

mtcars %>%
  pluck(1) %T>%
  boxplot() %>%
  length() %>%
  runif(min = 10, max = 35)

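The return value of the statement after %T>% is discarded, so that statement is usually a side-effect operation such as plotting or exporting data. It does not stop later statements from receiving the output from before %T>% as their input.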
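The effect is easiest to see by running the same chain with and without the tee. This sketch uses str(), which prints a summary and returns NULL, as the side-effect step; the names tee_total and plain_total are illustrative.

```r
library(magrittr)

# With %T>%, str() only prints; sum() still receives the original vector.
tee_total <- c(1, 2, 3) %T>%
  str() %>%
  sum()

# With %>%, sum() receives the return value of str(), which is NULL.
plain_total <- c(1, 2, 3) %>%
  str() %>%
  sum()
```

The first chain sums the original vector, while the second sums NULL and so returns 0.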

3 %<>%

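Compared with %>%, %<>% does one extra thing: once the whole pipeline has run, it assigns the result back to the variable in front of %<>%, which saves a separate assignment step.

For example, suppose we run a series of operations on data and want the result to be called data again:

library(magrittr)
library(dplyr) # provides mutate()

# using %>%
data <- mtcars
data %>%
  mutate(rdn = runif(n = dim(.)[1], 10, 20)) -> data

After operating on data, the right-assignment arrow -> stores the result under the name data again. With %<>% this step can be dropped: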

data <- mtcars
data %<>%
  mutate(rdn = runif(n = dim(.)[1], 10, 20))

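Keep the following points in mind:

%<>% must directly follow a variable. Since no new variables are named partway through a pipeline, in practice %<>% has to come right after the first line, and it can be used only once;
The variable has the same name before and after processing.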
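As a small sketch of these points, %<>% can start a longer chain, and the left-hand side can be a column as well as a whole variable. The names nums and scores are made up for illustration.

```r
library(magrittr)

# %<>% comes first; the result of the whole chain is assigned back to nums.
nums <- c(3, 1, 2)
nums %<>% sort() %>% rev()

# The left-hand side can also be a single column of a data frame.
scores <- data.frame(a = c(1, 4, 9))
scores$a %<>% sqrt()
```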

4 %$%

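%$% does not pass the previous output itself as an argument. Instead it exposes the columns of the output data frame by name, so the next line can refer to the data directly by column name.

For example, suppose we want the distribution of the hp column of mtcars to control the random numbers:

mtcars %$%
  runif(n = length(hp), min = min(hp), max = max(hp)) -> rdn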
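For comparison, base R's with() gives the same kind of access to columns by name. The sketch below runs both forms with the same seed; the names piped and base are illustrative.

```r
library(magrittr)

# Columns of mtcars exposed by %$%.
set.seed(7)
piped <- mtcars %$% runif(n = length(hp), min = min(hp), max = max(hp))

# The same expression evaluated with base R's with().
set.seed(7)
base <- with(mtcars, runif(n = length(hp), min = min(hp), max = max(hp)))
```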

posted @ 2022-12-08 11:36 xinkevinzhang
