Daily R -- match

match 

pmatch

intersect

%in%

setdiff

===================================================

match package:base R Documentation

Value Matching

Description:

'match' returns a vector of the positions of (first) matches of
its first argument in its second.

'%in%' is a more intuitive interface as a binary operator, which
returns a logical vector indicating if there is a match or not for
its left operand.

Usage:

match(x, table, nomatch = NA_integer_, incomparables = NULL)

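x: vector, the values to look up;

table: vector, the values to be matched against;

nomatch: the value returned for elements of x that have no match; it is coerced to integer (NA_integer_ by default);

incomparables: values that are never allowed to match.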

x %in% table

This returns TRUE or FALSE for each element of x:

> rep(1, 3) %in% rep(1, 5)
[1] TRUE TRUE TRUE

match, by contrast, returns positions. Here every 1 matches the first element of the table:

> match(rep(1, 3), rep(1, 5))
[1] 1 1 1
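
A slightly bigger example puts the two side by side. It also shows the nomatch and incomparables arguments described above. The vectors x and tab are made up for illustration:

```r
x   <- c("b", "z", "a", "b")
tab <- c("a", "b", "c")

match(x, tab)                       # positions in tab: 2 NA 1 2
match(x, tab, nomatch = 0L)         # same, but 0 instead of NA: 2 0 1 2
match(x, tab, incomparables = "b")  # "b" is not allowed to match: NA NA 1 NA
x %in% tab                          # TRUE FALSE TRUE TRUE
```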


Arguments:

x: vector or 'NULL': the values to be matched. Long vectors are
supported.

table: vector or 'NULL': the values to be matched against. Long
vectors are not supported.

nomatch: the value to be returned in the case when no match is found.
Note that it is coerced to 'integer'.

incomparables: a vector of values that cannot be matched. Any value in
'x' matching a value in this vector is assigned the 'nomatch'
value. For historical reasons, 'FALSE' is equivalent to
'NULL'.

Details:

'%in%' is currently defined as
'"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'

So %in% is just a thin wrapper around match(). Redefining it at the console shows what each piece does:

> "%in%" <- function(x, table) match(x, table, nomatch = 0) 
> 1:10 %in% c(1,3,5,9)
 [1] 1 0 2 0 3 0 0 0 4 0
> "%in%" <- function(x, table) match(x, table, nomatch = 0)>0
> 1:10 %in% c(1,3,5,9)
 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE

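Without the > 0, the result is the position of each left-hand value in the right-hand vector (0 where there is none). Adding > 0 turns those positions into TRUE/FALSE. Typing these definitions at the console masks the base version of %in% until you run rm("%in%").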

Factors, raw vectors and lists are converted to character vectors,
and then 'x' and 'table' are coerced to a common type (the later
of the two types in R's ordering, logical < integer < numeric <
complex < character) before matching. If 'incomparables' has
positive length it is coerced to the common type.
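
A few one-liners show this coercion at work. The values are arbitrary and chosen only for illustration:

```r
match(1, c("1", "2"))             # 1 is coerced to "1": result 1
match(TRUE, c(0, 1))              # TRUE is coerced to 1: result 2
match("b", factor(c("a", "b")))   # the factor is compared as character: 2
match(2, factor(c("b", "a")))     # labels are compared, not internal codes: NA
```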

Matching for lists is potentially very slow and best avoided
except in simple cases.

Exactly what matches what is to some extent a matter of
definition. For all types, 'NA' matches 'NA' and no other value.
For real and complex values, 'NaN' values are regarded as matching
any other 'NaN' value, but not matching 'NA'.
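
So NA and NaN are kept apart, even though is.na() is TRUE for both. A small illustration:

```r
match(c(NA, NaN), c(NaN, NA))   # NA finds NA at 2, NaN finds NaN at 1: 2 1
is.na(c(NA, NaN))               # TRUE TRUE, yet match() tells them apart
match(NA, c("a", NA))           # NA also matches NA in a character vector: 2
```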

That '%in%' never returns 'NA' makes it particularly useful in
'if' conditions.
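
For example, take a missing value x. The comparison x == 1 is itself NA and breaks an if, while %in% still gives a plain FALSE. The variable names are only for this sketch:

```r
x <- NA

# x == 1 is NA, and if (NA) stops with an error, which we catch here
res_eq <- tryCatch(if (x == 1) "found" else "not found",
                   error = function(e) "error")

# x %in% c(1, 2) is FALSE, so the if works as expected
res_in <- if (x %in% c(1, 2)) "found" else "not found"

res_eq  # "error"
res_in  # "not found"
```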

Character strings will be compared as byte sequences if any input
is marked as '"bytes"' (see 'Encoding').

Value:

A vector of the same length as 'x'.

'match': An integer vector giving the position in 'table' of the
first match if there is a match, otherwise 'nomatch'.

If 'x[i]' is found to equal 'table[j]' then the value returned in
the 'i'-th position of the return value is 'j', for the smallest
possible 'j'. If no match is found, the value is 'nomatch'.
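
The "smallest possible j" rule can be spelled out as a plain loop. naive_match() below is a hypothetical helper written only to illustrate the rule. It skips the coercion, NA/NaN and incomparables rules, so base::match() remains the function to use:

```r
# Hypothetical illustration of the documented rule, not base R's algorithm:
# for each x[i], return the smallest j with table[j] == x[i], else nomatch.
# NA handling, coercion rules and incomparables are deliberately ignored.
naive_match <- function(x, table, nomatch = NA_integer_) {
  out <- rep(as.integer(nomatch), length(x))
  for (i in seq_along(x)) {
    hits <- which(table == x[i])              # every j where table[j] equals x[i]
    if (length(hits) > 0) out[i] <- hits[1]   # keep the smallest such j
  }
  out
}

naive_match(c("b", "a", "z"), c("a", "b", "b"))   # 2 1 NA
naive_match(1:10, c(1, 3, 5, 9), nomatch = 0)     # 1 0 2 0 3 0 0 0 4 0
```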

'%in%': A logical vector, indicating if a match was located for
each element of 'x': thus the values are 'TRUE' or 'FALSE' and
never 'NA'.

References:

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
Language_. Wadsworth & Brooks/Cole.

See Also:

'pmatch' and 'charmatch' for (_partial_) string matching,
'match.arg', etc for function argument matching. 'findInterval'
similarly returns a vector of positions, but finds numbers within
intervals, rather than exact matches.

'is.element' for an S-compatible equivalent of '%in%'.
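
Two quick checks, using vectors picked for illustration. First, is.element() gives the same answer as %in%. Second, findInterval(), which needs sorted breakpoints, places numbers between values that match() could only hit exactly:

```r
x      <- c(1.5, 2, 3.2)
breaks <- c(0, 1, 2, 3)

identical(is.element(x, breaks), x %in% breaks)  # TRUE
match(x, breaks)                                 # NA 3 NA: only 2 is exact
findInterval(x, breaks)                          # 2 3 4: interval containing x
```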

Examples:

## The intersection of two sets can be defined via match():
## Simple version:


## intersect <- function(x, y) y[match(x, y, nomatch = 0)]
intersect # the R function in base is slightly more careful
intersect(1:10, 7:20)
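
The commented-out "simple version" can be tried next to base intersect(). The name simple_intersect is only a label given here to that one-liner. When x contains duplicates, the simple version repeats them, while base intersect() returns each common value once. That is at least one visible way the base function is "more careful":

```r
# The one-liner from the help page, under a hypothetical name
simple_intersect <- function(x, y) y[match(x, y, nomatch = 0)]

simple_intersect(1:10, 7:20)              # 7 8 9 10
intersect(1:10, 7:20)                     # 7 8 9 10

simple_intersect(c(1, 1, 2), c(1, 2, 3))  # 1 1 2: duplicates kept
intersect(c(1, 1, 2), c(1, 2, 3))         # 1 2: each value once
```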

1:10 %in% c(1,3,5,9)

> sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","%")
> sstr[sstr %in% c(letters, LETTERS)]
[1] "c" "B" "c" "a"

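c(letters, LETTERS) is the 26 lower-case letters followed by the 26 upper-case letters. letters and LETTERS are built-in constants in base R.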


"%w/o%" <- function(x, y) x[!x %in% y] #-- x without y
(1:10) %w/o% c(3,7,12)

## Note that setdiff() is very similar and typically makes more sense:
c(1:6,7:2) %w/o% c(3,7,12) # -> keeps duplicates

setdiff(c(1:6,7:2), c(3,7,12)) # -> unique values

> setdiff(c(1:6,7:2), c(3,7,12)) 
[1] 1 2 4 5 6

setdiff treats its arguments as sets, so duplicates are dropped from the result.
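
Here are the two side by side on the example vectors. x %w/o% y keeps every surviving element, including repeats. setdiff() returns each surviving value once, so in this example setdiff(x, y) gives the same values as unique(x %w/o% y):

```r
"%w/o%" <- function(x, y) x[!x %in% y]   # x without y, as defined above

x <- c(1:6, 7:2)
y <- c(3, 7, 12)

x %w/o% y            # 1 2 4 5 6 6 5 4 2: repeats kept
setdiff(x, y)        # 1 2 4 5 6: each value once
unique(x %w/o% y)    # 1 2 4 5 6
```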

#=====================================================

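> ?pmatch
> pmatch("", "")                               # returns NA
[1] NA
> pmatch("m", c("mean", "median", "mode"))     # returns NA
[1] NA
# "m" is not an exact match for anything and partially matches all three
# elements, so there is no unique match and the result is NA
> pmatch("med", c("mean", "median", "mode"))   # returns 2
[1] 2
> pmatch(c("", "ab", "ab"), c("abc", "ab"), dup = FALSE)
[1] NA  2  1
# "" matches nothing; the first "ab" matches "ab" (position 2) exactly, and
# that element of table is then used up; the second "ab" is only a partial
# match for "abc", which is unique, so it gets position 1
I don't think this one gets used much.
> pmatch(c("", "ab", "ab"), c("abc", "ab"), dup = TRUE)
[1] NA  2  2
> ## compare
> charmatch(c("", "ab", "ab"), c("abc", "ab"))
[1] 0 2 2
# charmatch returns 0 when a partial match is ambiguous, and allows repeats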

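pmatch is a partial-matching function that looks up each element of x in table. The argument duplicates.ok (written dup above, relying on R's partial matching of argument names) controls reuse of table elements. With the default duplicates.ok = FALSE, an element of table that has been matched is used up and cannot match again. With duplicates.ok = TRUE it can be reused. Matching runs in passes over the whole of x, not strictly one element at a time: all exact matches are recorded first, then unique partial matches among whatever is left. For a single element this comes down to three steps:

1. If it matches an element of table exactly, that counts as a match and its position in table is returned;
2. Otherwise, if it is a prefix of exactly one element of table (a unique partial match), that position is returned;
3. Otherwise it is treated as unmatched and gets nomatch (NA by default). An empty string never matches anything.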
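
To check the rules above, naive_pmatch() below is a hypothetical re-implementation of the two passes for the default duplicates.ok = FALSE. It is not part of R. It assumes x and table are character vectors without NA; in real code use base::pmatch():

```r
# Hypothetical sketch of the pmatch() rules for duplicates.ok = FALSE.
# Assumes x and table are character vectors without NA; use base::pmatch().
naive_pmatch <- function(x, table, nomatch = NA_integer_) {
  out  <- rep(NA_integer_, length(x))
  used <- rep(FALSE, length(table))   # elements of table already matched

  # Pass 1: exact matches; an empty string never matches
  for (i in seq_along(x)) {
    if (x[i] == "") next
    j <- which(!used & table == x[i])
    if (length(j) > 0) {
      out[i] <- j[1]
      used[j[1]] <- TRUE
    }
  }

  # Pass 2: unique partial (prefix) matches among the unused elements
  for (i in seq_along(x)) {
    if (!is.na(out[i]) || x[i] == "") next
    j <- which(!used & startsWith(table, x[i]))
    if (length(j) == 1) {
      out[i] <- j
      used[j] <- TRUE
    }
  }

  out[is.na(out)] <- as.integer(nomatch)
  out
}

naive_pmatch(c("", "ab", "ab"), c("abc", "ab"))  # NA 2 1, like the dup = FALSE call
naive_pmatch(c("ab", "abc"), "abc")              # NA 1: the exact pass runs first
```

The second call shows why the pass order matters. Going strictly element by element, "ab" would grab "abc" as a partial match before "abc" got its exact match.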

#===========================================================================

Sources:

R base documentation

http://blog.sina.com.cn/s/blog_73206f7b0102vyox.html

posted on 2017-11-08 11:00  chenwt