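Basic String-Handling Functions in R
1. nchar(x): returns the number of characters in a string, or in each element of the character vector x.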
> nchar("I love you!")
[1] 11
> nchar(c("I", "love", "you", "!"))
[1] 1 4 3 1
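nchar() is easy to confuse with length(), and for non-ASCII text the count depends on what is being measured. The following is a minimal sketch that assumes a UTF-8 build of R. The names s and cjk are only illustrative.

```r
# length() counts vector elements; nchar() counts the characters in each element
s <- "I love you!"
length(s)   # 1: one string
nchar(s)    # 11 characters

# "\u4f60\u597d" is the two-character string 你好, written with escapes
# so the result does not depend on the encoding of the script file
cjk <- "\u4f60\u597d"
nchar(cjk)                  # 2 characters
nchar(cjk, type = "bytes")  # 6 bytes: each of these characters takes 3 bytes in UTF-8
```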
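2. grep(pattern, x): returns the indices of the elements of the character vector x that contain a match for pattern, or integer(0) if no element matches.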
> grep("y", "I love you!")
[1] 1
> a <- c("I", "love", "you", "!")
> grep("y", a)
[1] 3
> grep("k", a)
integer(0)
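grep() has a few common variants. The sketch below reuses the vector a from above and shows how to get the matching elements instead of their indices, how to get one TRUE/FALSE per element with grepl(), and how to treat the pattern literally. The files vector is a made-up example.

```r
a <- c("I", "love", "you", "!")

grep("o", a)                # 2 3: indices of the matching elements
grep("o", a, value = TRUE)  # "love" "you": the matching elements themselves
grepl("o", a)               # FALSE TRUE TRUE FALSE: one logical per element

# pattern is a regular expression by default, so "." matches any character;
# fixed = TRUE treats it as a literal dot
files <- c("a.txt", "atxt")
grep(".", files)                # 1 2
grep(".", files, fixed = TRUE)  # 1
```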
3. paste(..., sep = " "): converts its arguments to character and joins them element by element, separated by sep (a single space by default).
> paste("I", "love", "you", "!")
[1] "I love you !"
> a <- c("I", "love", "you", "!")
> a
[1] "I" "love" "you" "!"
> paste(a, 1:4)
[1] "I 1" "love 2" "you 3" "! 4"
> paste(a, 1:4, sep="-")
[1] "I-1" "love-2" "you-3" "!-4"
> paste("Today is","Sat Jan 11 2020")
[1] "Today is Sat Jan 11 2020"
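paste() also has a collapse argument, which is distinct from sep. The sketch below contrasts the two using the same vector a. sep joins corresponding elements of the arguments and returns a vector. collapse then joins that vector into a single string.

```r
a <- c("I", "love", "you", "!")

# sep joins the arguments element by element and returns a vector
paste(a, 1:4, sep = "-")                  # "I-1" "love-2" "you-3" "!-4"

# collapse joins the resulting vector into one string
paste(a, collapse = " ")                  # "I love you !"
paste(a, 1:4, sep = "-", collapse = "+")  # "I-1+love-2+you-3+!-4"
```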
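4. paste0(...): concatenates its arguments with no separator. It is equivalent to paste(..., sep = "") and has no sep argument of its own. In the paste0(a, 1:4, sep="--") example below, sep = "--" is therefore not a separator: it is treated as one more string and appended to each element.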
> paste0("I", "love", "you", "!")
[1] "Iloveyou!"
> a <- c("I", "love", "you", "!")
> a
[1] "I" "love" "you" "!"
> paste0(a, 1:4)
[1] "I1" "love2" "you3" "!4"
> paste0(a, 1:4, sep="--")
[1] "I1--" "love2--" "you3--" "!4--"
> b <- c("甲","乙","丙","丁","戊","己","庚","辛","壬","癸")
> d <- c("子","丑","寅","卯","辰","巳","午","未","申","酉","戌","亥")
> paste0(b, d)
[1] "甲子" "乙丑" "丙寅" "丁卯" "戊辰" "己巳" "庚午"
[8] "辛未" "壬申" "癸酉" "甲戌" "乙亥"
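paste0() recycles the shorter vector. That is why b (10 stems) and d (12 branches) above give only 12 results instead of the full 60-term cycle. The sketch below builds all 60 pairs by indexing each vector modulo its own length. It uses the fact that the pairing only returns to the first pair after lcm(10, 12) = 60 steps. It also checks that paste0() is the same as paste(..., sep = "").

```r
a <- c("I", "love", "you", "!")
# paste0() is shorthand for paste(..., sep = "")
identical(paste0(a, 1:4), paste(a, 1:4, sep = ""))  # TRUE

b <- c("甲","乙","丙","丁","戊","己","庚","辛","壬","癸")
d <- c("子","丑","寅","卯","辰","巳","午","未","申","酉","戌","亥")

# i runs over 0..59; stems repeat every 10 steps and branches every 12,
# so each (stem, branch) pair appears exactly once in 60 steps
i <- 0:59
cycle <- paste0(b[i %% 10 + 1], d[i %% 12 + 1])
length(cycle)   # 60
head(cycle, 3)  # "甲子" "乙丑" "丙寅"
cycle[60]       # "癸亥"
```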
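5. sprintf(fmt, ...): combines several components into a string according to a format specification. In the example below, a is a double. %d accepts it only because the values involved are whole numbers.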
> a <- 11
> sprintf("The square of %d is %d", a, a^2)
[1] "The square of 11 is 121"
> sprintf("The square root of %d is %d", a^2, (a^2)^0.5)
[1] "The square root of 121 is 11"
This is similar to formatted-string output in Python.
Example:
a = 11
print('The square of %d is %d' % (a, a**2))
print('The square root of {} is {}'.format(a**2, a))
Output:
The square of 11 is 121
The square root of 121 is 11
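Back in R, the sketch below shows two sprintf() behaviours that the example above does not exercise. First, sprintf() is vectorized over its arguments. Second, %d rejects non-integer values, so fractional results such as sqrt(10) need %f. The result strings are illustrative only.

```r
a <- c("I", "love", "you", "!")

# sprintf() is vectorized: one result per element of its arguments
sprintf("%s has %d character(s)", a, nchar(a))

# %d only accepts whole numbers; use %f (here with 3 decimals) otherwise
sprintf("The square root of %d is %.3f", 10L, sqrt(10))  # "The square root of 10 is 3.162"

# width and zero padding
sprintf("%03d", 7L)  # "007"

# a non-integer value with %d is an error, not a silent truncation
res <- tryCatch(sprintf("%d", 3.5), error = function(e) "error")
res  # "error"
```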
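6. substr(x, start, stop): extracts the substring of x from position start through position stop, inclusive.
Similar to MID() in Excel or slicing in Python. Note that Python slices are 0-based and exclude the end index, while substr() is 1-based and includes stop.
Example: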
> a <- paste0(letters[1:7], collapse="")
> a
[1] "abcdefg"
> substr(a, 1, 3)
[1] "abc"
> substr(a, 1, 3) <- "aaa"
> a
[1] "aaadefg"
> b <- c("1a","2bb", "3ccc", "4dddd" )
> substr(b, 1, 2)
[1] "1a" "2b" "3c" "4d"
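A few more substr() patterns follow, reusing the string "abcdefg". They show how to take the last few characters and how to use substring(), which is vectorized over positions. They also show how substr<- handles replacement strings whose length differs from the target span. The copies x and y are only there so each replacement starts from the same string.

```r
a <- "abcdefg"

# last three characters: count back from nchar()
substr(a, nchar(a) - 2, nchar(a))  # "efg"

# substring() is vectorized over the positions as well
substring(a, 1:3, 3:5)             # "abc" "bcd" "cde"
substring(a, 1:7, 1:7)             # one character per element

# substr<- never changes the string's length: a shorter replacement
# overwrites only as many characters as it has, a longer one is truncated
x <- a
substr(x, 1, 3) <- "zz"
x                                  # "zzcdefg"
y <- a
substr(y, 1, 3) <- "WXYZ"
y                                  # "WXYdefg"
```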
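7. strsplit(x, split): splits each element of x at every match of split and returns a list containing one character vector of pieces per element of x. split is interpreted as a regular expression unless fixed = TRUE.
Similar to s.split(split) in Python.
Example: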
> a <- paste(letters[1:7], collapse="_")
> a
[1] "a_b_c_d_e_f_g"
> strsplit(a, "_")
[[1]]
[1] "a" "b" "c" "d" "e" "f" "g"
> b <- paste0(letters[1:7], 1:7, collapse="_")
> b
[1] "a1_b2_c3_d4_e5_f6_g7"
> strsplit(b, "_")
[[1]]
[1] "a1" "b2" "c3" "d4" "e5" "f6" "g7"
> d <- paste0(c(2020, 01, 10), collapse="/")
> d
[1] "2020/1/10"
> strsplit(d, "/")
[[1]]
[1] "2020" "1" "10"
> # convert the list to a character vector
> unlist(strsplit(d, "/"))
[1] "2020" "1" "10"
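Because split is a regular expression and the result is a list, a few patterns come up often. The sketch below covers splitting on a literal dot and splitting a whole vector of dates. It also shows how to pull out the first piece of each date, how to reshape the pieces into an integer matrix, and how to split a string into single characters. The dates vector is a made-up example.

```r
# split is a regular expression: "." would match every character,
# so fixed = TRUE is needed to split on a literal dot
strsplit("2020.1.10", ".", fixed = TRUE)[[1]]  # "2020" "1" "10"

# a vector input gives one list element per string
dates <- c("2020/1/10", "2021/12/31")
parts <- strsplit(dates, "/")
sapply(parts, `[`, 1)         # "2020" "2021": first piece of each string
m <- do.call(rbind, parts)    # 2 x 3 character matrix, one row per date
storage.mode(m) <- "integer"  # convert the pieces to integers, keeping the shape

# an empty split breaks a string into single characters
strsplit("abc", "")[[1]]      # "a" "b" "c"
```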
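8. regexpr(pattern, x): searches each element of x for pattern and returns the starting character position of the first match (-1 if there is none). The length of each match is stored in the match.length attribute.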
> a <- "I love you!"
> regexpr("y", a)
[1] 8
attr(,"match.length")
[1] 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
The match for "y" starts at character 8 of a and has length 1.
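The positions returned by regexpr() are usually passed on to regmatches() to extract the matched text. The sketch below uses a made-up vector x. It shows the -1 marker for an element with no match and how regmatches() drops that element.

```r
x <- c("abc123def", "no digits", "x9")

# first run of digits in each element; -1 marks "no match"
m <- regexpr("[0-9]+", x)
as.integer(m)            # 4 -1 2
attr(m, "match.length")  # 3 -1 1

# regmatches() uses those positions to pull out the matched text;
# elements without a match are dropped
regmatches(x, m)         # "123" "9"
```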
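9. gregexpr(pattern, x): finds the starting positions and lengths of all matches of pattern in each element of x and returns them as a list, one element per string.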
> a <- "I love you!"
> b <- "You love me!"
> paste(a, b)
[1] "I love you! You love me!"
> gregexpr("v", paste(a, b))
[[1]]
[1] 5 19
attr(,"match.length")
[1] 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
"v" 在 paste(a, b) 中出现了两次。
Further reading:
http://blog.sina.com.cn/s/blog_69ffa1f90101sie9.html
https://www.cnblogs.com/awishfullyway/p/6601539.html
https://blog.csdn.net/yj1556492839/article/details/82725315
Without learning, one cannot broaden one's talents; without resolve, one cannot complete one's learning.