R Functions: String Operations

String Operations
#++++++++++++++++++++++++++++++++++++++++++
#++++++++++++++++++++++++++++++++++++++++++
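#R language+++++++++String-processing functions+++++++++
#Overview:
#Although R is a statistical language built around numeric vectors and matrices, strings are just as important.
#From birth dates in medical research data to text-mining applications, string data is used very frequently in R programs.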
#++++++++++++++++++++++++++++++++++++++++++
#++++++++++++++++++++++++++++++++++++++++++
#------------------------
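#Character vectors
character
#Number of characters
nchar
#Extract a substring
substr
#Convert an object to a formatted string
format, formatC
#Concatenate or split
paste, strsplit
#Partial string matching
charmatch, pmatch
#Pattern matching and replacement
grep, sub, gsub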
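#format(), formatC(), pmatch() and charmatch() appear in the list above but get no section of their own below.
#A minimal sketch of typical calls; the variable names and sample values are illustrative only.

```r
# format() and formatC() convert numbers to character strings.
f1 <- format(3.14159, nsmall = 2)                 # "3.14159" (at least 2 decimals kept)
f2 <- formatC(3.14159, digits = 2, format = "f")  # "3.14"
f3 <- formatC(5, width = 3, flag = "0")           # "005" (zero-padded to width 3)

# pmatch() and charmatch() match an abbreviation against a table of full names.
candidates <- c("mean", "median")
p1 <- pmatch("mea", candidates)     # 1: unique partial match
p2 <- pmatch("me", candidates)      # NA: ambiguous partial match
c1 <- charmatch("me", candidates)   # 0: charmatch() reports ambiguity as 0
c2 <- charmatch("med", candidates)  # 2: unique partial match
```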
#------------------------
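#String concatenation:
#paste(..., sep = " ", collapse = NULL)
#paste() concatenates strings. sep is the separator placed between the arguments when they are joined element by element; collapse, if not NULL, joins the elements of the resulting vector into a single string.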
paste()   
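#The difference between sep and collapse is easiest to see side by side.
#A minimal sketch; the input values are made up for illustration, and paste0() is mentioned only as the base-R shorthand for sep = "".

```r
# sep joins the arguments element by element.
p_sep <- paste("a", "b", sep = "-")                     # "a-b"
p_vec <- paste(c("a", "b"), 1:2, sep = "_")             # "a_1" "b_2"

# collapse then joins the resulting vector into one string.
p_col <- paste(c("a", "b"), 1:2, sep = "_", collapse = "+")  # "a_1+b_2"

# paste0() is equivalent to paste(..., sep = "").
p_zero <- paste0("x", 1:3)                              # "x1" "x2" "x3"
```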
#------------------------
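#String splitting:
#strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
strsplit()
#strsplit() splits strings. split is the delimiter, interpreted as a regular expression unless fixed = TRUE. The result is always a list with one element per input string.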
myresult <- strsplit('123abcdefgabcdef','ab')
#myresult <- strsplit('123abcdefgabcdef',split='ab')
myresult
#[[1]]
#[1] "123"   "cdefg" "cdef"
class(myresult)
#[1] "list"
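#Because split is a regular expression by default, splitting on a literal "." needs fixed = TRUE.
#A minimal sketch with made-up input; unlist() is used here only to flatten the list result.

```r
# One list element per input string.
parts <- strsplit(c("a.b.c", "d.e"), ".", fixed = TRUE)
first <- parts[[1]]          # "a" "b" "c"
flat  <- unlist(parts)       # "a" "b" "c" "d" "e"

# Without fixed = TRUE the "." matches every character,
# so only empty strings are left.
regex_split <- strsplit("a.b.c", ".")[[1]]
```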
#------------------------
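#Counting the characters in a string:
nchar()
#nchar() returns the number of characters in each element of a character vector.
nchar("abc")              #3
nchar(NA)                 #missing value: 2, because the logical NA is treated as the string "NA"
nchar(Inf)                #infinite value: 3, because Inf is converted to the string "Inf"
nchar(NULL)               #NULL: integer(0)
nchar("")                 #empty string: 0
length(nchar(""))         #1: the result for "" is a single value, namely 0
length(nchar(NULL))       #0: the result for NULL contains no values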
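#nchar() is vectorized and should not be confused with length(), which counts the elements of the vector.
#A minimal sketch; the sample vector is invented for illustration.

```r
words <- c("apple", "kiwi", "")

n_chars <- nchar(words)    # 5 4 0: characters in each element
n_elems <- length(words)   # 3: number of elements in the vector

# Non-character input is converted to character first.
n_num <- nchar(12345)      # 5
```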
#------------------------
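#Extracting substrings:
substr(x, start, stop)
substring(text, first, last = 1000000L)
#substr() and substring() are the most commonly used functions for extracting substrings. They do the same job and differ only in their arguments.
#substr(): both start and stop must be supplied; omitting either one raises an error.
#substring(): only first is required. If last is omitted it defaults to 1000000L, which in practice means "up to the end of the string".
substr(x, start, stop) <- value                    #replace part of x with value
substring(text, first, last = 1000000L) <- value   #replace part of text with value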
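###########Examples
substr("abcdef",2,4)          # "bcd"
substring("abcdef",1:6,1:6)   # "a" "b" "c" "d" "e" "f"; strsplit is more efficient ...
substr(rep("abcdef",4),1:4,4:5)   # "abcd" "bcde" "cd" "de"
x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
substr(x, 2, 5)               # "sfef" "wert" "uiop" "" "tuff"
substring(x, 2, 4:6)          # "sfe" "wert" "uiop[" "" "tuff"
substring(x, 2) <- c("..", "+++")
x                             # "a..ef" "q+++ty" "y..op[" "b" "s..ff.blah.yech"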
#------------------------
#Character translation (each character in old is replaced by the corresponding character in new):
chartr(old, new, x)
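#chartr() works character by character, not on whole substrings.
#A minimal sketch; the input strings are made up for illustration.

```r
# "a" -> "x", "b" -> "y", "c" -> "z"
t1 <- chartr("abc", "xyz", "aabbcc")   # "xxyyzz"

# Characters not listed in old are left unchanged.
t2 <- chartr("ab", "AB", "banana")     # "BAnAnA"
```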
#------------------------
#Case conversion:
tolower(x)
toupper(x)
casefold(x, upper = FALSE)
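#A minimal sketch of the three case-conversion functions; the sample string is made up.
#casefold() lowercases by default and uppercases when upper = TRUE.

```r
s <- "Hello World"

lower <- tolower(s)                  # "hello world"
upper <- toupper(s)                  # "HELLO WORLD"
fold_lower <- casefold(s)            # same as tolower(s)
fold_upper <- casefold(s, upper = TRUE)  # same as toupper(s)
```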
#------------------------
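#Pattern matching (regular expressions)
grep()
#Approximate (fuzzy) matching
agrep()
#Pattern replacement
gsub()
#grep() and gsub() treat pattern as a regular expression by default. perl = TRUE switches to Perl-compatible regular expressions, and fixed = TRUE matches pattern as a literal string. agrep() has no perl argument.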
grep(pattern, x, ignore.case = FALSE, perl = FALSE,
     value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)

sub(pattern, replacement, x,
    ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)

gsub(pattern, replacement, x,
    ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)

regexpr(pattern, text, ignore.case = FALSE,
    perl = FALSE, fixed = FALSE, useBytes = FALSE)

gregexpr(pattern, text, ignore.case = FALSE,
    perl = FALSE, fixed = FALSE, useBytes = FALSE)
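#The signatures above can be tried on a small vector.
#A minimal sketch; fruits and the patterns are invented for illustration.

```r
fruits <- c("apple", "banana", "cherry", "pineapple")

# grep() returns the indices of matching elements, or the elements themselves.
idx  <- grep("apple", fruits)                 # 1 4
vals <- grep("apple", fruits, value = TRUE)   # "apple" "pineapple"

# sub() replaces the first match only, gsub() replaces every match.
s1 <- sub("a", "A", "banana")                 # "bAnana"
s2 <- gsub("a", "A", "banana")                # "bAnAnA"

# fixed = TRUE treats "." as a literal dot rather than "any character".
s3 <- gsub(".", "-", "a.b.c", fixed = TRUE)   # "a-b-c"

# regexpr() gives the position of the first match, gregexpr() of all matches.
first_pos <- as.integer(regexpr("an", "banana"))          # 2
all_pos   <- as.integer(gregexpr("an", "banana")[[1]])    # 2 4

# agrep() tolerates small spelling differences.
fuzzy <- agrep("lasy", "1 lazy 2")            # 1
```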
#------------------------
#See Also:
#     regular expression (aka 'regexp') for the details of the pattern specification.
#     'glob2rx' to turn wildcard matches into regular expressions.
#     'agrep' for approximate matching.
#     'tolower', 'toupper' and 'chartr' for character translations.
#     'charmatch', 'pmatch', 'match'. 'apropos' uses regexps and has nice examples.
#------------------------
posted @ 2016-01-25 23:39 银河统计
银河统计

银河统计 Studio, Harbin University of Commerce (哈尔滨商业大学银河统计工作室)
