R Study Notes (6): A Beginner's Guide to R -- Accessing Variables and Working with Data Subsets


1 Accessing data frame variables


[1] "Sample"   "Year"     "Month"    "Location" "Sex"      "GSI"     


1.1 The str function


'data.frame':   2644 obs. of  6 variables:
 $ Sample  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month   : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location: int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI     : num  10.44 9.83 9.74 9.31 8.99 ...


The variables Sample, Year, Month, Location and Sex are integers (int); GSI is numeric (num).
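read.table (through type.convert) usually stores a column as integer when every value in it is written as a whole number, which is why str(Squid) reports int for the coded variables and num for GSI. The sketch below shows the same distinction on a small hypothetical data frame, toy, with made-up values (squid.txt itself is not reproduced here); sapply(toy, class) returns the class of every column.

```r
# Hypothetical toy data frame mimicking three Squid columns (made-up values)
toy <- data.frame(
  Sample   = 1:4,                       # 1:4 creates an integer sequence
  Location = c(1L, 3L, 1L, 2L),         # the L suffix writes integer literals
  GSI      = c(10.44, 9.83, 9.74, 9.31) # decimals are stored as numeric (double)
)
str(toy)                       # one line per variable, as with str(Squid)
classes <- sapply(toy, class)  # class of every column
print(classes)
```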



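Trying to use GSI on its own, without telling R that it is a column of Squid, fails because R looks for an object called GSI in the workspace rather than inside the data frame:

Error: object 'GSI' not found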

1.2 The data argument in functions -- the best way to access variables in a data frame

M1 <- lm(GSI ~ factor(Location)+factor(Year),data = Squid)

lm(formula = GSI ~ factor(Location) + factor(Year), data = Squid)

      (Intercept)  factor(Location)2  factor(Location)3  factor(Location)4  
           1.3939            -2.2178            -0.1417             0.3138  
    factor(Year)2      factor(Year)3      factor(Year)4  
           1.3548             0.9564             1.2270  


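lm fits a linear regression model; data = Squid tells lm to take the variables named in the formula (GSI, Location, Year) from the data frame Squid.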

Not every function has a data argument. mean, for example, does not:

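mean(GSI,data = Squid)
Error in mean(GSI, data = Squid) : object 'GSI' not found

mean has no data argument, so data = Squid is never used and GSI is again looked up in the workspace.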
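A minimal sketch of the difference, on a hypothetical toy data frame with made-up values: lm finds the variables through data =, mean does not, and two base-R alternatives that do work are mean(toy$GSI) and with(toy, mean(GSI)). with() is not used in the original notes; it is added here only as one option.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(GSI = c(10.4, 9.8, 5.3, 4.3), Sex = c(2, 2, 1, 1))

# lm() has a data argument: GSI and Sex are looked up inside toy
fit <- lm(GSI ~ factor(Sex), data = toy)

# mean() has no data argument: GSI is looked up in the workspace and not found
res <- tryCatch(mean(GSI, data = toy),
                error = function(e) conditionMessage(e))
print(res)   # the "object 'GSI' not found" message (wording depends on language settings)

# Alternatives that work for functions without a data argument
m1 <- mean(toy$GSI)         # $ notation
m2 <- with(toy, mean(GSI))  # with() evaluates the expression inside toy
```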

1.3 The $ symbol: another way to access variables


[1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
[9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745



[1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
[9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745



[1] 2.187034 
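The $ notation is one of several equivalent ways to pull a column out of a data frame. This sketch, on a hypothetical three-row data frame, checks that $, a column position, a column name in [ ], and [[ ]] all return the same vector; the [[ ]] form is an extra not covered in these notes.

```r
# Hypothetical three-row data frame (values copied from the first GSI entries)
toy <- data.frame(Sample = 1:3, GSI = c(10.4432, 9.8331, 9.7356))

a <- toy$GSI        # by name with $
b <- toy[, 2]       # by column position (GSI is column 2 here, column 6 in Squid)
d <- toy[, "GSI"]   # by name inside [ ]
e <- toy[["GSI"]]   # by name with [[ ]]

mean(toy$GSI)       # functions receive an ordinary numeric vector
```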


1.4 The attach function


[1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
[9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
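attach() places a copy of the data frame's columns on R's search path, which is why GSI can then be typed without Squid$. The sketch below uses a hypothetical data frame, toy, to illustrate one consequence worth knowing: modifying the data frame after attaching it does not update the attached copy. detach() removes the copy from the search path again.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(GSI = c(10.4, 9.8, 5.3), Sex = c(2, 2, 1))

attach(toy)              # puts a copy of toy's columns on the search path
before <- mean(GSI)      # GSI is now found without toy$

toy$GSI <- toy$GSI * 2   # change the data frame in the workspace...
after <- mean(GSI)       # ...but GSI still refers to the attached (old) copy

detach(toy)              # remove the copy from the search path when done
```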










2 Accessing subsets of data



[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1



[1] 2 1



Sel <- Squid$Sex == 1
SquidM <- Squid[Sel,]
     Sample Year Month Location Sex    GSI
24       24    1     5        1   1 5.2970
48       48    1     5        3   1 4.2968
58       58    1     6        1   1 3.5008
60       60    1     6        1   1 3.2487
61       61    1     6        1   1 3.2304

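The command Sel <- Squid$Sex == 1 creates a vector with the same length as Sex: an element is TRUE if the corresponding value of Sex equals 1 and FALSE otherwise. Such a vector is called a Boolean (logical) vector and can be used to select rows.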

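The command SquidM <- Squid[Sel,] selects the rows of Squid for which Sel is TRUE and stores them in SquidM. Because we are selecting rows, Sel goes before the comma inside the square brackets; leaving the part after the comma empty keeps all columns.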
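The same Boolean selection on a hypothetical six-row data frame (made-up values), showing that the logical vector has one element per row, that sum() counts its TRUE values, and that which() returns their positions.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(Sample = 1:6, Sex = c(2, 2, 1, 2, 1, 1),
                  GSI = c(10.4, 9.8, 5.3, 9.3, 4.3, 3.5))

Sel <- toy$Sex == 1   # logical vector, one element per row
print(Sel)            # FALSE FALSE TRUE FALSE TRUE TRUE
sum(Sel)              # TRUE counts as 1, so this is the number of males: 3
which(Sel)            # positions of the TRUE values: 3 5 6
toyM <- toy[Sel, ]    # rows where Sel is TRUE, all columns
```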




SquidF <- Squid[Squid$Sex == 2,]
     Sample Year Month Location Sex     GSI
1         1    1     1        1   2 10.4432
2         2    1     1        3   2  9.8331
3         3    1     1        1   2  9.7356
4         4    1     1        1   2  9.3107
5         5    1     1        1   2  8.9926


# Location only takes the values 1, 2, 3 and 4, so all five commands select locations 1 to 3
Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3,]
Squid123 <- Squid[Squid$Location != 4,]
Squid123 <- Squid[Squid$Location < 4,]
Squid123 <- Squid[Squid$Location <= 3,]
Squid123 <- Squid[Squid$Location >= 1 & Squid$Location <= 3,]
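A sketch, on a hypothetical data frame whose Location column also takes only the values 1 to 4, confirming that the five selections return identical results. The %in% version is an additional base-R alternative not used in the original notes.

```r
# Hypothetical toy data: Location takes only the values 1 to 4
toy <- data.frame(Sample = 1:8, Location = c(1, 3, 4, 2, 1, 4, 2, 3))

s1 <- toy[toy$Location == 1 | toy$Location == 2 | toy$Location == 3, ]
s2 <- toy[toy$Location != 4, ]
s3 <- toy[toy$Location < 4, ]
s4 <- toy[toy$Location <= 3, ]
s5 <- toy[toy$Location >= 1 & toy$Location <= 3, ]
s6 <- toy[toy$Location %in% c(1, 2, 3), ]  # membership test (extra alternative)

# Compare every alternative with s1
all_same <- all(sapply(list(s2, s3, s4, s5, s6), identical, s1))
all_same
```

The equivalence depends on the data: with a Location of 0 or 5 the commands would start to disagree, and a missing Location produces an all-NA row with the comparison operators but is simply dropped by %in%, which returns FALSE for NA.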



[1] 1 3 4 2
Squid123 <- Squid[Squid$Location == 1 | Squid$Location ==2 | Squid$Location == 3,]
     Sample Year Month Location Sex     GSI
1         1    1     1        1   2 10.4432
2         2    1     1        3   2  9.8331
3         3    1     1        1   2  9.7356
4         4    1     1        1   2  9.3107
5         5    1     1        1   2  8.9926
6         6    1     1        1   2  8.7707



SquidM.1 <- Squid[Squid$Sex == 1 & Squid$Location == 1,]
     Sample Year Month Location Sex    GSI
24       24    1     5        1   1 5.2970
58       58    1     6        1   1 3.5008
60       60    1     6        1   1 3.2487


SquidM.12 <- Squid[Squid$Sex == 1 &( Squid$Location == 1 | Squid$Location == 2),]
     Sample Year Month Location Sex    GSI
24       24    1     5        1   1 5.2970
58       58    1     6        1   1 3.5008
60       60    1     6        1   1 3.2487
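The parentheses in the SquidM.12 command matter because & has higher precedence than | in R. A sketch on a hypothetical six-row data frame comparing the condition with and without parentheses:

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(Sample   = 1:6,
                  Sex      = c(1, 2, 1, 2, 1, 2),
                  Location = c(1, 2, 2, 1, 3, 2))

# With parentheses: males at location 1 or 2
with_paren <- toy[toy$Sex == 1 & (toy$Location == 1 | toy$Location == 2), ]

# Without parentheses & binds tighter than |, so this means
# (male AND location 1) OR (location 2, either sex)
no_paren <- toy[toy$Sex == 1 & toy$Location == 1 | toy$Location == 2, ]

with_paren$Sample   # 1 3
no_paren$Sample     # 1 2 3 6
```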


SquidM1 <- SquidM[Squid$Location == 1,] 
        Sample Year Month Location Sex    GSI 
24          24    1     5        1   1 5.2970 
58          58    1     6        1   1 3.5008 



NA          NA   NA    NA       NA  NA     NA
NA.1        NA   NA    NA       NA  NA     NA
NA.2        NA   NA    NA       NA  NA     NA
NA.3        NA   NA    NA       NA  NA     NA
NA.4        NA   NA    NA       NA  NA     NA



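SquidM, obtained earlier, contains only the male data, so it has fewer rows than Squid, while the Boolean vector Squid$Location == 1 has one element per row of Squid. The lengths do not match, and that causes the result above: the TRUE/FALSE pattern is applied to the wrong rows, and every TRUE beyond the last row of SquidM produces a row of NA values. The condition must be built from the data frame that is being indexed: SquidM[SquidM$Location == 1,].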
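The mistake and its fix can be reproduced on a hypothetical data frame, toy (made-up values): males has 4 rows, but a condition built from toy has 8 elements.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(Sample   = 1:8,
                  Location = c(1, 3, 1, 2, 1, 4, 1, 2),
                  Sex      = c(2, 2, 1, 1, 2, 1, 1, 2))

males <- toy[toy$Sex == 1, ]             # 4 rows: Samples 3, 4, 6, 7

# Wrong: the condition has 8 elements, one per row of toy
wrong <- males[toy$Location == 1, ]
wrong$Sample   # 3 6 NA NA: Sample 6 is at location 4, and two rows are all NA

# Right: build the condition from the data frame being indexed
right <- males[males$Location == 1, ]
right$Sample   # 3 7
```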

2.1 Sorting data

Ord1 <- order(Squid$Month)
     Sample Year Month Location Sex     GSI
1         1    1     1        1   2 10.4432
2         2    1     1        3   2  9.8331
3         3    1     1        1   2  9.7356
4         4    1     1        1   2  9.3107




[1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
[9]  7.2156  6.3882  6.0726  5.7757  1.2610  1.1997  0.8373  0.6716
[17]  0.5758  0.5518  0.4921  0.4808  0.3828  0.3289  0.2758  0.2506
[25]  0.2092  0.1792  0.1661  0.1618  0.1543  0.1541  0.1490  0.1379
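order() returns row positions rather than sorted values, which is why Squid[Ord1,] sorts whole rows and Squid$GSI[Ord1] sorts a single variable the same way. A sketch on a hypothetical five-row data frame; ties keep their original order, and the second key (decreasing GSI within each month, via -toy$GSI) is an extra illustration not taken from the notes.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(Sample = 1:5, Month = c(3, 1, 2, 1, 3),
                  GSI = c(1.2, 10.4, 5.0, 9.8, 2.1))

Ord1 <- order(toy$Month)   # row positions: 2 4 3 1 5 (ties keep original order)
sorted <- toy[Ord1, ]      # reorder whole rows
toy$GSI[Ord1]              # reorder one variable the same way

# Extra: sort by Month, then by decreasing GSI within each month
Ord2 <- order(toy$Month, -toy$GSI)   # 2 4 3 5 1
```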

3 Combining two data sets with a common identifier

Sql1 <- read.table(file = "squid1.txt",header = TRUE)
Sql2 <- read.table(file = "squid2.txt",header = TRUE)
SquidMerged <- merge(Sql1,Sql2,by = "Sample")
     Sample     GSI YEAR MONTH Location Sex
1         1 10.4432    1     1        1   2
2         2  9.8331    1     1        3   2
3         3  9.7356    1     1        1   2
4         5  8.9926    1     1        1   2
5         6  8.7707    1     1        1   2
6         7  8.2576    1     1        1   2


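merge takes the two data frames Sql1 and Sql2 as arguments and combines them, using the variable Sample as the common identifier. merge also has an option all, whose default is FALSE: a Sample value that is missing from either Sql1 or Sql2 is dropped (in the output above, Sample 4 does not appear). If all = TRUE, such rows are kept and the missing values are filled in with NA.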

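Sql11 <- read.table(file = "squid1.txt",header = TRUE)
Sql21 <- read.table(file = "squid2.txt",header = TRUE)
# all = TRUE keeps samples that appear in only one of the two files (missing values become NA)
SquidMerged1 <- merge(Sql11,Sql21,by = "Sample",all = TRUE)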
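A sketch of the effect of all on two small hypothetical data frames, sql1 and sql2 (made-up values), where Sample 3 appears only in the first and Sample 4 only in the second:

```r
# Hypothetical data frames (made-up values)
sql1 <- data.frame(Sample = c(1, 2, 3, 5), GSI = c(10.4, 9.8, 9.7, 9.0))
sql2 <- data.frame(Sample = c(1, 2, 4, 5), Sex = c(2, 2, 1, 2))

inner <- merge(sql1, sql2, by = "Sample")              # default all = FALSE
outer <- merge(sql1, sql2, by = "Sample", all = TRUE)  # keep every Sample

inner$Sample   # 1 2 5
outer$Sample   # 1 2 3 4 5
outer          # Sample 3 has Sex = NA, Sample 4 has GSI = NA
```

merge also accepts all.x = TRUE or all.y = TRUE to keep the unmatched rows of only one of the two data frames.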



4 Exporting data



write.table(SquidM,file = "MaleSquid_wujiahua.txt",sep = " ",quote =  FALSE,append = FALSE,na = "NA")




Sample Year Month Location Sex GSI
24 24 1 5 1 1 5.297
48 48 1 5 3 1 4.2968
58 58 1 6 1 1 3.5008
60 60 1 6 1 1 3.2487
61 61 1 6 1 1 3.2304



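The first argument of write.table is the data to export and the second is the name of the file. sep = " " separates the values with spaces, quote = FALSE stops character strings from being wrapped in quotation marks, and na = "NA" writes missing values as NA. append = FALSE (used above) overwrites the file; append = TRUE adds the data to the end of an existing file instead.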
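A round-trip sketch using a temporary file (tempfile()) instead of a real path, on a hypothetical three-row data frame with one missing value. Unlike the command above, it sets row.names = FALSE so that the appended copy can be read back without duplicate row names, and it uses col.names = FALSE on the second call so the header is not written twice.

```r
# Hypothetical three-row data frame with one missing value (made-up numbers)
males <- data.frame(Sample = c(24, 48, 58), Location = c(1, 3, NA),
                    GSI = c(5.297, 4.2968, 3.5008))
out <- tempfile(fileext = ".txt")  # temporary file, not a real project path

# First write: header plus data, missing values written as NA
write.table(males, file = out, sep = " ", quote = FALSE,
            row.names = FALSE, append = FALSE, na = "NA")
readLines(out)[1:2]   # "Sample Location GSI" "24 1 5.297"

# Second write: append the same rows without repeating the header
write.table(males, file = out, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE, append = TRUE, na = "NA")

back <- read.table(out, header = TRUE)
nrow(back)            # 6 rows: 3 written, then 3 appended
```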

5 Recoding categorical variables

'data.frame':   2644 obs. of  6 variables:
 $ Sample  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month   : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location: int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI     : num  10.44 9.83 9.74 9.31 8.99 ...




Squid$fLocation <- factor(Squid$Location)
Squid$fSex <- factor(Squid$Sex)
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1

   [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 
  [36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1 
  [71] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1
[2591] 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 2 2 1 1 1 1
[2626] 1 1 1 1 1 2 1 1 1 2 1 2 1 2 1 2 1 1 1
Levels: 1 2




Squid$fSex <- factor(Squid$Sex,levels = c(1,2),labels = c("M","F"))
   [1] F F F F F F F F F F F F F F F F F F F F F F F M F F F F F F F F F F F
  [36] F F F F F F F F F F F F M F F F F F F F F F M F M M M M F M M M M M M
[2556] F M M M M F F M M M M M M M F M M M M M M F M M F M M M F M M F M M M 
[2591] M M M M M M M M M F M M F M M F M M M F M F M M F M M F M F F M M M M 
[2626] M M M M M F M M M F M F M F M F M M M 
Levels: M F 
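A sketch of how levels and labels interact, on a hypothetical Sex vector: labels are matched to levels by position, so the levels argument determines which code becomes M and which becomes F.

```r
# Hypothetical Sex codes (1 = male, 2 = female, as in Squid)
Sex <- c(2, 2, 1, 2, 1)

fSex <- factor(Sex, levels = c(1, 2), labels = c("M", "F"))
fSex              # prints F F M F M with Levels: M F
levels(fSex)      # "M" "F"
as.integer(fSex)  # internal codes: 2 2 1 2 1
table(fSex)       # counts per level: M 2, F 3

# Reversing the levels without reversing the labels silently mislabels the data
wrong <- factor(Sex, levels = c(2, 1), labels = c("M", "F"))
as.character(wrong)  # "M" "M" "F" "M" "F": the females are now labelled M
```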





boxplot(GSI ~ fSex,data = Squid)

M1 <- lm(GSI ~ fSex+fLocation,data = Squid)

lm(formula = GSI ~ fSex + fLocation, data = Squid)

(Intercept)        fSexF   fLocation2   fLocation3   fLocation4  
     1.3593       2.0248      -1.8552      -0.1425       0.5876


lm(formula = GSI ~ fSex + fLocation, data = Squid)
    Min      1Q  Median      3Q     Max
-3.4137 -1.3195 -0.1593  1.2039 11.2159
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  1.35926    0.07068  19.230   <2e-16 ***
fSexF        2.02481    0.09427  21.479   <2e-16 ***
fLocation2  -1.85525    0.20027  -9.264   <2e-16 ***
fLocation3  -0.14248    0.12657  -1.126   0.2604   
fLocation4   0.58756    0.34934   1.682   0.0927 . 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 2.415 on 2639 degrees of freedom
Multiple R-squared: 0.1759,     Adjusted R-squared: 0.1746
F-statistic: 140.8 on 4 and 2639 DF,  p-value: < 2.2e-16



M2 <- lm(GSI ~ factor(Sex)+factor(Location),data = Squid)
lm(formula = GSI ~ factor(Sex) + factor(Location), data = Squid)
    Min      1Q  Median      3Q     Max
-3.4137 -1.3195 -0.1593  1.2039 11.2159
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)        1.35926    0.07068  19.230   <2e-16 ***
factor(Sex)2       2.02481    0.09427  21.479   <2e-16 ***
factor(Location)2 -1.85525    0.20027  -9.264   <2e-16 ***
factor(Location)3 -0.14248    0.12657  -1.126   0.2604   
factor(Location)4  0.58756    0.34934   1.682   0.0927 . 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 2.415 on 2639 degrees of freedom
Multiple R-squared: 0.1759,     Adjusted R-squared: 0.1746
F-statistic: 140.8 on 4 and 2639 DF,  p-value: < 2.2e-16
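M1 and M2 are the same model: the estimates, standard errors and fit statistics agree, and only the coefficient names differ (fSexF versus factor(Sex)2). A sketch on a hypothetical eight-row data frame that checks this with coef():

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(
  Sex      = c(2, 2, 1, 1, 2, 1, 1, 2),
  Location = c(1, 3, 1, 2, 1, 4, 1, 2),
  GSI      = c(10.4, 9.8, 5.3, 4.3, 9.0, 3.5, 3.2, 8.8)
)
toy$fSex      <- factor(toy$Sex, levels = c(1, 2), labels = c("M", "F"))
toy$fLocation <- factor(toy$Location)

M1 <- lm(GSI ~ fSex + fLocation, data = toy)                # factors made beforehand
M2 <- lm(GSI ~ factor(Sex) + factor(Location), data = toy)  # factors made in the formula

names(coef(M1))   # "(Intercept)" "fSexF" "fLocation2" "fLocation3" "fLocation4"
names(coef(M2))   # "(Intercept)" "factor(Sex)2" "factor(Location)2" ...
all.equal(unname(coef(M1)), unname(coef(M2)))   # TRUE: same estimates
```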



   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
[2626] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 1 2 3 4



Squid$fLocation <- factor(Squid$Location,levels= c(2,3,1,4))
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
  [71] 1 1 1 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 1 3 1
] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 2 3 1 4
boxplot(GSI ~ fLocation,data = Squid)
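Giving levels in a different order changes only the order of the factor levels, not the data. Because boxplot draws the groups in level order and lm uses the first level as the baseline, location 2 comes first in the plot and becomes the reference category in a model. A sketch on hypothetical vectors; relevel() is an extra base-R alternative that only moves one level to the front.

```r
# Hypothetical vectors (made-up values)
Location <- c(1, 3, 1, 2, 4, 2, 1, 3)
GSI      <- c(10.4, 9.8, 5.3, 4.3, 3.5, 8.8, 3.2, 9.0)

f_default <- factor(Location)                          # levels sorted: 1 2 3 4
f_custom  <- factor(Location, levels = c(2, 3, 1, 4))  # levels in a chosen order
levels(f_custom)                                       # "2" "3" "1" "4"

# The first level is the baseline in lm(): the intercept now refers to location 2
names(coef(lm(GSI ~ f_custom)))   # "(Intercept)" "f_custom3" "f_custom1" "f_custom4"

# relevel() moves a single level to the front and keeps the rest in order
levels(relevel(f_default, ref = "2"))                  # "2" "1" "3" "4"
```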




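SquidM <- Squid[Squid$Sex == 1,]
SquidM <- Squid[Squid$fSex == "M",]   # same rows: fSex is a factor, so compare it with a label in quotes ("1" if fSex was created without labels)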






Squid$fSex <- factor(Squid$Sex,labels = c("M","F"))
Squid$fLocation <- factor(Squid$Location)
'data.frame':   2644 obs. of  8 variables:
 $ Sample   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month    : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location : int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex      : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI      : num  10.44 9.83 9.74 9.31 8.99 ...
 $ fLocation: Factor w/ 4 levels "1","2","3","4": 1 3 1 1 1 1 1 3 3 1 ...
 $ fSex     : Factor w/ 2 levels "M","F": 2 2 2 2 2 2 2 2 2 2 ...



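Function       Purpose                                   Example

write.table    Write a variable to an ASCII file         write.table(Squid,file="test.txt")

order          Determine the order for sorting data      order(x)

merge          Merge two data frames                     merge(a,b,by="ID")

str            Show the internal structure of an object  str(Squid)

factor         Define a variable as a factor             factor(x)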


posted on 2016-04-11 17:28  MartinChau
