R Basics

Source of these notes:

http://wiki.stdout.org/rcookbook/

Indexing into a data structure

Problem

You want to get part of a data structure.

Solution

Elements from a vector, matrix, or data frame can be extracted using numeric indexing, or by using a boolean vector of the appropriate length.

In many of the examples below, there are multiple ways of doing the same thing.

Indexing with numbers and names

With a vector:

# A sample vector

v <- c(1,4,4,3,2,2,3)

v[c(2,3,4)]

v[2:4]

# 4 4 3

v[c(2,4,3)]

# 4 3 4

With a data frame:

# Create a sample data frame

data <- read.table(header=T, con <- textConnection('

subject sex size

1 M 7

2 F 6

3 F 9

4 M 11

'))

close(con)

# Get the element at row 1, column 3

data[1,3]

data[1,"size"]

# 7

# Get rows 1 and 2, and all columns

data[1:2, ]

data[c(1,2), ]

# subject sex size

# 1 M 7

# 2 F 6

# Get rows 1 and 2, and only column 2

data[1:2, 2]

data[c(1,2), 2]

# [1] M F

# Levels: F M

# Get rows 1 and 2, and only the columns named "sex" and "size"

data[1:2, c("sex","size")]

data[c(1,2), c(2,3)]

# sex size

# M 7

# F 6
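The Solution above also mentions matrices, which are indexed the same way as data frames, with a row index and a column index. The following is a minimal sketch; the matrix m and its row and column names are made up for illustration.

```r
# A small sample matrix (names are hypothetical)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Element at row 1, column 3, by position and by name
m[1, 3]
m["r1", "c"]
# 5

# All rows, only the columns named "a" and "c"
m[, c("a", "c")]
#    a c
# r1 1 5
# r2 2 6

# A single row is simplified to a vector unless drop = FALSE is given
row_as_vector <- m[2, ]
row_as_matrix <- m[2, , drop = FALSE]
dim(row_as_vector)
# NULL
dim(row_as_matrix)
# 1 3
```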

Indexing with a boolean vector

With the vector v from above:

v > 2

# FALSE TRUE TRUE TRUE FALSE FALSE TRUE

v[v>2]

v[ c(F,T,T,T,F,F,T)]

# 4 4 3 3

With the data frame from above:

# A boolean vector

data$subject < 3

# TRUE TRUE FALSE FALSE

data[data$subject < 3, ]

data[c(TRUE,TRUE,FALSE,FALSE), ]

# subject sex size

# 1 M 7

# 2 F 6

# It is also possible to get the numeric indices of the TRUEs

which(data$subject < 3)

# 1 2
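Boolean vectors can also be combined and counted before they are used as an index. A short sketch, reusing the sample vector from above:

```r
v <- c(1,4,4,3,2,2,3)

# Combine conditions element-wise with & (and) and | (or)
v[v > 2 & v < 4]
# 3 3

# TRUE counts as 1, so sum() gives the number of matches
sum(v > 2)
# 4

# Positions of the largest value
which(v == max(v))
# 2 3
```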

Negative indexing

Unlike in some other programming languages, when you use negative numbers for indexing in R, it doesn't mean to index backward from the end. Instead, it means to drop the element at that index, counting the usual way, from the beginning.

# Here's the vector again.

v

# 1 4 4 3 2 2 3

# Drop the first element

v[-1]

# 4 4 3 2 2 3

# Drop first three

v[-1:-3]

# 3 2 2 3

# Drop just the last element

v[-length(v)]

# 1 4 4 3 2 2
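If you do want to count from the end, head() and tail() are one way to do it. The sketch below also shows that positive and negative indices cannot be mixed in one index; the tryCatch() wrapper is only there so the example keeps running after the error.

```r
v <- c(1,4,4,3,2,2,3)

# Last element, and the last two elements
tail(v, 1)
# 3
tail(v, 2)
# 2 3

# Everything except the last two elements
head(v, -2)
# 1 4 4 3 2

# Mixing positive and negative indices is an error
mixed <- tryCatch(v[c(-1, 2)], error = function(e) "error")
mixed
# "error"
```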

Getting a subset of a data structure

Problem

You want to get a subset of the elements of a vector, matrix, or data frame.

Solution

To get a subset based on some conditional criterion, the subset() function or indexing using square brackets can be used. In the examples here, both ways are shown.

# A sample vector

v <- c(1,4,4,3,2,2,3)

subset(v, v<3)

v[v<3]

# 1 2 2

# Another vector

t <- c("small", "small", "large", "medium")

# Remove "small" entries

subset(t, t!="small")

t[t!="small"]

# "large" "medium"

One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().

v[v<3] <- 9

# 9 4 4 3 9 9 3

subset(v, v<3) <- 9

# Error in subset(v, v < 3) <- 9 : could not find function "subset<-"
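If you want the replaced values without changing the original vector, replace() may be a useful alternative. It is not part of the recipe above; this sketch only shows that it returns a modified copy.

```r
v <- c(1,4,4,3,2,2,3)

# replace() returns a modified copy; v itself is unchanged
w <- replace(v, v < 3, 9)
w
# 9 4 4 3 9 9 3
v
# 1 4 4 3 2 2 3
```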

With data frames:

# A sample data frame

data <- read.table(header=T, con <- textConnection('

subject sex size

1 M 7

2 F 6

3 F 9

4 M 11

'))

close(con)

subset(data, subject < 3)

data[data$subject < 3, ]

# subject sex size

# 1 M 7

# 2 F 6

# Subset of particular rows and columns

subset(data, subject < 3, select = -subject)

subset(data, subject < 3, select = c(sex,size))

subset(data, subject < 3, select = sex:size)

data[data$subject < 3, c("sex","size")]

# sex size

# M 7

# F 6

# Logical AND of two conditions

subset(data, subject < 3 & sex=="M")

data[data$subject < 3 & data$sex=="M", ]

# subject sex size

# 1 M 7

# Logical OR of two conditions

subset(data, subject < 3 | sex=="M")

data[data$subject < 3 | data$sex=="M", ]

# subject sex size

# 1 M 7

# 2 F 6

# 4 M 11

# Condition based on transformed data

subset(data, log2(size)>3 )

data[log2(data$size) > 3, ]

# subject sex size

# 3 F 9

# 4 M 11

# Subset if elements are in another vector

subset(data, subject %in% c(1,3))

data[data$subject %in% c(1,3), ]

# subject sex size

# 1 M 7

# 3 F 9
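The two methods also differ when the condition evaluates to NA. The sketch below uses a small made-up data frame d with one missing size to show the difference.

```r
# A hypothetical data frame with a missing value
d <- data.frame(subject = 1:3, size = c(7, NA, 9))

# subset() treats an NA condition as FALSE and drops the row
nrow(subset(d, size > 6))
# 2

# Square brackets keep a row of NAs where the condition is NA
nrow(d[d$size > 6, ])
# 3

# Wrapping the condition in which() drops the NA
nrow(d[which(d$size > 6), ])
# 2
```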

Making a vector filled with values

Problem

You want to create a vector with values already filled in.

Solution

rep(1, 50)

#  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

# [39] 1 1 1 1 1 1 1 1 1 1 1 1

rep(F, 20)

#  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

rep(1:5, 4)

# 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

rep(1:5, each=4)

# 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

# Use it on a factor

rep(factor(LETTERS[1:3]), 5)

# A B C A B C A B C A B C A B C

# Levels: A B C
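Besides rep(), base R has constructors that create a vector of a given type and length filled with that type's default value. A short sketch, along with two more rep() arguments:

```r
# Vectors of a given length filled with the type's default value
numeric(3)
# 0 0 0
logical(2)
# FALSE FALSE
character(2)
# "" ""

# Repeat each element a different number of times
rep(c("a", "b"), times = c(2, 3))
# "a" "a" "b" "b" "b"

# Recycle the pattern until the result has the requested length
rep(1:3, length.out = 7)
# 1 2 3 1 2 3 1
```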

Information about variables

Problem

You want to find information about variables.

Solution

Here are some sample variables to work with in the examples below:

x <- 6

n <- 1:4

let <- LETTERS[1:4]

df <- data.frame(n, let)

Information about existence

# List currently defined variables

ls()

#  "df"  "let" "n"   "x"

# Check if a variable named "x" exists

exists("x")

#  TRUE

# Check if "y" exists

exists("y")

#  FALSE

# Delete variable x

rm(x)

x

# Error: object "x" not found
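exists() and rm() can be combined so that a variable is removed only if it is defined. A minimal sketch; the names tmp_value, tmp_a and tmp_b are made up for illustration.

```r
tmp_value <- 6   # hypothetical variable

# Remove the variable only if it is defined
if (exists("tmp_value")) {
  rm(tmp_value)
}
still_there <- exists("tmp_value")
still_there
# FALSE

# rm() can also take the names as a character vector
tmp_a <- 1
tmp_b <- 2
rm(list = c("tmp_a", "tmp_b"))
any_left <- exists("tmp_a") || exists("tmp_b")
any_left
# FALSE
```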

Information about size/structure

# Get information about structure

str(n)

#  int [1:4] 1 2 3 4

str(df)

# 'data.frame': 4 obs. of  2 variables:

#  $ n  : int  1 2 3 4

#  $ let: Factor w/ 4 levels "A","B","C","D": 1 2 3 4

# Get the length of a vector

length(n)

#  4

# Length probably doesn't give us what we want here:

length(df)

#  2

# Number of rows

nrow(df)

#  4

# Number of columns

ncol(df)

#  2

# Get rows and columns

dim(df)

#  4 2
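The type of a variable can be inspected as well. The sketch below uses the same sample variables and also shows why length(df) returns 2: a data frame is stored as a list of columns.

```r
n <- 1:4
let <- LETTERS[1:4]
df <- data.frame(n, let)

# Class of a vector and of a data frame
class(n)
# "integer"
class(df)
# "data.frame"

# A data frame is stored as a list of columns
typeof(df)
# "list"

# Column names
names(df)
# "n" "let"
```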

Working with NULL, NA, and NaN

Problem

You want to properly handle NULL, NA, or NaN values.

Solution

Sometimes your data will include NULL, NA, or NaN. These work somewhat differently from "normal" values, and may require explicit testing.

Here are some examples of comparisons with these values:

x <- NULL

x > 5

# logical(0)

y <- NA

y > 5

# NA

z <- NaN

z > 5

# NA

Here's how to test whether a variable has one of these values:

is.null(x)

# TRUE

is.na(y)

# TRUE

is.nan(z)

# TRUE

Note that NULL is different from the other two. NULL means that there is no value, while NA and NaN mean that there is some value, although one that is perhaps not usable. Here's an illustration of the difference:

# Is y null?

is.null(y)

# FALSE

# Is x NA?

is.na(x)

# logical(0)

# Warning message:

# In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

In the first case, it checks if y is NULL, and the answer is no. In the second case, it tries to check if x is NA, but there is no value to be checked.
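The tests also relate to each other: is.na() is TRUE for NaN, but is.nan() is FALSE for NA. The sketch below shows this, plus a hypothetical helper, is_usable(), that combines the checks for a single value. The helper is an assumption for illustration, not part of the original recipe.

```r
# NaN also counts as NA, but NA does not count as NaN
is.na(NaN)
# TRUE
is.nan(NA)
# FALSE

# NULL has length zero
length(NULL)
# 0

# Hypothetical helper: TRUE only for a single value that is not NULL, NA, or NaN
is_usable <- function(value) {
  !is.null(value) && length(value) == 1 && !is.na(value)
}

is_usable(NULL)
# FALSE
is_usable(NA)
# FALSE
is_usable(NaN)
# FALSE
is_usable(5)
# TRUE
```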

Ignoring "bad" values in vector summary functions

If you run functions like mean() or sum() on a vector containing NA or NaN, they will return NA or NaN, which is generally unhelpful, though it does alert you to the presence of the bad value. Many of these functions take the flag na.rm, which tells them to ignore these values.

vy <- c(1, 2, 3, NA, 5)

# 1  2  3 NA  5

mean(vy)

# NA

mean(vy, na.rm=TRUE)

# 2.75

vz <- c(1, 2, 3, NaN, 5)

# 1   2   3 NaN   5

sum(vz)

# NaN

sum(vz, na.rm=TRUE)

# 11

# NULL isn't a problem, because it doesn't exist

vx <- c(1, 2, 3, NULL, 5)

# 1 2 3 5

sum(vx)

# 11
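The na.rm flag works the same way in other summary functions, and it can help to count the missing values before ignoring them. A short sketch with the same vy:

```r
vy <- c(1, 2, 3, NA, 5)

# Other summary functions that accept na.rm
max(vy, na.rm=TRUE)
# 5
median(vy, na.rm=TRUE)
# 2.5

# How many values are missing, and whether any are
sum(is.na(vy))
# 1
anyNA(vy)
# TRUE
```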

Removing bad values from a vector

These values can be removed from a vector by filtering using is.na() or is.nan().

vy

# 1  2  3 NA  5

vy[ !is.na(vy) ]

# 1  2  3  5

vz

# 1   2   3 NaN   5

vz[ !is.nan(vz) ]

# 1  2  3  5
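Base R also provides na.omit() for vectors and complete.cases() for data frames. The sketch below is one possible use; the data frame d is made up for illustration.

```r
vy <- c(1, 2, 3, NA, 5)

# na.omit() drops NA (and NaN); as.vector() strips the extra attributes it adds
cleaned <- as.vector(na.omit(vy))
cleaned
# 1 2 3 5

# For a data frame, keep only the rows with no missing values
d <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
complete_rows <- d[complete.cases(d), ]
nrow(complete_rows)
# 1
```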

Notes

There are also the infinite numerical values Inf and -Inf, and the associated functions is.finite() and is.infinite().
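A short sketch of where these values come from and how the two test functions treat NA and NaN:

```r
# Division by zero gives infinite values; 0/0 gives NaN
1/0
# Inf
-1/0
# -Inf
0/0
# NaN

vals <- c(1, Inf, -Inf, NA, NaN)

# Neither NA nor NaN is finite, and neither is infinite
is.finite(vals)
# TRUE FALSE FALSE FALSE FALSE
is.infinite(vals)
# FALSE TRUE TRUE FALSE FALSE
```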

Posted on 2012-06-08 09:45 by Buttonwood
