An Introduction to R

 

 

1. quit the R console
> q()   or 

  > quit()   

2. help
> help(solve)   or 

  > ?solve

> help.start()

 

3. help.search(): search the help system for a topic

> help.search("solve")   or 

  > ??solve

 

4. show examples for a function or topic

> example(topic)   --e.g. example(solve) runs the examples from the help page of solve()

 

5. run R code stored in a file (batch execution)
> source("commands.R")

 

6. redirect the output to a file

> sink("record.lis")   --redirect the console output to the file "record.lis"
> sink()               --restore the output to the console

 

7. display the objects currently stored in R

> objects() or 

  >ls()

 

8. remove objects currently stored in R
> rm(x, y, z, ink, junk, temp, foo, bar)

 

9. vector creation and assignment
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)   or   --c() combines its arguments into a vector
  > assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))   or 
      > c(10.4, 5.6, 3.1, 6.4, 21.7) -> x

--arithmetic on vectors (element by element)

> 1/x
> y <- c(x, 0, x)        --y has 11 elements
> v <- 2*x + y + 1       --x is recycled to length 11; R warns because 11 is not a multiple of 5

 

10. basic functions

x is a vector

>min(x)

>max(x)

>range(x)

>length(x)

log

exp

sin

cos

tan

sqrt

sum

prod   ---product of all elements

var      ---sample variance = sum((x-mean(x))^2) / (length(x)-1)

          ---If the argument to var() is an n-by-p matrix, the value is the p-by-p sample covariance matrix obtained by regarding the rows as independent p-variate sample vectors.

sort    ---sort the vector in increasing order 

pmax  --parallel (element-wise) maximum across several vectors

pmin  --parallel (element-wise) minimum across several vectors

sqrt   ---sqrt(-17) gives NaN with a warning; use a complex argument, sqrt(-17+0i), to get the complex square root
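A short sketch applying the functions above to the vector x from section 9; the pmax()/pmin() inputs are arbitrary values chosen for illustration.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

print(range(x))                      # same as c(min(x), max(x))
print(c(sum(x), prod(x), length(x)))

# var() is the sample variance with denominator length(x) - 1
v_manual <- sum((x - mean(x))^2) / (length(x) - 1)
print(all.equal(var(x), v_manual))   # TRUE

print(sort(x))                       # increasing order
print(pmax(c(1, 5, 3), c(4, 2, 6)))  # element-wise maximum: 4 5 6
print(pmin(c(1, 5, 3), c(4, 2, 6)))  # element-wise minimum: 1 2 3

# a negative real gives NaN; a complex argument gives the complex root
print(sqrt(-17 + 0i))
```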

 

11. generating sequences

>1:30

  >n<- 10

  >1:n-1       --means (1:n)-1, i.e. 0 to 9: ':' has higher precedence than '-'

  >1:(n-1)     --1 to 9
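A sketch of the precedence point above: the colon operator binds more tightly than binary minus, so the two expressions give different sequences.

```r
n <- 10
a <- 1:n - 1      # (1:n) - 1  ->  0 1 2 ... 9
b <- 1:(n - 1)    # 1 2 ... 9
print(a)
print(b)
print(30:1)       # a sequence can also run downwards
```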

 

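 12. generating sequences with seq()

> seq(-5, 5, by=.2) -> s3      --from -5 to 5 in steps of 0.2 (51 values)

  > s4 <- seq(length=51, from=-5, by=.2)   --the same 51 values

    arguments: seq(from, to, by, length.out, along.with); the length= above partially matches length.out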
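A sketch of common seq() calls, using the full argument names length.out and along.with; s3 and s4 are the sequences from the lines above.

```r
s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(length.out = 51, from = -5, by = 0.2)
print(length(s3))                           # 51
print(all.equal(s3, s4))                    # TRUE
print(seq(2, 10))                           # same as 2:10
print(seq(0, 1, length.out = 5))            # 0.00 0.25 0.50 0.75 1.00
print(seq(along.with = c("a", "b", "c")))   # 1 2 3: one index per element
```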

 

13. replication: rep()

> s5 <- rep(x, times=5)  ---repeat the whole vector x five times
> s6 <- rep(x, each=5)   ---repeat each element of x five times before moving to the next
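A small sketch contrasting times and each on a made-up three-element vector; times may also be a vector giving one count per element.

```r
x <- c(1, 2, 3)
print(rep(x, times = 2))           # 1 2 3 1 2 3
print(rep(x, each = 2))            # 1 1 2 2 3 3
print(rep(x, times = c(3, 1, 2)))  # 1 1 1 2 3 3
```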

14. logical vectors


> temp <- x > 13     --TRUE where the element of x is greater than 13

 

or:  |   (element-wise)

and: &   (element-wise)

not: !
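A sketch of the operators above on the vector from section 9; the thresholds (6, 5, 11, 4, 20) are arbitrary.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
temp <- x > 6            # element-wise comparison gives a logical vector
print(temp)              # TRUE FALSE FALSE TRUE TRUE
print(x > 5 & x < 11)    # AND: TRUE TRUE FALSE TRUE FALSE
print(x < 4 | x > 20)    # OR:  FALSE FALSE TRUE FALSE TRUE
print(!temp)             # NOT
print(sum(temp))         # TRUE counts as 1, so this counts the elements > 6: 3
```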

 

15. missing values

NA --Not Available

> z <- c(1:3, NA)
> ind <- is.na(z)     --is.na() marks which elements are NA

  --one cannot use z == NA as a logical test: NA is not a value, so the comparison itself returns NA; use is.na() instead

 

NaN --Not a Number, produced by computations such as:

> 0/0
> Inf - Inf
> is.nan(xx)

In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is only TRUE for NaNs. Missing values are sometimes printed as <NA> when character vectors are printed without quotes.
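A sketch of the NA/NaN rules just described:

```r
z <- c(1:3, NA)
print(is.na(z))        # FALSE FALSE FALSE TRUE
print(z == NA)         # NA NA NA NA: a comparison with NA is never TRUE or FALSE
v <- c(1, NA, 0/0, Inf - Inf)
print(is.na(v))        # FALSE TRUE TRUE TRUE  (NaN also counts as missing)
print(is.nan(v))       # FALSE FALSE TRUE TRUE (only the NaNs)
```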

 

16. paste(): concatenating vectors element by element into character strings

 The paste() function takes an arbitrary number of arguments, converts them to character, and concatenates them element by element; shorter arguments are recycled, and sep (default " ") is placed between the pieces.

> labs <- paste(c("X","Y"), 1:10, sep="")

gives c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
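A sketch of paste() recycling, the sep and collapse arguments, and paste0() (base R's shorthand for sep = ""):

```r
labs <- paste(c("X", "Y"), 1:10, sep = "")
print(labs)                                 # c("X","Y") is recycled to length 10
print(paste("a", "b", sep = "-"))           # "a-b"
print(paste(c("a", "b"), collapse = "+"))   # "a+b": collapse joins the result into one string
print(paste0("X", 1:3))                     # "X1" "X2" "X3"
```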

 

 

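17. indexing vectors

--index by a logical vector


> y <- x[!is.na(x)]   --copy the non-missing elements of x into y

  --- x[c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)] selects elements 1, 3 and 6; note that x[c(1,0,1,0,0,1)] is a numeric index and returns x[1] three times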


> (x+1)[(!is.na(x)) & x>0] -> z   --add 1 to the non-missing positive elements of x

 

--index by positive integers (select those positions)
> x[1:10]
> c("x","y")[rep(c(1,2,2,1), times=4)]   --length 16: "x","y","y","x" repeated four times

 

---index by negative integers, which exclude the corresponding elements

> y <- x[-(1:5)]     --all but the first five elements

--index by a character vector of element names


> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> lunch <- fruit[c("apple","orange")]

 

--applications


> x[is.na(x)] <- 0  --replace missing values by 0


> y[y < 0] <- -y[y < 0]   ---replace each negative element by its absolute value
  > y <- abs(y)           ---the same, using abs()
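A sketch of the four kinds of index vector and the two applications above, on a small made-up vector x with one missing value:

```r
x <- c(-2, NA, 3, -5, 7)
print(x[!is.na(x)])                   # logical index: drop missing values
print((x + 1)[!is.na(x) & x > 0])     # 4 8
print(x[c(1, 3)])                     # positive integers select positions
print(x[-(1:2)])                      # negative integers exclude: 3 -5 7
fruit <- c(orange = 5, banana = 10, apple = 1, peach = 20)
print(fruit[c("apple", "orange")])    # character index by names
y <- x
y[is.na(y)] <- 0                      # replace missing values
y[y < 0] <- -y[y < 0]                 # absolute values, same as abs(y)
print(y)                              # 2 0 3 5 7
```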

 

20. type conversion: character to integer and integer to character
> z <- 0:9
> digits <- as.character(z)
> d <- as.integer(digits)
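A sketch of the round trip above, plus what happens to a string that is not a number:

```r
z <- 0:9
digits <- as.character(z)   # "0" "1" ... "9"
d <- as.integer(digits)     # back to 0 1 ... 9
print(digits)
print(identical(d, z))      # TRUE
# a string that is not a number becomes NA (normally with a warning)
print(suppressWarnings(as.integer("abc")))
```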

 

21. truncating a vector

> alpha <- alpha[2 * 1:5]   --keep the elements in positions 2, 4, 6, 8, 10
> length(alpha) <- 3        --keep only the first 3 of those
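A sketch of the truncation above on a hypothetical alpha <- 1:10, also showing that setting a larger length pads with NA:

```r
alpha <- 1:10
alpha <- alpha[2 * 1:5]   # 2 4 6 8 10
print(alpha)
length(alpha) <- 3        # truncate: 2 4 6
print(alpha)
length(alpha) <- 5        # lengthen: 2 4 6 NA NA
print(alpha)
```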

 

22. attributes: attr() and attributes()
> attr(z, "dim") <- c(10,10)   ---treat z (which must have 100 elements) as a 10-by-10 matrix

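23. unclass(obj): strip the class attribute, so the object is printed as its underlying type
> winter              --printed by the method for its class (winter is some object with a class)
> unclass(winter)     --printed as the plain underlying list/vector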
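winter is not defined in these notes, so this sketch uses a small factor instead to show what unclass() strips:

```r
f <- factor(c("low", "high", "low"))
print(class(f))       # "factor"
u <- unclass(f)
print(u)              # integer codes 2 1 2, with the levels kept as an attribute
print(class(u))       # "integer"
```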

 

 

24. factor(): classifying / grouping the elements of a vector

levels(): the distinct values (levels) of a factor; factor() sorts them by default


> state <-
c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",

"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "nsw", "vic", "vic", "act")


> statef <- factor(state)
> statef
> levels(statef)

 

25. tapply(): apply a function (third argument) to each group of the data in the first argument, with the groups defined by the factor in the second argument.


> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,

61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)


> incmeans <- tapply(incomes, statef, mean)

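--defining a function
> stderr <- function(x) sqrt(var(x)/length(x))    ---this masks the built-in stderr(); the right-assignment form function(x) ... -> stderr does not work, because "-> stderr" is parsed as part of the function body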
> incster <- tapply(incomes, statef, stderr)
> incster
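A sketch of tapply() on a tiny made-up grouping (the incomes/statef data work the same way); the helper is named stderr_mean here so that it does not mask base R's stderr():

```r
g <- factor(c("a", "b", "a", "b", "a"))
v <- c(1, 2, 3, 6, 5)
means <- tapply(v, g, mean)           # a: (1+3+5)/3 = 3, b: (2+6)/2 = 4
print(means)
stderr_mean <- function(x) sqrt(var(x) / length(x))
print(tapply(v, g, stderr_mean))      # standard error of each group mean
print(tapply(v, g, length))           # group sizes, like table(g)
```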

26. array from a vector: give the vector a dim attribute

if z has exactly 1500 elements (3*5*100), the following makes it a 3-by-5-by-100 array

>dim(z) <- c(3,5,100)

 

27. defining arrays: array()

> x <- array(1:20, dim=c(4,5)) # Generate a 4 by 5 array. Syntax: array(values, dim)
> x
> i <- array(c(1:3,3:1), dim=c(3,2))
> i # i is a 3 by 2 index array; its rows (1,3), (2,2), (3,1) are (row, column) pairs
> x[i] # Extract x[1,3], x[2,2] and x[3,1]
> x[i] <- 0 # Replace those elements by zeros.
> x
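A sketch of the index-matrix example above with the extracted values spelled out (arrays are filled column by column, so x[r, c] = 4*(c-1) + r):

```r
x <- array(1:20, dim = c(4, 5))
i <- array(c(1:3, 3:1), dim = c(3, 2))   # rows (1,3), (2,2), (3,1)
picked <- x[i]                           # x[1,3], x[2,2], x[3,1]
print(picked)                            # 9 6 3
x[i] <- 0                                # replace exactly those three cells
print(x)
```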

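28. index matrices: building incidence (design) matrices
--assumes factors blocks (b levels) and varieties (v levels) recorded for n plots
> Xb <- matrix(0, n, b)
> Xv <- matrix(0, n, v)
> ib <- cbind(1:n, blocks)      --row i, column = block of plot i
> iv <- cbind(1:n, varieties)
> Xb[ib] <- 1                   --block incidence matrix
> Xv[iv] <- 1                   --variety incidence matrix
> X <- cbind(Xb, Xv)            --design matrix
> N <- crossprod(Xb, Xv)        --block/variety incidence matrix
> N <- table(blocks, varieties) --the same counts, more simply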
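The lines above assume n, b, v and the factors blocks and varieties already exist. A self-contained sketch with hypothetical data (4 plots, 2 blocks, 3 varieties):

```r
blocks    <- factor(c(1, 2, 1, 2))   # hypothetical block of each plot
varieties <- factor(c(1, 1, 2, 3))   # hypothetical variety of each plot
n <- length(blocks); b <- nlevels(blocks); v <- nlevels(varieties)
Xb <- matrix(0, n, b)
Xv <- matrix(0, n, v)
ib <- cbind(1:n, blocks)      # a factor contributes its integer codes
iv <- cbind(1:n, varieties)
Xb[ib] <- 1                   # exactly one 1 per row
Xv[iv] <- 1
X <- cbind(Xb, Xv)            # 4 x 5 design matrix
N <- crossprod(Xb, Xv)        # counts of each block/variety pair
print(N)
print(table(blocks, varieties))
```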

 

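29. the array() function
> Z <- array(data_vector, dim_vector)   --general form
> Z <- array(h, dim=c(3,4,2))           --h is recycled if it has fewer than 24 elements
> Z <- h ; dim(Z) <- c(3,4,2)           --the same, but h must have exactly 24 elements
> Z <- array(0, c(3,4,2))               --a 3x4x2 array of zeros
> D <- 2*A*B + C + 1                    --for arrays with the same dim, arithmetic is element by element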
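A sketch of the array() forms above, with h assumed to be 1:24 and small hypothetical arrays A, B and C for the element-wise arithmetic:

```r
h <- 1:24
Z1 <- array(h, dim = c(3, 4, 2))
Z2 <- h; dim(Z2) <- c(3, 4, 2)         # same result by setting the dim attribute
print(all(Z1 == Z2))                   # TRUE
Z0 <- array(0, c(3, 4, 2))             # the single value 0 is recycled
A <- array(1:4, c(2, 2)); B <- array(2, c(2, 2)); C <- array(10, c(2, 2))
D <- 2*A*B + C + 1                     # here 4*A + 11: 15 19 23 27
print(D)
```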

 

30. outer product: an important operation on arrays is the outer product. If a and b are two numeric arrays, their outer product is an array whose dimension vector is obtained by concatenating their two dimension vectors (order is important), and whose data vector is obtained by forming all possible products of elements of the data vector of a with those of b.

--for two vectors this is the matrix with entries a[i]*b[j]; for arrays it generalizes that idea
> ab <- a %o% b     or 
  > ab <- outer(a, b, "*")

---outer() with an arbitrary function of two variables in place of "*"
> f <- function(x, y) cos(y)/(1 + x^2)
> z <- outer(x, y, f)    --z[i,j] = f(x[i], y[j])


--example: distribution of the determinants of all 2x2 matrices with single-digit entries
> d <- outer(0:9, 0:9)
> fr <- table(outer(d, d, "-"))
> plot(as.numeric(names(fr)), fr, type="h",xlab="Determinant", ylab="Frequency")
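A sketch of %o% and outer() on small vectors; f is the function from the lines above, evaluated on a 2 x 2 grid.

```r
a <- 1:3
b <- 1:2
ab <- a %o% b                    # 3 x 2, ab[i, j] = a[i] * b[j]
print(ab)
print(outer(a, b, "+"))          # any vectorized two-argument function works
f <- function(x, y) cos(y) / (1 + x^2)
z <- outer(c(0, 1), c(0, pi), f) # z[i, j] = f(x[i], y[j])
print(z)
```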

 

31. generalized transpose of an array

--aperm(A, perm): perm must be a permutation of 1:k, where k is the number of dimensions of A
> B <- aperm(A, c(2,1))   ---for a matrix this is the transpose

 

>t(A) --transpose of a matrix

>nrow(A) --number of rows 

>ncol(A) --number of columns
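A sketch of aperm() and t() on small made-up arrays:

```r
A <- matrix(1:6, nrow = 2)     # 2 x 3
B <- aperm(A, c(2, 1))         # swap the two dimensions
print(all(B == t(A)))          # TRUE: for a matrix this is the transpose
print(c(nrow(A), ncol(A)))     # 2 3
P <- array(1:24, c(2, 3, 4))
Q <- aperm(P, c(3, 1, 2))      # Q[k, i, j] = P[i, j, k]
print(dim(Q))                  # 4 2 3
```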

 

32. matrix multiplication

--element-by-element product; A and B must have the same dimensions
> A * B

 

--matrix product
> A %*% B

> x %*% A %*% x    --quadratic form x'Ax

 

>crossprod(x,y)  --- same as t(x) %*% y, computed more efficiently

>diag(A) --the vector of diagonal elements of matrix A; for a vector v, diag(v) is a diagonal matrix

Also, somewhat confusingly, if k is a single numeric value then diag(k) is the k by k identity matrix!
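A sketch of the matrix operations above on a small hypothetical matrix A and vector x:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # rows (2, 1) and (1, 3)
B <- matrix(1:4, nrow = 2)
x <- c(1, 2)
print(A * B)                 # element-by-element product
print(A %*% B)               # matrix product
print(x %*% A %*% x)         # quadratic form x'Ax, a 1 x 1 matrix: 18
print(all.equal(crossprod(A, B), t(A) %*% B))
print(diag(A))               # 2 3
print(diag(c(2, 3)))         # diagonal matrix from a vector
print(diag(2))               # 2 x 2 identity
```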

 

33. linear equations and inversion
> b <- A %*% x
> solve(A,b)    ---solve the linear system A %*% x = b for x

>solve(A)  ---inverse of matrix A

   --solve(A,b) is more efficient and more accurate than solve(A) %*% b
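A sketch of both uses of solve() on a small hypothetical matrix:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)
x <- c(1, 2)
b <- A %*% x
print(solve(A, b))                         # recovers x
Ainv <- solve(A)                           # explicit inverse
print(Ainv %*% A)                          # identity, up to rounding
print(all.equal(Ainv %*% b, solve(A, b)))  # same answer without forming the inverse
```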

 

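34. eigenvalues and eigenvectors
> ev <- eigen(Sm)                  --Sm: a symmetric matrix; ev is a list with components values and vectors
> evals <- eigen(Sm)$values

> eigen(Sm)$vectors
> eigen(Sm)
> evals <- eigen(Sm, only.values = TRUE)$values   --skip computing the eigenvectors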
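A sketch with a small hypothetical symmetric matrix Sm, checking the defining property Sm v = lambda v:

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)
ev <- eigen(Sm)
print(ev$values)                       # 3 1, in decreasing order
v1 <- ev$vectors[, 1]
print(all.equal(as.vector(Sm %*% v1), ev$values[1] * v1))
print(eigen(Sm, only.values = TRUE)$values)
```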

 

35. singular value decomposition (SVD)

>svd(M)

  --returns a list with components d, u and v such that M = u %*% diag(d) %*% t(v)

 

---if M is square, the absolute value of its determinant is the product of the singular values

> absdetM <- prod(svd(M)$d)
> absdet <- function(M) prod(svd(M)$d)
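A sketch checking the decomposition and the determinant identity on a hypothetical 2 x 2 matrix M:

```r
M <- matrix(c(4, 2, 1, 3), nrow = 2)   # rows (4, 1) and (2, 3): det = 10
s <- svd(M)
print(all.equal(s$u %*% diag(s$d) %*% t(s$v), M))   # TRUE
absdet <- function(M) prod(svd(M)$d)
print(absdet(M))                                     # 10, up to rounding
print(all.equal(absdet(M), abs(det(M))))
```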

 

36. least squares fitting: lsfit()
> ans <- lsfit(X, y)     --regress y on the columns of X; an intercept is included by default
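A sketch of lsfit() on made-up data, compared against lm() on the same data:

```r
X <- cbind(c(1, 2, 3, 4, 5))          # hypothetical single predictor
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
ans <- lsfit(X, y)                     # intercept added by default
print(ans$coefficients)
print(all.equal(unname(ans$coefficients), unname(coef(lm(y ~ X)))))
```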

 

37. QR decomposition
> Xplus <- qr(X)                 --QR decomposition of the design matrix X
> b <- qr.coef(Xplus, y)         --least squares coefficients
> fit <- qr.fitted(Xplus, y)     --fitted values
> res <- qr.resid(Xplus, y)      --residuals
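A sketch of the QR steps above on made-up data, with the intercept column written explicitly:

```r
X <- cbind(1, c(1, 2, 3, 4, 5))   # intercept column plus one hypothetical predictor
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
Xplus <- qr(X)
b   <- qr.coef(Xplus, y)          # least squares coefficients
fit <- qr.fitted(Xplus, y)        # X %*% b
res <- qr.resid(Xplus, y)         # y - fit
print(b)
print(all.equal(fit + res, y))
print(all.equal(as.vector(X %*% b), as.vector(fit)))
```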

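38. binding matrices horizontally and vertically: cbind() and rbind()
> X <- cbind(arg_1, arg_2, arg_3, ...)   --arguments become columns; rbind() makes them rows
> X <- cbind(1, X1, X2)                  --the scalar 1 is recycled into a column of ones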
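A sketch of cbind()/rbind() on two short hypothetical vectors:

```r
X1 <- c(1, 2, 3)
X2 <- c(4, 5, 6)
X <- cbind(1, X1, X2)   # first column is all ones
print(X)
R <- rbind(X1, X2)      # vectors become rows, named X1 and X2
print(R)
```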

 

39. converting an array (or other object) to a plain vector
> vec <- as.vector(X)    --drops the dim attribute
> vec <- c(X)            --same effect for a matrix or array

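40. frequency tables: table()

> statefr <- table(statef)                     --number of entries for each state
> statefr <- tapply(statef, statef, length)     --the same counts via tapply()
> factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef   --group incomes into intervals (35,45], (45,55], ...
> table(incomef,statef)                        --two-way frequency table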
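A sketch of cut() and table() on the first five incomes and states from sections 24 and 25:

```r
incomes <- c(60, 49, 40, 61, 64)
incomef <- cut(incomes, breaks = 35 + 10 * (0:7))   # 7 intervals (35,45], ..., (95,105]
print(incomef)
print(table(incomef))          # counts per interval, empty ones included
st <- factor(c("tas", "sa", "qld", "nsw", "nsw"))
print(table(incomef, st))      # two-way table
```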

41. list

An R list is an object consisting of an ordered collection of objects known as its components.

> Lst <- list(name="Fred", wife="Mary", no.children=3,child.ages=c(4,7,9))

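Lst[1], Lst[2], ... are sub-lists holding one component each; the components themselves (name, wife, ...) are Lst[[1]], Lst[[2]], ...

 length(Lst) gives the number of (top level) components it has
> list_name$component_name     --general form: select a component by name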

 

Lst$name is the same as Lst[[1]] and is the string "Fred",
Lst$wife is the same as Lst[[2]] and is the string "Mary",
Lst$child.ages[1] is the same as Lst[[4]][1] and is the number 4.


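> x <- "name"; Lst[[x]]     --[[ ]] also accepts a computed component name ($ does not)
> Lst <- list(name_1=object_1, ..., name_m=object_m)    --general form
> Lst[5] <- list(matrix=Mat)    --add a fifth component (Mat stands for some matrix)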
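A sketch of the [ ] versus [[ ]] distinction on the Lst example; diag(2) stands in for the unspecified Mat, and the component is added by name with [[ ]]:

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3, child.ages = c(4, 7, 9))
print(length(Lst))           # 4
print(class(Lst[1]))         # "list": single brackets give a sub-list
print(Lst[[1]])              # "Fred": double brackets give the component
print(Lst$child.ages[1])     # 4
x <- "name"
print(Lst[[x]])              # "Fred"
Lst[["matrix"]] <- diag(2)   # add a fifth, named component
print(names(Lst))
```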

 

42. concatenating list
> list.ABC <- c(list.A, list.B, list.C)

 

43. data frame

A data frame is a list with class "data.frame".


> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
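A sketch with a three-row data frame built from hypothetical values, confirming it is a list with class "data.frame":

```r
home <- factor(c("tas", "sa", "qld"))
loot <- c(60, 49, 40)
accountants <- data.frame(home = home, loot = loot)
print(class(accountants))      # "data.frame"
print(is.list(accountants))    # TRUE: a list of equal-length columns
print(accountants$loot)        # columns are accessed like list components
print(dim(accountants))        # 3 2
```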

 

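44. attach() and detach(): access components by name

 make the components of a list or data frame temporarily visible as variables under their component names, without the need to quote the list name explicitly each time.
> attach(lentils)
> u <- v+w              --v and w are components of lentils; u is created in the workspace
> lentils$u <- v+w      --stores u in the data frame itself; the attached copy is not updated
> detach()              --detach whatever is at position 2 of the search path
> attach(any.old.list)
> search()              ---show the search path; compare it after attach() and after detach()

> ls(2)                 --list the objects at position 2 of the search path
> detach("lentils")
> search()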
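A sketch of the attach()/detach() session above, with a hypothetical lentils data frame:

```r
lentils <- data.frame(v = 1:3, w = c(10, 20, 30))
attach(lentils)                  # columns v and w become visible by name
u <- v + w                       # u is created in the workspace
lentils$u <- v + w               # adds a column to the workspace copy of lentils
print("lentils" %in% search())   # TRUE while attached
detach("lentils")
print("lentils" %in% search())   # FALSE
```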

 

45. read.table(): reading a data frame from a file
> HousePrice <- read.table("houses.data")
> HousePrice <- read.table("houses.data", header=TRUE)   --the first line of the file holds the column names
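A sketch of read.table() with header = TRUE, first writing a hypothetical two-row houses.data to a temporary file:

```r
tf <- tempfile(fileext = ".data")
writeLines(c("Price Floor Area",
             "52.00 111.0 830",
             "54.75 128.0 710"), tf)
HousePrice <- read.table(tf, header = TRUE)   # first line gives the column names
print(HousePrice)
print(sapply(HousePrice, class))
unlink(tf)
```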

 

46. scan(): reading data from a file
> inp <- scan("input.dat", list("",0,0))        ---the second argument gives the type of each field: one character and two numeric fields per record
> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]
> inp <- scan("input.dat", list(id="", x=0, y=0))    ---named components also name the parts of the result
> label <- inp$id; x <- inp$x; y <- inp$y
> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)   ---read all numbers, then arrange them in rows of 5
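A sketch of both scan() forms above, reading from a temporary file with hypothetical contents; quiet = TRUE suppresses the "Read N records" message.

```r
tf <- tempfile(fileext = ".dat")
writeLines(c("a 1.5 2", "b 3.0 4"), tf)
inp <- scan(tf, what = list(id = "", x = 0, y = 0), quiet = TRUE)
label <- inp$id; x <- inp$x; y <- inp$y
print(label); print(x); print(y)
writeLines(c("1 2 3", "4 5 6"), tf)
X <- matrix(scan(tf, 0, quiet = TRUE), ncol = 3, byrow = TRUE)   # all numbers, then rows of 3
print(X)
unlink(tf)
```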

 

47. accessing built-in datasets

>data()  --list the datasets currently available

 

--loading data from R packages

> data(package="rpart")                  --list the datasets in package rpart
> data(Puromycin, package="datasets")    --load the dataset Puromycin from package datasets

48. editing data: edit()
> xnew <- edit(xold)            --edit a copy of xold; xold itself is unchanged
> xnew <- edit(data.frame())    --enter new data into an empty data frame

 

49. probability distributions
One convenient use of R is to provide a comprehensive set of statistical tables. Functions are provided to evaluate the cumulative distribution function P(X <= x), the probability density function and the quantile function (given q, the smallest x such that P(X <= x) > q), and to simulate from the distribution.

Prefix the name given here by ‘d’ for the density, ‘p’ for the CDF, ‘q’ for the quantile function and ‘r’ for simulation (random deviates).


> ## 2-tailed p-value for t distribution
> 2*pt(-2.43, df = 13)
> ## upper 1% point for an F(2, 7) distribution
> qf(0.01, 2, 7, lower.tail = FALSE)
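A sketch of the d/p/q/r prefixes for the normal distribution, plus equivalent ways to write the tail quantities used above:

```r
print(dnorm(0))                  # density at 0: 1/sqrt(2*pi)
print(pnorm(1.96))               # P(X <= 1.96), about 0.975
print(qnorm(0.975))              # about 1.96
print(qnorm(pnorm(1.3)))         # q* inverts p*: 1.3
set.seed(1)
print(rnorm(3))                  # three random deviates
print(all.equal(qf(0.01, 2, 7, lower.tail = FALSE), qf(0.99, 2, 7)))
print(all.equal(2*pt(-2.43, df = 13), 2*pt(2.43, df = 13, lower.tail = FALSE)))
```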

 

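50. examining the distribution of a variable
> attach(faithful)            --built-in Old Faithful geyser data
> summary(eruptions)
> fivenum(eruptions)          --Tukey's five-number summary
> stem(eruptions)             --stem-and-leaf plot
> hist(eruptions)
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)   --chosen breaks, density scale
> lines(density(eruptions, bw=0.1))                --overlay a kernel density estimate
> rug(eruptions) # show the actual data points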

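#########################the commands below are listed without detailed notes; A and B stand for two numeric samples###################

> plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)   --empirical cumulative distribution function
> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)   --compare with a fitted normal CDF
> shapiro.test(long)                 --Shapiro-Wilk normality test
> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))   --Kolmogorov-Smirnov test
> t.test(A, B)                       --Welch two-sample t-test
> var.test(A, B)                     --F test for equal variances
> t.test(A, B, var.equal=TRUE)       --classical pooled-variance t-test
> wilcox.test(A, B)                  --Wilcoxon rank-sum test
> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
> ks.test(A, B)                      --two-sample Kolmogorov-Smirnov test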
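A and B are not defined in these notes; this sketch simulates two hypothetical samples to show how the test results are read:

```r
set.seed(1)
A <- rnorm(13, mean = 80.00, sd = 0.02)   # hypothetical measurements
B <- rnorm(8,  mean = 79.98, sd = 0.03)
tt <- t.test(A, B)                        # Welch test (unequal variances) by default
print(tt$p.value)
print(var.test(A, B)$p.value)             # F test for equal variances
print(t.test(A, B, var.equal = TRUE)$statistic)
print(wilcox.test(A, B)$p.value)
```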

 

60. grouping, loops and conditional execution
> if (expr_1) expr_2 else expr_3
> for (name in expr_1) expr_2

--several expressions can be grouped in braces:
for (name in expr_1) {
  ...
}
## other loops
> repeat expr                  --runs until a break
> while (condition) expr
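A sketch of the loop and conditional forms above on a made-up vector:

```r
x <- c(3, -1, 4, -1, 5)
total <- 0
for (xi in x) {                  # loop over the elements of x
  if (xi > 0) total <- total + xi else total <- total - 1
}
print(total)                     # 3 + 4 + 5 - 2 = 10

i <- 0
while (i < 3) i <- i + 1         # stops once the condition is FALSE
print(i)                         # 3

k <- 0
repeat {                         # runs until break
  k <- k + 2
  if (k >= 6) break
}
print(k)                         # 6
```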

 

61. writing functions
> name <- function(arg_1, arg_2, ...) expression
> twosam <- function(y1, y2) {
    n1 <- length(y1); n2 <- length(y2)
    yb1 <- mean(y1); yb2 <- mean(y2)
    s1 <- var(y1); s2 <- var(y2)
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)    # pooled variance
    tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))   # two-sample t statistic
    tst                                        # the value of the last expression is returned
  }

 


> tstat <- twosam(data$male, data$female); tstat
> bslash <- function(X, y) {
    X <- qr(X)
    qr.coef(X, y)    # least squares coefficients of y on X
  }
> regcoeff <- bslash(Xmat, yvar)
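A sketch checking twosam() against t.test() with pooled variance; the male/female values are hypothetical:

```r
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1); yb2 <- mean(y2)
  s1 <- var(y1); s2 <- var(y2)
  s <- ((n1 - 1)*s1 + (n2 - 1)*s2) / (n1 + n2 - 2)   # pooled variance
  (yb1 - yb2) / sqrt(s * (1/n1 + 1/n2))              # returned value
}
male   <- c(5.1, 4.8, 5.6, 5.0)        # hypothetical data
female <- c(4.2, 4.5, 4.0, 4.4, 4.6)
tstat <- twosam(male, female)
print(tstat)
print(all.equal(tstat, unname(t.test(male, female, var.equal = TRUE)$statistic)))
```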

 

62. defining a new binary operator

> "%!%" <- function(X, y) { ... }   --the name must have the form %anything% and is quoted when defined

>X  %!% y    ---then it is used like a built-in operator
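The body of %!% is elided above; this sketch gives it a hypothetical body (solving a linear system) just to show definition and use:

```r
"%!%" <- function(X, y) solve(X, y)    # hypothetical body
A <- matrix(c(2, 1, 1, 3), nrow = 2)
print(A %!% c(4, 7))                   # 1 2, since A %*% c(1, 2) = c(4, 7)
```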

 

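63. named arguments and defaults: arguments given by name may appear in any order, and arguments with default values may be omitted (similar to default arguments in C++)
> fun1 <- function(data, data.frame, graph, limit) { ... }
> ans <- fun1(d, df, TRUE, 20)                               --positional
> ans <- fun1(d, df, graph=TRUE, limit=20)                   --equivalent, partly named
> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)   --equivalent, any order
> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }
> ans <- fun1(d, df)                                         --uses the defaults
> ans <- fun1(d, df, limit=10)                               --overrides one default
> bdeff <- function(blocks, varieties) { ... }
> temp <- X                                                   --print X without row/column labels:
> dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X)))
> temp; rm(temp)
> no.dimnames(X)                                              --the same, wrapped in a function (definition not shown)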
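fun1's body is elided above; this sketch gives it a hypothetical body that just reports its arguments, to show defaults and named matching:

```r
fun1 <- function(data, data.frame, graph = TRUE, limit = 20) {
  list(n = length(data), graph = graph, limit = limit)   # hypothetical body
}
d <- 1:5; df <- data.frame(a = 1)
a1 <- fun1(d, df)                                                  # both defaults used
a2 <- fun1(d, df, limit = 10)                                      # one default overridden
a3 <- fun1(data = d, limit = 10, graph = FALSE, data.frame = df)   # any order when named
print(a1$limit); print(a2$limit); print(a3$graph)
```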

 

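64. scope
--cube(n) contains a nested function sq() that uses n; S and R resolve that free variable differently:
S> cube(2)     --error in S: n is not found
S> n <- 3
S> cube(2)     --S now uses the global n = 3, giving 18
R> cube(2)     --R's lexical scoping uses cube's own argument n = 2, giving 8
if(amount > total)   --fragment of a larger example (rest omitted)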
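cube() is not defined in these notes; this sketch assumes the definition the comments above describe (a nested sq() using n) and shows R's lexical-scoping result:

```r
cube <- function(n) {
  sq <- function() n * n   # free variable n is found in cube's environment
  n * sq()
}
n <- 3          # a global n, ignored by sq() under lexical scoping
print(cube(2))  # 8
```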

65. Customization    

If the environment variable R_PROFILE is unset, the file Rprofile.site in the R home subdirectory etc is used. This file should contain the commands that you want to execute every time R is started under your system. A second, personal, profile file named .Rprofile can be placed in any directory.

 

> .First <- function() {
    options(prompt="$ ", continue="+\t")   # $ is the prompt
    options(digits=5, length=999)          # custom numbers and printout
    x11()                                  # for graphics
    par(pch = "+")                         # plotting character
    source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))   # my personal functions
    library(MASS)                          # attach a package
  }

 

> .Last <- function() {
    graphics.off()                  # a small safety measure.
    cat(paste(date(),"\nAdios\n"))  # Is it time for lunch?
  }


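66. generic functions and methods

> methods(class="data.frame")   --methods available for objects of class "data.frame"
> methods(plot)                 --methods of the generic function plot()
> coef                          --print the definition of the generic coef()
> methods(coef)
> getAnywhere("coef.aov")       --show a method even if it is not exported
> getS3method("coef", "aov")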
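
67. linear models: lm(), aov(), anova(), update()

> fitted.model <- lm(formula, data = data.frame)       --general form
> fm2 <- lm(y ~ x1 + x2, data = production)
> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)   --analysis of variance with an error stratum
> anova(fitted.model.1, fitted.model.2, ...)           --compare a sequence of nested models
> new.model <- update(old.model, new.formula)
> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production)
> fm6 <- update(fm05, . ~ . + x6)                      --"." means "as before": add x6
> smf6 <- update(fm6, sqrt(.) ~ .)                     --same predictors, response sqrt(y)
> fmfull <- lm(y ~ . , data = production)              --y on all other columns of production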
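The production data frame is not given here; this sketch fits and updates models on hypothetical simulated data:

```r
set.seed(42)
production <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))   # hypothetical
production$y <- 1 + 2 * production$x1 - production$x2 + rnorm(20, sd = 0.1)

fm2 <- lm(y ~ x1 + x2, data = production)
fm3 <- update(fm2, . ~ . + x3)            # same model plus x3
print(anova(fm2, fm3))                     # F test for the added term
fmfull <- lm(y ~ ., data = production)     # "." = every other column
print(coef(fmfull))
```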

68. generalized linear models: glm()

> fitted.model <- glm(formula, family=family.generator, data=data.frame)   --general form
> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
> fm <- lm(y ~ x1+x2, data=sales)                        --the same fit as the gaussian glm
> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5),    --(the y column is cut off in these notes)
> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)    --successes and failures
> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos)
> fml <- glm(Ymat ~ x, family = binomial, data = kalythos)       --logit link (the default)
> summary(fmp)
> summary(fml)
> ld50 <- function(b) -b[1]/b[2]                           --x at which the fitted probability is 1/2
> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)
> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt),     --(rest of the call cut off)
> nlfit <- glm(y ~ x1 + x2 - 1,                               --(rest of the call cut off)
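Since the kalythos y column is cut off above, this sketch uses hypothetical counts generated from a logistic curve with LD50 = 3/0.07 (about 42.9) to compare the logit and probit estimates:

```r
x <- c(20, 35, 45, 55, 70)
n <- rep(50, 5)
y <- round(n * plogis(-3 + 0.07 * x))          # hypothetical counts
dose <- data.frame(x = x, n = n, y = y)
dose$Ymat <- cbind(dose$y, dose$n - dose$y)    # successes and failures
fml <- glm(Ymat ~ x, family = binomial, data = dose)                  # logit link
fmp <- glm(Ymat ~ x, family = binomial(link = probit), data = dose)
ld50 <- function(b) -b[1]/b[2]                 # x where the fitted probability is 1/2
ld <- c(logit = unname(ld50(coef(fml))), probit = unname(ld50(coef(fmp))))
print(ld)
```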
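
69. nonlinear least squares: nlm() and nls()

> x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,     --(rest of the line cut off)
> y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
> fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)   --residual sum of squares of the Michaelis-Menten model
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 200 * xfit/(0.1 + xfit)                          --curve at the starting values
> lines(spline(xfit, yfit))
> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)          --minimize fn starting from p
> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))   --approximate standard errors
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 212.68384222 * xfit/(0.06412146 + xfit)          --curve at the fitted values
> lines(spline(xfit, yfit))
> df <- data.frame(x=x, y=y)
> fit <- nls(y ~ SSmicmen(x, Vm, K), df)                   --the same model with nls() and a self-starting model
> fit
> summary(fit)     --prints a coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)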
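The y values above appear to be the "treated" rates of R's built-in Puromycin data, so this sketch takes the complete x from there and fits the model both ways:

```r
treated <- subset(Puromycin, state == "treated")
x <- treated$conc
y <- treated$rate
fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)   # residual sum of squares
out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
print(out$estimate)                                    # Vm and K
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)
print(coef(fit))
```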
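
70. maximum likelihood with nlm()

> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,     --(rest of the line cut off)
> y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
> n <- c(59, 60, 62, 56, 63, 59, 62, 60)
> fn <- function(p)                                  --negative log-likelihood (body cut off)
> out <- nlm(fn, p = c(-50,20), hessian = TRUE)
> sqrt(diag(solve(out$hessian)))                     --approximate standard errors from the inverse Hessian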
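The fn body and part of x are cut off above; this sketch shows the same technique (minimize a negative log-likelihood with nlm(), then take standard errors from the inverse Hessian) for a logistic model on hypothetical data, cross-checked with glm():

```r
x <- c(1, 2, 3, 4, 5, 6)           # hypothetical doses
n <- rep(20, 6)
y <- c(2, 4, 8, 12, 16, 18)
negll <- function(p) {
  eta <- p[1] + p[2] * x
  -sum(y * eta - n * log1p(exp(eta)))   # binomial log-likelihood up to a constant
}
out <- nlm(negll, p = c(0, 0), hessian = TRUE)
se <- sqrt(diag(solve(out$hessian)))
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
print(rbind(nlm = out$estimate, glm = coef(fit)))
print(rbind(nlm = se, glm = sqrt(diag(vcov(fit)))))
```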
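
71. graphics

> pairs(X)                        --scatterplot matrix of the columns of X
> coplot(a ~ b | c)               --plots of a against b for given values of c
> coplot(a ~ b | c + d)
> plot(x, y, type="n"); text(x, y, names)   --plot nothing, then label each point
> text(x, y, expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x})))   --mathematical annotation
> help(plotmath)
> example(plotmath)
> demo(plotmath)
> help(Hershey)                   --Hershey vector fonts
> demo(Hershey)
> help(Japanese)
> demo(Japanese)
> text(locator(1), "Outlier", adj=0)   --place text where the mouse is clicked
> plot(x, y)
> identify(x, y)                  --label points by clicking near them
> oldpar <- par(col=4, lty=2)     --change parameters, saving the old values
> par(oldpar)                     --restore them
> oldpar <- par(no.readonly=TRUE) --save all settable parameters
> par(oldpar)
> plot(x, y, pch="+")             --plotting character
> legend(locator(1), as.character(0:25), pch = 0:25)   --show plotting symbols 0 to 25
> postscript()                    --send graphics to a PostScript device
> dev.off()                       --close the device, completing the file
> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)
> postscript("plot1.eps", horizontal=FALSE, onefile=FALSE,   --(rest of the call cut off)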
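A sketch of saving and restoring graphics parameters non-interactively; it draws to a temporary PDF file, so no window is needed:

```r
pdf(file = tempfile(fileext = ".pdf"))
oldpar <- par(col = 4, lty = 2)   # par() returns the previous values of what it changed
changed_lty <- par("lty")
par(oldpar)                       # restore
restored_col <- par("col")
restored_lty <- par("lty")
invisible(dev.off())
print(oldpar)
```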
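
72. packages

> library()                 --list installed packages
> library(boot)             --attach package boot
> search()                  --packages currently on the search path
> loadedNamespaces()        --namespaces currently loaded
> help.start()

> w <- ifelse(Mod(w) > 1, 1/w, w)   --complex vector w: replace points outside the unit circle by their reciprocals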

73. invoking R from the command line, and command-line editing

R [options] [<infile] [>outfile]      --invoking R from a shell with redirected input and output
Note that input and output can be redirected in the usual way (using ‘<’ and ‘>’), but the line length limit of 4095 bytes still applies. Warning and error messages are sent to the error channel (stderr).
q(status=<exit status code>)          --quit with a given exit status
Many of the command-line editing commands use either Control or Meta characters. Control characters, such as Control-m, are obtained by holding the <CTRL> down while you press the <m> key, and are written as C-m below. Meta characters, such as Meta-b, are typed by holding down <META> and pressing <b>, and written as M-b in the following. If your terminal does not have a <META> key enabled, you can still type Meta characters using two-character sequences starting with ESC. Thus, to enter M-b, you could type <ESC><b>. The ESC character sequences are also allowed on terminals with real Meta keys. Note that case is significant for Meta characters.
The R program keeps a history of the command lines you type, including the erroneous lines, and commands in your history may be recalled, changed if necessary, and re-submitted as new commands. In Emacs-style command-line editing any straight typing you do while in this editing phase causes the characters to be inserted in the command you are editing, displacing any characters to the right of the cursor. In vi mode character insertion mode is started by M-i or M-a, characters are typed and insertion mode is finished by typing a further <ESC>. (The default is Emacs-style, and only that is described here: for vi mode see the readline documentation.)
Pressing <RET> at any time causes the command to be re-submitted.
<DEL>   delete the previous character (left of the cursor)
<RET>   re-submit the command to R
The final <RET> terminates the command line editing sequence.

posted @ 2014-04-11 11:36  yjjsdu