Solutions to ISLR

Chapter 2

1,

a) Better. With a very large sample size and only a few predictors, a more flexible method can capture the underlying relationship closely with little risk of overfitting.

b) Worse. With many predictors but few observations, a more flexible statistical method will tend to overfit the training data.

c) Better. When the relationship between the predictors and the response is highly non-linear, only a flexible method can capture it; an inflexible method would have high bias.

d) Worse. When the variance of the error terms is very high, a more flexible method will chase the noise and overfit.

 

2,

a) Regression, inference. n = 500 firms in the US; p = 3 (profit, number of employees, industry).

b) Classification, prediction. n = 20 similar products that were previously launched; p = 13 (price, marketing budget, competition price, and ten other variables).

c) Regression, prediction. n = 52 weeks of 2012; p = 3 (the % change in the US market, the % change in the British market, and the % change in the German market).

 

3,

a) The Bayes (irreducible) error is a horizontal line parallel to the x axis, since it is a constant. The training error decreases steadily as flexibility increases, because the method fits the training data more and more closely. The test error first decreases but then rises again after a point near the Bayes error line, because of over-fitting. The squared bias decreases with flexibility, while the variance starts low and rises steadily, because the more closely a method follows one training set, the more its fit changes when the training set changes. (A small simulation of the training/test curves follows part (b).)

b) See the explanations in (a).
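To see these curves concretely, here is a minimal simulation sketch; the true function, the noise level, and the use of polynomial degree as the flexibility axis are arbitrary choices for illustration, not part of the exercise:

# Simulation sketch: training vs. test MSE as flexibility (polynomial degree) grows
set.seed(1)
f <- function(x) sin(2 * x)                     # arbitrary "true" function
n <- 100; sigma <- 0.3                          # Bayes error = sigma^2
x.tr <- runif(n, 0, 3); y.tr <- f(x.tr) + rnorm(n, sd = sigma)
x.te <- runif(n, 0, 3); y.te <- f(x.te) + rnorm(n, sd = sigma)
degrees <- 1:10
mse <- sapply(degrees, function(d) {
  fit <- lm(y.tr ~ poly(x.tr, d))
  c(train = mean((y.tr - fitted(fit))^2),
    test  = mean((y.te - predict(fit, data.frame(x.tr = x.te)))^2))
})
matplot(degrees, t(mse), type = "l", lty = 1, col = 1:2,
        xlab = "Flexibility (polynomial degree)", ylab = "MSE")
abline(h = sigma^2, lty = 2)                    # irreducible (Bayes) error
legend("topright", c("train", "test", "Bayes error"),
       lty = c(1, 1, 2), col = c(1, 2, 1))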

 

4,

a) Classification: i. recognizing the digit in a handwritten image (prediction);  ii. judging from a project's historical data whether it will succeed (prediction);  iii. a recommendation system that assigns each user to a category (prediction).

b) Regression: i. estimating how a set of input variables affects a quantitative outcome, e.g. how advertising spending affects sales (inference);  ii. predicting future stock prices (prediction);  iii. predicting the price of a given house next year (prediction).

c) Cluster analysis: i. grouping viruses into types;  ii. finding geographical clusters of people with similar political beliefs;  iii. grouping users by their taste in books.

 

5,

Advantages: a very flexible approach can capture complicated, non-linear relationships and so fit the data more accurately. Disadvantages: it overfits easily when the data are limited, has high variance, and requires more computation to estimate its many parameters.

A non-linear, complicated relationship calls for a more flexible approach; otherwise a less flexible approach is preferred, since it is easier to interpret and less prone to overfitting.

 

6, 

Parametric statistical learning methods assume a particular functional form for f (for example, linear) and so reduce the problem to estimating a fixed set of parameters.

Pros: simpler, faster, and need less data. Cons: the assumed form constrains the model, limits its complexity, and can fit poorly when the assumption is wrong.

Non-parametric statistical learning methods make no explicit assumption about the form of f.

Pros: flexible, powerful, often better predictive performance. Cons: need far more data, are slower to fit, and are prone to overfitting.

 

7,

a) Obs. 1: 3; Obs. 2: 2; Obs. 3: √10 ≈ 3.16; Obs. 4: √5 ≈ 2.24; Obs. 5: √2 ≈ 1.41; Obs. 6: √3 ≈ 1.73

b) Green, since Obs.5 is the closest point to the test data.

c) Red, since Obs. 2, 5, and 6 are the three closest points to the test point, and two of them (2 and 6) are Red.
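As a quick check, the distances and the K = 1 / K = 3 votes can be reproduced in R (the six observations are typed in from the exercise's table):

# Exercise 7: Euclidean distances from the test point (0, 0, 0) and KNN votes
X <- rbind(c(0, 3, 0), c(2, 0, 0), c(0, 1, 3),
           c(0, 1, 2), c(-1, 0, 1), c(1, 1, 1))   # X1, X2, X3 for Obs. 1-6
Y <- c("Red", "Red", "Red", "Green", "Green", "Red")
d <- sqrt(rowSums(X^2))                            # distance to (0, 0, 0)
round(d, 3)                                        # 3, 2, 3.162, 2.236, 1.414, 1.732
Y[order(d)[1]]                                     # K = 1: Obs. 5, so "Green"
table(Y[order(d)[1:3]])                            # K = 3: Obs. 5, 6, 2, so "Red" wins 2-1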

 

9,

a) str(auto)  # all variables are quantitative except name and horsepower
b) summary(auto[, -c(4,9)])

c)  sapply(auto[, -c(4, 9)], mean)  

sapply(auto[, -c(4, 9)], sd)

      mpg    cylinders  displacement        weight  acceleration
7.8258039    1.7015770   104.3795833   847.9041195     2.7499953
     year       origin
3.6900049    0.8025495

 

d)

sapply(auto[-c(10:85), -c(4, 9)], range)  

sapply(auto[-c(10:85), -c(4, 9)], mean)

sapply(auto[-c(10:85), -c(4, 9)], sd)
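For reference, a sketch of how the auto data frame used above may have been created. The file name is an assumption; reading it without na.strings = "?" is what leaves horsepower non-numeric (the raw file codes missing horsepower as "?"), which is why columns 4 (horsepower) and 9 (name) are excluded from the numeric summaries above:

auto <- read.csv("Auto.csv", stringsAsFactors = TRUE)   # file path is an assumption
str(auto)   # horsepower (col 4) and name (col 9) come in as factors here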

 

Chapter 3

1,

The null hypotheses are that, with the other predictors held fixed, each of TV, radio, and newspaper individually has no effect on sales: H0: β_TV = 0, H0: β_radio = 0, H0: β_newspaper = 0.

Conclusion: the p-values for TV and radio are very small, so we reject their null hypotheses and conclude that both affect sales; the p-value for newspaper is large, so we cannot reject the null hypothesis that newspaper has no effect on sales.
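These p-values are the ones in Table 3.4; a sketch of how they could be reproduced, assuming Advertising.csv has been downloaded from the book's website (the file name and column names are assumptions):

Advertising <- read.csv("Advertising.csv")   # assumed file; columns TV, radio, newspaper, sales
fit <- lm(sales ~ TV + radio + newspaper, data = Advertising)
summary(fit)   # TV and radio: tiny p-values; newspaper: large p-value (about 0.86)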

 

2,

The KNN classifier is typically used to solve classification problems (those with a qualitative response): it identifies the K training points nearest to x0 and estimates the conditional probability P(Y = j | X = x0) for each class j as the fraction of those neighbors whose response equals j. The KNN regression method is used to solve regression problems (those with a quantitative response): it again identifies the K training points nearest to x0 and estimates f(x0) as the average of their responses.
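A package-free sketch of both estimates at a single test point x0 (the training data here are made up purely for illustration):

# KNN at one test point: majority vote for classification, average for regression
set.seed(1)
x  <- matrix(rnorm(50 * 2), ncol = 2)                  # made-up predictors
cl <- factor(ifelse(x[, 1] + x[, 2] > 0, "A", "B"))    # qualitative response
y  <- x[, 1] + rnorm(50)                               # quantitative response
x0 <- c(0.5, -0.2); K <- 5
nbrs <- order(colSums((t(x) - x0)^2))[1:K]             # indices of the K nearest neighbors
prop.table(table(cl[nbrs]))   # KNN classifier: estimated P(Y = j | X = x0)
mean(y[nbrs])                 # KNN regression: estimate of f(x0)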

 

3,

a) iii is correct. With Gender = 1 for females, the female advantage is 35 − 10 × GPA, which becomes negative once GPA exceeds 3.5; so for a fixed IQ and GPA, males earn more on average provided the GPA is high enough.

b) 50 + 20(4.0) + 0.07(110) + 35(1) + 0.01(4.0)(110) − 10(4.0)(1) = 137.1, i.e. a predicted salary of about $137,100.

c) False. The size of the interaction coefficient by itself says nothing about the evidence for an interaction effect; we need to look at its p-value (or t-statistic).

 

4,

a) The training RSS of the cubic regression will be lower, because its extra flexibility lets it fit (indeed over-fit) the training data more closely even though the true relationship is linear.

b) The opposite holds for the test RSS: the linear regression will have the lower test RSS, since the cubic model's extra flexibility mostly picks up noise (see the simulation after part (d)).

c) The polynomial regression again has the lower training RSS, because higher flexibility always allows a closer fit to the training data, whatever the true relationship is.

d) There is not enough information to tell: it depends on how far the true relationship is from linear. If it is close to linear, the linear fit will tend to have the lower test RSS; if it is strongly non-linear, the cubic fit will.
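A small simulation illustrating (a) and (b) under the assumption that the true relationship really is linear (the coefficients and sample size are arbitrary):

# True model is linear: the cubic fit wins on training RSS but usually loses on test RSS
set.seed(1)
n <- 100
x <- rnorm(n); y <- 1 + 2 * x + rnorm(n)                 # training data, truly linear
x.new <- rnorm(n); y.new <- 1 + 2 * x.new + rnorm(n)     # independent test data
fit1 <- lm(y ~ x)
fit3 <- lm(y ~ poly(x, 3))
c(train.linear = sum(resid(fit1)^2), train.cubic = sum(resid(fit3)^2))
c(test.linear = sum((y.new - predict(fit1, data.frame(x = x.new)))^2),
  test.cubic  = sum((y.new - predict(fit3, data.frame(x = x.new)))^2))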

 

6,
Since b0 = ave(y) − b1*ave(x), plugging x = ave(x) into the fitted line gives f(ave(x)) = b0 + b1*ave(x) = ave(y) − b1*ave(x) + b1*ave(x) = ave(y), so the least squares line always passes through the point (ave(x), ave(y)).
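A quick numerical check of this identity, with arbitrary simulated data:

# The least squares line passes through (mean(x), mean(y))
set.seed(1)
x <- rnorm(100); y <- 3 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
predict(fit, data.frame(x = mean(x))) - mean(y)   # essentially zero (floating point)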
 
8,
a)
attach(Auto)
lm.fit = lm(mpg~horsepower)
summary(lm.fit)
i.
Since the p-value for horsepower is essentially zero (far below 0.05), we reject the null hypothesis and conclude that there is a relationship between horsepower and mpg.
ii.
R^2 is 0.6059, which indicates that 60.59% of the variability of mpg can be explained by horsepower.
iii.
negative
iv.
predict(lm.fit, data.frame(horsepower=98), interval="confidence")
       fit     lwr      upr
1 24.46708 14.8094 34.12476
 
predict(lm.fit, data.frame(horsepower=98), interval="prediction")
       fit      lwr      upr
1 24.46708 23.97308 24.96108
b)
plot(horsepower,mpg)
abline(lm.fit)
 
10,
c) Sales = 13.0434689 − 0.0544588 × Price − 0.0219162 × Urban + 1.2005727 × US + ε,
with Urban = 1 if the store is in an urban location and 0 if not, and US = 1 if the store is in the US and 0 if not.
d) Price and US: their p-values are small, so we can reject H0: βj = 0 for these two predictors (but not for Urban).
f) The model explains about 23.93% of the variance in Sales.
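For reference, a sketch of the fits these answers assume (Carseats comes from the ISLR package; the name fit2 is taken to be the reduced Price + US model from part (e), matching the confint() call in (g)):

library(ISLR)
fit  <- lm(Sales ~ Price + Urban + US, data = Carseats)   # part (a)
fit2 <- lm(Sales ~ Price + US, data = Carseats)           # part (e)
summary(fit2)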
g)

confint(fit2)
                  2.5 %      97.5 %
(Intercept) 11.79032020 14.27126531
Price       -0.06475984 -0.04419543
USYes        0.69151957  1.70776632

 

11,

c) We obtain the same value for the t-statistic and consequently the same p-value. Both results in (a) and (b) describe the same underlying line: y = 2x + ε can equally be written as x = 0.5(y − ε).
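This can be checked with the simulation the exercise itself sets up (y = 2x + ε):

set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
coef(summary(lm(y ~ x + 0)))   # t value for x ...
coef(summary(lm(x ~ y + 0)))   # ... equals the t value for y here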

 

15,

c)
simple.reg <- vector("numeric",0)
simple.reg <- c(simple.reg, fit.zn$coefficient[2])
simple.reg <- c(simple.reg, fit.indus$coefficient[2])
simple.reg <- c(simple.reg, fit.chas$coefficient[2])
simple.reg <- c(simple.reg, fit.nox$coefficient[2])
simple.reg <- c(simple.reg, fit.rm$coefficient[2])
simple.reg <- c(simple.reg, fit.age$coefficient[2])
simple.reg <- c(simple.reg, fit.dis$coefficient[2])
simple.reg <- c(simple.reg, fit.rad$coefficient[2])
simple.reg <- c(simple.reg, fit.tax$coefficient[2])
simple.reg <- c(simple.reg, fit.ptratio$coefficient[2])
simple.reg <- c(simple.reg, fit.black$coefficient[2])
simple.reg <- c(simple.reg, fit.lstat$coefficient[2])
simple.reg <- c(simple.reg, fit.medv$coefficient[2])
mult.reg <- vector("numeric", 0)
mult.reg <- c(mult.reg, fit.all$coefficients)
mult.reg <- mult.reg[-1]
plot(simple.reg, mult.reg, col = "red")
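A more compact way to collect the same coefficients, assuming the individual fits above were run on the Boston data from the MASS package:

library(MASS)
predictors <- setdiff(names(Boston), "crim")
# univariate slope of crim on each predictor, one simple regression at a time
simple.reg <- sapply(predictors, function(v)
  coef(lm(reformulate(v, response = "crim"), data = Boston))[[2]])
# slopes from the single multiple regression of crim on everything
mult.reg <- coef(lm(crim ~ ., data = Boston))[-1]
plot(simple.reg, mult.reg, col = "red")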

Chapter 4

 4,

a)

For X in [0.05, 0.95] we use the observations in [X − 0.05, X + 0.05], i.e. 10% of them.

For X < 0.05 we use [0, X + 0.05], i.e. (100X + 5)% of them, which averages 7.5% over this range.

For X > 0.95 we use [X − 0.05, 1], i.e. (105 − 100X)% of them, again 7.5% on average.

So the average fraction of the available observations used to make the prediction is

10% × 0.9 + 7.5% × 0.05 + 7.5% × 0.05 = 9% + 0.375% + 0.375% = 9.75%.

b)

Assuming the two features are independent, 9.75% × 9.75% = 0.950625%, i.e. slightly under 1% of the available observations.

c)

9.75%^100 ≈ 0%, i.e. essentially none of the available observations are near the test point (see the numerical check after part (e)).

d)

lim(p→∞) 0.0975^p = 0: as p grows, the fraction of training observations "near" any given test observation shrinks to zero, which is the drawback of KNN when p is large.

e)

The hypercube must contain 10% of the observations on average, so its side length ℓ satisfies ℓ^p = 0.10, i.e. ℓ = 0.10^(1/p):

p = 1: ℓ = 0.10;

p = 2: ℓ = 0.10^(1/2) ≈ 0.32;

p = 100: ℓ = 0.10^(1/100) ≈ 0.98, i.e. nearly the entire range of every feature.
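As mentioned in (c), here is a quick Monte Carlo check of parts (a) through (c), assuming each feature is Uniform[0, 1] as in the exercise:

set.seed(1)
frac_used <- function(p, n = 1e4) {
  x0 <- matrix(runif(n * p), ncol = p)              # random test points
  width <- pmin(x0 + 0.05, 1) - pmax(x0 - 0.05, 0)  # usable 10%-range per feature
  mean(apply(width, 1, prod))                       # average fraction of observations used
}
frac_used(1)     # about 0.0975
frac_used(2)     # about 0.0095
frac_used(100)   # effectively 0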

5,

a) QDA, LDA.

b) QDA, QDA.

c) Improve. With a larger training set, the higher variance of QDA matters less, so its extra flexibility becomes an advantage; with little data, LDA's lower variance tends to win, especially if the true boundary is roughly linear.

d) False. If the Bayes boundary is linear, QDA's extra flexibility adds variance without reducing bias, so unless the training set is very large its test error rate will typically be worse than LDA's.

6,

a)

P = exp(-6 + 0.05 * 40 + 1 * 3.5) / (exp(-6 + 0.05 * 40 + 1 * 3.5) + 1) = 0.3775.

b)

-6 + 0.05 * x + 3.5 * 1 = 0 --> x = 50
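The same numbers can be checked with R's logistic CDF:

plogis(-6 + 0.05 * 40 + 1 * 3.5)   # part (a): 0.3775
(6 - 3.5) / 0.05                   # part (b): 50 hours, from -6 + 0.05h + 3.5 = 0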

7,

With prior probabilities 0.8 (Yes) and 0.2 (No), means 10 and 0, and variance 36, Bayes' theorem gives P(Yes | X = 4) = 0.8·exp(−(4 − 10)²/72) / [0.8·exp(−(4 − 10)²/72) + 0.2·exp(−(4 − 0)²/72)] ≈ 0.752.

8,

KNN with K = 1 has a 0% training error rate, so if its average of training and test error rates is 18%, its test error rate must be 36%. We therefore prefer logistic regression, whose test error rate is only 30%.

9,

a)

p(x)/(1 − p(x)) = 0.37, so p(x) = 0.37/1.37 ≈ 0.27.

b)

With p(x) = 0.16, the odds are 0.16/(1 − 0.16) ≈ 0.19.

 10,

b) Yes, Lag2, since its p-value is less than 0.05.
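The code in (c) below uses an object lm.fit; a sketch of the fit from part (b) that it assumes, using the Weekly data from the ISLR package (the full logistic regression on the five lags and Volume):

library(ISLR)
attach(Weekly)   # the later table() call uses Direction directly
lm.fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              family = binomial, data = Weekly)
summary(lm.fit)   # Lag2 is the only predictor with p-value below 0.05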

c) 

pred = predict(lm.fit, type="response")

lm.pred = rep("Down", length(pred))

lm.pred[pred>0.5] = "Up"

table(lm.pred, Direction)

 

We may conclude that the percentage of correct predictions on the training data is (54 + 557)/1089 ≈ 56.1%; in other words, the training error rate is about 43.9%, and training error is usually overly optimistic. For weeks when the market goes up, the model is right 557/(48 + 557) ≈ 92.1% of the time; for weeks when the market goes down, it is right only 54/(54 + 430) ≈ 11.2% of the time.

d)

 

posted on 2017-02-11 by alex_wood