Using Support Vector Machines (SVM) in R (2. The kernlab Package)
The ksvm() function in this package implements SVM algorithms by calling the optimization routines of the bsvm and libsvm libraries through the .Call interface. For classification it offers the C-SVM and ν-SVM algorithms, as well as a bound-constrained version of the C classifier. For regression it provides the ε-SVR and ν-SVR algorithms. For multi-class classification, both the one-against-one scheme and native multi-class formulations are available, as described below. For example:
> library("kernlab") # load the package
> data("iris") # load the iris data set
> irismodel <- ksvm(Species ~ ., data = iris,
+ type = "C-bsvc", kernel = "rbfdot",
+ kpar = list(sigma = 0.1), C = 10,
+ prob.model = TRUE) # fit the model
The type argument selects classification, regression, or novelty detection. Its default depends on whether y is a factor: C-svc for a factor response, eps-svr otherwise. Possible values are:
• C-svc C classification
• nu-svc nu classification
• C-bsvc bound-constraint svm classification
• spoc-svc Crammer, Singer native multi-class
• kbb-svc Weston, Watkins native multi-class
• one-svc novelty detection
• eps-svr epsilon regression
• nu-svr nu regression
• eps-bsvr bound-constraint svm regression
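Beyond the C-bsvc fit above, the other type values are selected the same way. The following sketch (assuming the iris data already loaded, as in the example above) shows a ν-classification fit, where nu replaces the cost C, and a one-class novelty-detection fit trained on a single species:

```r
library(kernlab)
data(iris)

## nu-SVM classification: nu in (0, 1] bounds the fraction of
## margin errors and support vectors, replacing the cost C
numodel <- ksvm(Species ~ ., data = iris, type = "nu-svc",
                kernel = "rbfdot", kpar = list(sigma = 0.1), nu = 0.2)

## one-class novelty detection: unsupervised, fit on one class only;
## nu controls the fraction of training points treated as outliers
setosa <- as.matrix(iris[iris$Species == "setosa", -5])
novmodel <- ksvm(setosa, type = "one-svc",
                 kernel = "rbfdot", nu = 0.1)
```

predict() on the one-class model returns TRUE/FALSE flags marking points that fall inside the learned region.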
The kernel argument sets the kernel function. Available kernels are:
• rbfdot Radial Basis kernel "Gaussian"
• polydot Polynomial kernel
• vanilladot Linear kernel
• tanhdot Hyperbolic tangent kernel
• laplacedot Laplacian kernel
• besseldot Bessel kernel
• anovadot ANOVA RBF kernel
• splinedot Spline kernel
• stringdot String kernel
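Each kernel takes its hyperparameters through the kpar argument; kernels without hyperparameters, such as the linear kernel, need no kpar. A brief sketch on the iris data (the sigma, degree, scale, and offset values here are illustrative, not tuned):

```r
library(kernlab)
data(iris)

## linear kernel: vanilladot has no hyperparameters
linmodel <- ksvm(Species ~ ., data = iris, type = "C-svc",
                 kernel = "vanilladot", C = 10)

## polynomial kernel: (scale * <x, y> + offset)^degree, set via kpar
polymodel <- ksvm(Species ~ ., data = iris, type = "C-svc",
                  kernel = "polydot",
                  kpar = list(degree = 2, scale = 1, offset = 1),
                  C = 10)

## the error() accessor returns the training error of each fit
error(linmodel)
error(polymodel)
```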
> irismodel
Support Vector Machine object of class "ksvm"
SV type: C-bsvc (classification)
parameter : cost C = 10
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.1
Number of Support Vectors : 32
Training error : 0.02
Probability model included.
> predict(irismodel, iris[c(3, 10, 56, 68, 107, 120), -5], type = "probabilities")
setosa versicolor virginica
[1,] 0.986432820 0.007359407 0.006207773
[2,] 0.983323813 0.010118992 0.006557195
[3,] 0.004852528 0.967555126 0.027592346
[4,] 0.009546823 0.988496724 0.001956452
[5,] 0.012767340 0.069496029 0.917736631
[6,] 0.011548176 0.150035384 0.838416441
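Setting type = "probabilities" requires that the model was fitted with prob.model = TRUE, as above. Without the type argument, predict() returns class labels, which can be tabulated against the true labels into a confusion matrix. A minimal sketch, refitting the same model:

```r
library(kernlab)
data(iris)
model <- ksvm(Species ~ ., data = iris, type = "C-bsvc",
              kernel = "rbfdot", kpar = list(sigma = 0.1),
              C = 10)

## default type = "response": predicted class labels
pred <- predict(model, iris[, -5])

## confusion matrix on the training data
table(pred, iris$Species)
```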
ksvm() also supports user-defined kernel functions. For example:
> k <- function(x, y) { (sum(x * y) + 1) * exp(-0.001 * sum((x - y)^2)) }
> class(k) <- "kernel"
> data("promotergene")
> gene <- ksvm(Class ~ ., data = promotergene, kernel = k, C = 10, cross = 5) # fit with 5-fold cross-validation
> gene
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 10
Number of Support Vectors : 66
Training error : 0
Cross validation error : 0.141558
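Any function of two vectors that carries the class "kernel" is accepted, and can also be passed to kernelMatrix() to inspect the Gram matrix directly, which is a quick sanity check before training. A sketch using the same custom kernel on a few iris rows:

```r
library(kernlab)

## same user-defined kernel as above, tagged with class "kernel"
k <- function(x, y) (sum(x * y) + 1) * exp(-0.001 * sum((x - y)^2))
class(k) <- "kernel"

## Gram matrix of the kernel over 5 observations: 5 x 5, symmetric
data(iris)
X <- as.matrix(iris[1:5, -5])
K <- kernelMatrix(k, X)
dim(K)
```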
For binary classification problems, the fitted model can be visualized with plot(). For example:
> x <- rbind(matrix(rnorm(120), , 2), matrix(rnorm(120, mean = 3), , 2))
> y <- matrix(c(rep(1, 60), rep(-1, 60)))
> svp <- ksvm(x, y, type = "C-svc", kernel = "rbfdot", kpar = list(sigma = 2))
> plot(svp)
The package is available at http://cran.r-project.org/web/packages/kernlab/index.html