[Machine Learning] The Data Format Used by libsvm
The training and test data files used by this software have the following format:
<label> <index1>:<value1> <index2>:<value2> ...
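Here <label> is the target value of the training data: for classification it is an integer identifying a class (multiple classes are supported); for regression it is any real number. <index> is an integer starting from 1; indices need not be contiguous (features whose value is zero can simply be omitted), but they must appear in ascending order. <value> is a real number, that is, what we usually call an independent variable (a feature). In a test data file the label is used only to compute accuracy or error; if it is unknown, fill this column with any number rather than leaving it empty, because the parser still expects a label at the start of every line. The package also includes a sample training data set, heart_scale, which is convenient as a reference for the file format and for practicing with the software.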
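As a rough illustration of the format described above, here is a minimal Python sketch that parses one line into a label and a sparse index-to-value mapping. The function name parse_libsvm_line is hypothetical and not part of libsvm; the ascending-index check reflects the rule stated above.

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, {index: value}).

    Hypothetical helper for illustration only; it is not part of libsvm.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])  # "+1", "-1" and real-valued targets all parse as floats
    features = {}
    previous_index = 0
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        # Indices start at 1 and must be strictly ascending; gaps are allowed.
        if index <= previous_index:
            raise ValueError(f"index {index} is not ascending or is below 1")
        features[index] = float(value_text)
        previous_index = index
    return label, features


label, features = parse_libsvm_line("+1 1:0.708333 2:1 4:-0.320755")
print(label, features)
```

Index 3 is absent from the sample line, which is how a zero-valued feature is normally expressed in this sparse format.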
In SVM.NET, the Node and Problem classes provide this functionality. Suppose I want to define the following data set:
+1 1:0.708333 2:1 3:1 4:-0.320755
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962
-1 1:0.458333 2:1 3:1 4:-0.358491
You can first declare a matrix for X, private Node[][] _X; and an array for Y, private double[] _Y; and then assign values to _X and _Y one by one.
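The same idea can be sketched in Python to show the shape of the two structures: a jagged array of (index, value) nodes for X and a flat array of labels for Y. The Node class below is a hypothetical stand-in written for this example, not the SVM.NET class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical (index, value) pair; a stand-in, not the SVM.NET Node class."""
    index: int
    value: float

    def __str__(self):
        # Render as {index}:{value}, the form used in libsvm data files.
        return f"{self.index}:{self.value:g}"


# The four sample rows from the text: (label, dense feature values).
rows = [
    (+1, [0.708333, 1, 1, -0.320755]),
    (-1, [0.583333, -1, 0.333333, -0.603774]),
    (+1, [0.166667, 1, -0.333333, -0.433962]),
    (-1, [0.458333, 1, 1, -0.358491]),
]

# X is a list of node rows (indices start at 1); Y holds one label per row.
X = [[Node(i, float(v)) for i, v in enumerate(values, start=1)] for _, values in rows]
Y = [float(label) for label, _ in rows]

for label, nodes in zip(Y, X):
    print(f"{label:+g}", " ".join(str(node) for node in nodes))
```

Each row of X may have a different length, which is why a jagged array (Node[][]) rather than a rectangular matrix is the natural container for sparse vectors.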
Of course, you can also use the Read(Stream) and Read(String) methods provided by the Problem class to read the data directly into a Problem.
Class Node
Member | Description |
Node() | Default constructor. |
Node(Int32, Double) | Constructor. |
CompareTo(Node) | Compares this node with another. |
Equals(Object) | (Inherited from Object.) |
Finalize() | (Inherited from Object.) |
GetHashCode() | (Inherited from Object.) |
GetType() | (Inherited from Object.) |
Index | Index of this Node. |
MemberwiseClone() | (Inherited from Object.) |
ToString() | String representation of this Node as {index}:{value}. (Overrides Object.ToString().) |
Value | Value at Index. |
Class Problem
Member | Description |
Problem(Int32, Double[], Node[][], Int32) | Constructor. |
Problem() | Empty constructor. Nothing is initialized. |
Count | Number of vectors. |
Equals(Object) | (Inherited from Object.) |
Finalize() | (Inherited from Object.) |
GetHashCode() | (Inherited from Object.) |
GetType() | (Inherited from Object.) |
MaxIndex | Maximum index for a vector. |
MemberwiseClone() | (Inherited from Object.) |
Read(Stream) | Reads a problem from a stream. |
Read(String) | Reads a problem from a file. |
ToString() | (Inherited from Object.) |
Write(Stream, Problem) | Writes a problem to a stream. |
Write(String, Problem) | Writes a problem to a file. This will overwrite any previous data in the file. |
X | Vector data. |
Y | Class labels. |
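To make the roles of these members more concrete, the following Python sketch mimics what the table describes: reading a problem from a stream, tracking the number of vectors and the maximum index, and writing the problem back out. The functions read_problem and write_problem are hypothetical and only approximate the described behavior; they are not the SVM.NET implementation.

```python
import io


def read_problem(stream):
    """Read libsvm-format lines; return (count, labels, vectors, max_index).

    Hypothetical sketch of the behavior described in the table above.
    """
    labels, vectors, max_index = [], [], 0
    for line in stream:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        labels.append(float(tokens[0]))
        vector = []
        for token in tokens[1:]:
            index_text, value_text = token.split(":", 1)
            vector.append((int(index_text), float(value_text)))
        if vector:
            max_index = max(max_index, max(index for index, _ in vector))
        vectors.append(vector)
    return len(labels), labels, vectors, max_index


def write_problem(stream, labels, vectors):
    """Write one '<label> <index>:<value> ...' line per vector."""
    for label, vector in zip(labels, vectors):
        features = " ".join(f"{index}:{value}" for index, value in vector)
        stream.write(f"{label} {features}".rstrip() + "\n")


text = (
    "+1 1:0.708333 2:1 3:1 4:-0.320755\n"
    "-1 1:0.583333 2:-1 3:0.333333 4:-0.603774\n"
)
count, labels, vectors, max_index = read_problem(io.StringIO(text))

out = io.StringIO()
write_problem(out, labels, vectors)
print(count, max_index)
print(out.getvalue(), end="")
```

Writing floats with their default string form (for example 1.0 instead of 1) keeps the round trip lossless, at the cost of output that is not byte-identical to the input.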