Machine Learning: A Demo of Calling a Weka Classifier from Eclipse

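Weka implements many machine learning algorithms, and both research labs and company R&D teams end up using it to some degree. I think of Weka as a local counterpart to SparkML: SparkML provides distributed machine learning algorithms for big data, while Weka works well when the data set is not very large. You can try out models in Weka, decide which one to use, and then carry on with the rest of the data mining work.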

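Below is a demo that calls a Weka Classifier from Eclipse. With this example you can step into the Weka source code and trace how a classification algorithm works. The next post analyzes the source code of the simplest algorithm, IB1 (1NN).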

The code below runs the IB1 classifier on a sample data set.

package mytest;

import java.io.File;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IB1;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
//import wlsvm.WLSVM;

public class SimpleClassification { // classifier demo

    public static void main(String[] args) {
        Instances ins = null;
        Classifier cfs = null;
        try {
            File file = new File("E:\\Develop/Weka-3-6/data/contact-lenses.arff");
//            File file = new File("E:\\yuce/data.csv");
            ArffLoader loader = new ArffLoader();
            loader.setFile(file);
            ins = loader.getDataSet();

            // Set the classIndex of the Instances before using them; otherwise using the Instances object throws an exception
            ins.setClassIndex(ins.numAttributes() - 1);
            

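            cfs = new IB1();
            // Train the classifier first; an untrained classifier cannot make predictions
            cfs.buildClassifier(ins);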

//            Option setup (these flags look like SVM options for the commented-out WLSVM import, not IB1 options)
//            String[] options=weka.core.Utils.splitOptions("-S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C 1.0 -E 0.0010 -P 0.1 -B 0");
//            cfs.setOptions(options);
            
            
            Instance testInst;
            Evaluation testingEvaluation = new Evaluation(ins);
            int length = ins.numInstances();
            for (int i = 0; i < length; i++) {
                testInst = ins.instance(i);
                // Evaluate the classifier on this test instance and record the prediction
                double predictValue = testingEvaluation.evaluateModelOnceAndRecordPrediction(cfs,
                        testInst);
                
                System.out.println(testInst.classValue()+"--"+predictValue);
            }

            System.out.println("Classifier accuracy: " + (1 - testingEvaluation.errorRate()));

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

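Step-by-step explanation:

1) Read the data set from the ARFF file and parse it into an Instances object.

2) Create a classifier: new IB1(); it must be trained with buildClassifier(ins) before it can make predictions.

3) Set options if needed (splitOptions) and set the class attribute, which is usually the last one: ins.setClassIndex(ins.numAttributes() - 1);

4) Create an evaluator: new Evaluation(ins)

5) Evaluate the classifier on each instance and print the predictions and the accuracy: testingEvaluation.evaluateModelOnceAndRecordPrediction(cfs, testInst); Note that this loop tests on the same instances that were loaded as training data; it is not cross-validation.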
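
Scoring a 1-nearest-neighbour classifier on its own training data tends to flatter it, because each instance's nearest neighbour is the instance itself. The sketch below illustrates this in plain Java without Weka. It is not Weka's IB1 implementation: it assumes nominal attributes compared by counting mismatches, an arbitrary first-found tie-break, and hypothetical names (`NearestNeighborSketch`, `distance`, `predict`, `accuracy`, `sampleRows`). It uses the first eight rows of the contact-lenses data.

```java
import java.util.Arrays;
import java.util.List;

class NearestNeighborSketch {

    // Mismatch count over the attribute columns; the last column is the class label.
    static int distance(String[] a, String[] b) {
        int d = 0;
        for (int i = 0; i < a.length - 1; i++) {
            if (!a[i].equals(b[i])) {
                d++;
            }
        }
        return d;
    }

    // Class label of the row nearest to query, ignoring the row at index skip (-1 skips nothing).
    // Ties go to the first row found, an arbitrary choice made for this sketch.
    static String predict(List<String[]> data, String[] query, int skip) {
        String[] best = null;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < data.size(); i++) {
            if (i == skip) {
                continue;
            }
            int d = distance(data.get(i), query);
            if (d < bestDist) {
                bestDist = d;
                best = data.get(i);
            }
        }
        return best[best.length - 1];
    }

    // Fraction of rows predicted correctly; leaveOneOut hides each row from its own prediction.
    static double accuracy(List<String[]> data, boolean leaveOneOut) {
        int correct = 0;
        for (int i = 0; i < data.size(); i++) {
            String[] row = data.get(i);
            String predicted = predict(data, row, leaveOneOut ? i : -1);
            if (predicted.equals(row[row.length - 1])) {
                correct++;
            }
        }
        return (double) correct / data.size();
    }

    // The eight "young" rows of the contact-lenses data set.
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"young", "myope", "no", "reduced", "none"},
            new String[] {"young", "myope", "no", "normal", "soft"},
            new String[] {"young", "myope", "yes", "reduced", "none"},
            new String[] {"young", "myope", "yes", "normal", "hard"},
            new String[] {"young", "hypermetrope", "no", "reduced", "none"},
            new String[] {"young", "hypermetrope", "no", "normal", "soft"},
            new String[] {"young", "hypermetrope", "yes", "reduced", "none"},
            new String[] {"young", "hypermetrope", "yes", "normal", "hard"});
    }

    public static void main(String[] args) {
        List<String[]> rows = sampleRows();
        System.out.println("training-set accuracy: " + accuracy(rows, false));
        System.out.println("leave-one-out accuracy: " + accuracy(rows, true));
    }
}
```

The two numbers differ because the leave-one-out run never lets a row vote for itself, which gives a more honest estimate of how the classifier handles unseen instances.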

The data set:

@relation contact-lenses

@attribute age             {young, pre-presbyopic, presbyopic}
@attribute spectacle-prescrip    {myope, hypermetrope}
@attribute astigmatism        {no, yes}
@attribute tear-prod-rate    {reduced, normal}
@attribute contact-lenses    {soft, hard, none}

@data
%
% 24 instances
%
young,myope,no,reduced,none
young,myope,no,normal,soft
young,myope,yes,reduced,none
young,myope,yes,normal,hard
young,hypermetrope,no,reduced,none
young,hypermetrope,no,normal,soft
young,hypermetrope,yes,reduced,none
young,hypermetrope,yes,normal,hard
pre-presbyopic,myope,no,reduced,none
pre-presbyopic,myope,no,normal,soft
pre-presbyopic,myope,yes,reduced,none
pre-presbyopic,myope,yes,normal,hard
pre-presbyopic,hypermetrope,no,reduced,none
pre-presbyopic,hypermetrope,no,normal,soft
pre-presbyopic,hypermetrope,yes,reduced,none
pre-presbyopic,hypermetrope,yes,normal,none
presbyopic,myope,no,reduced,none
presbyopic,myope,no,normal,none
presbyopic,myope,yes,reduced,none
presbyopic,myope,yes,normal,hard
presbyopic,hypermetrope,no,reduced,none
presbyopic,hypermetrope,no,normal,soft
presbyopic,hypermetrope,yes,reduced,none
presbyopic,hypermetrope,yes,normal,none

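Notes on the data file:

1) @relation contact-lenses is the relation (table) name.

2) @attribute age {young, pre-presbyopic, presbyopic} gives the attribute name and its type, here a nominal attribute with its list of allowed values.

3) @data marks the start of the data, one instance per line as comma-separated values.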
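
To make the structure of an attribute declaration concrete, here is a minimal plain-Java sketch that splits a nominal `@attribute` line into its name and value list. It is not Weka's ARFF parser. It assumes an unquoted attribute name and a brace-delimited value list, and the class name `ArffAttributeSketch` and its `parse` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ArffAttributeSketch {
    final String name;
    final List<String> values;

    ArffAttributeSketch(String name, List<String> values) {
        this.name = name;
        this.values = values;
    }

    // Parses a line such as "@attribute age {young, pre-presbyopic, presbyopic}".
    static ArffAttributeSketch parse(String line) {
        String trimmed = line.trim();
        if (!trimmed.toLowerCase().startsWith("@attribute")) {
            throw new IllegalArgumentException("not an @attribute line: " + line);
        }
        int open = trimmed.indexOf('{');
        int close = trimmed.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a nominal attribute: " + line);
        }
        // The name sits between the keyword and the opening brace.
        String name = trimmed.substring("@attribute".length(), open).trim();
        List<String> values = new ArrayList<>();
        for (String v : trimmed.substring(open + 1, close).split(",")) {
            values.add(v.trim());
        }
        return new ArffAttributeSketch(name, values);
    }

    public static void main(String[] args) {
        ArffAttributeSketch attr = parse("@attribute age {young, pre-presbyopic, presbyopic}");
        System.out.println(attr.name + " -> " + attr.values);
    }
}
```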

Weka also supports data in CSV format, but it is best to convert it to an ARFF data set with Weka's tools first.
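
What such a conversion amounts to can be sketched in plain Java. This is an illustration only, not Weka's converter; Weka ships its own converters in the `weka.core.converters` package, such as `CSVLoader` and `ArffSaver`. The sketch assumes a header row, no quoted or missing cells, and that every column is nominal. The class `CsvToArffSketch` and its `convert` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CsvToArffSketch {

    // Builds ARFF text from CSV lines; the first line is taken as the header.
    static String convert(String relation, List<String> csvLines) {
        String[] header = csvLines.get(0).split(",");
        List<String> rows = csvLines.subList(1, csvLines.size());

        // Collect the distinct values of each column in order of first appearance.
        List<Set<String>> values = new ArrayList<>();
        for (int c = 0; c < header.length; c++) {
            values.add(new LinkedHashSet<>());
        }
        for (String row : rows) {
            String[] cells = row.split(",");
            for (int c = 0; c < header.length; c++) {
                values.get(c).add(cells[c].trim());
            }
        }

        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int c = 0; c < header.length; c++) {
            sb.append("@attribute ").append(header[c].trim())
              .append(" {").append(String.join(", ", values.get(c))).append("}\n");
        }
        sb.append("\n@data\n");
        for (String row : rows) {
            sb.append(row.trim()).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("outlook,play", "sunny,no", "rainy,yes", "sunny,yes");
        System.out.print(convert("demo", csv));
    }
}
```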

Output:

Please credit the source when reposting: http://www.cnblogs.com/rongyux/

posted @ 2016-04-08 11:07 rongyux
