Weka command-line usage
java weka.classifiers.trees.J48 -t data/weather.arff
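General form: java <fully qualified class name>. The -t option indicates that the next argument is the name of the training dataset.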
java weka.classifiers.trees.J48 -h
Shows the meaning of each command-line option:
-h or -help
    Output help information.
-synopsis or -info
    Output synopsis for classifier (use in conjunction with -h)
-t <name of training file>
    Sets training file.
-T <name of test file>
    Sets test file. If missing, a cross-validation will be performed on the training data.
-c <class index>
    Sets index of class attribute (default: last).
-x <number of folds>
    Sets number of folds for cross-validation (default: 10).
-no-cv
    Do not perform any cross validation.
-force-batch-training
    Always train classifier in batch mode, never incrementally.
-split-percentage <percentage>
    Sets the percentage for the train/test set split, e.g., 66.
-preserve-order
    Preserves the order in the percentage split.
-s <random number seed>
    Sets random number seed for cross-validation or percentage split (default: 1).
-m <name of file with cost matrix>
    Sets file with cost matrix.
-disable <comma-separated list of evaluation metric names>
    Comma separated list of metric names not to print to the output.
    Available metrics: Correct,Incorrect,Kappa,Total cost,Average cost,KB relative,KB information,Correlation,Complexity 0,Complexity scheme,Complexity improvement,MAE,RMSE,RAE,RRSE,Coverage,Region size,TP rate,FP rate,Precision,Recall,F-measure,MCC,ROC area,PRC area
-l <name of input file>
    Sets model input file. In case the filename ends with '.xml', a PMML file is loaded or, if that fails, options are loaded from the XML file.
-d <name of output file>
    Sets model output file. In case the filename ends with '.xml', only the options are saved to the XML file, not the model.
-v
    Outputs no statistics for training data.
-o
    Outputs statistics only, not the classifier.
-i
    Outputs detailed information-retrieval statistics for each class.
-k
    Outputs information-theoretic statistics.
-classifications "weka.classifiers.evaluation.output.prediction.AbstractOutput + options"
    Uses the specified class for generating the classification output.
    E.g.: weka.classifiers.evaluation.output.prediction.PlainText
-p range
    Outputs predictions for test instances (or the train instances if no test instances provided and -no-cv is used), along with the attributes in the specified range (and nothing else). Use '-p 0' if no attributes are desired.
    Deprecated: use "-classifications ..." instead.
-distribution
    Outputs the distribution instead of only the prediction in conjunction with the '-p' option (only nominal classes).
    Deprecated: use "-classifications ..." instead.
-r
    Only outputs cumulative margin distribution.
-z <class name>
    Only outputs the source representation of the classifier, giving it the supplied name.
-g
    Only outputs the graph representation of the classifier.
-xml filename | xml-string
    Retrieves the options from the XML-data instead of the command line.
-threshold-file <file>
    The file to save the threshold data to. The format is determined by the extensions, e.g., '.arff' for ARFF format or '.csv' for CSV.
-threshold-label <label>
    The class label to determine the threshold data for (default is the first label)

Options specific to weka.classifiers.trees.J48:

-U
    Use unpruned tree.
-O
    Do not collapse tree.
-C <pruning confidence>
    Set confidence threshold for pruning. (default 0.25)
-M <minimum number of instances>
    Set minimum number of instances per leaf. (default 2)
-R
    Use reduced error pruning.
-N <number of folds>
    Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
-B
    Use binary splits only.
-S
    Don't perform subtree raising.
-L
    Do not clean up after the tree has been built.
-A
    Laplace smoothing for predicted probabilities.
-J
    Do not use MDL correction for info gain on numeric attributes.
-Q <seed>
    Seed for random data shuffling (default 1).
weka.core
The core Weka package; almost every other class depends on it.
Key classes in the core package:
Attribute: contains the attribute's name, its type, and, in the case of a nominal or string attribute, its possible values
Instance: contains the attribute values of a particular instance
Instances: holds an ordered set of instances, in other words, a dataset
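To make the relationship between these three classes concrete, here is a minimal plain-Java sketch of the same roles. It deliberately does not use the weka.core API: the names SketchAttribute, SketchDataset and DataModelSketch are hypothetical, and the rows are made-up values. It mirrors one detail of Weka's design: a nominal value is stored as a numeric index into the attribute's list of possible values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the three roles described above. The Sketch*
// classes are hypothetical and are NOT the weka.core API.
class DataModelSketch {
    public static void main(String[] args) {
        SketchDataset data = buildWeatherSample();
        for (int i = 0; i < data.size(); i++) {
            System.out.println(data.describe(i));
        }
    }

    // Builds a two-attribute dataset with made-up rows.
    static SketchDataset buildWeatherSample() {
        SketchAttribute outlook =
            new SketchAttribute("outlook", Arrays.asList("sunny", "overcast", "rainy"));
        SketchAttribute temperature = new SketchAttribute("temperature", null);
        SketchDataset data =
            new SketchDataset("weather-sample", Arrays.asList(outlook, temperature));
        data.add(new double[] {0, 85}); // outlook index 0 = "sunny"
        data.add(new double[] {2, 70}); // outlook index 2 = "rainy"
        return data;
    }
}

// Role of Attribute: a name, a type, and (for nominal attributes) the possible values.
class SketchAttribute {
    final String name;
    final List<String> nominalValues; // null means the attribute is numeric

    SketchAttribute(String name, List<String> nominalValues) {
        this.name = name;
        this.nominalValues = nominalValues;
    }

    boolean isNominal() {
        return nominalValues != null;
    }
}

// Role of Instances: an ordered set of rows that share one attribute list.
// Each double[] row plays the role of a single Instance.
class SketchDataset {
    final String relationName;
    final List<SketchAttribute> attributes;
    private final List<double[]> rows = new ArrayList<>();

    SketchDataset(String relationName, List<SketchAttribute> attributes) {
        this.relationName = relationName;
        this.attributes = attributes;
    }

    void add(double[] values) {
        if (values.length != attributes.size()) {
            throw new IllegalArgumentException("expected " + attributes.size() + " values");
        }
        rows.add(values.clone());
    }

    int size() {
        return rows.size();
    }

    // Renders one row, translating nominal indices back into their labels.
    String describe(int rowIndex) {
        double[] row = rows.get(rowIndex);
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < attributes.size(); j++) {
            SketchAttribute a = attributes.get(j);
            if (j > 0) sb.append(", ");
            sb.append(a.name).append('=');
            if (a.isNominal()) sb.append(a.nominalValues.get((int) row[j]));
            else sb.append(row[j]);
        }
        return sb.toString();
    }
}
```

The dataset owns the attribute list and each row is only an array of numbers, so a row cannot be interpreted without the dataset it belongs to.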
weka.classifiers
Contents: contains implementations of most of the algorithms for classification and numeric prediction
Key abstract class: Classifier, which defines the general structure of any scheme for classification or numeric prediction
It has three core methods: buildClassifier(), classifyInstance(), and distributionForInstance()
An example of a class that extends this abstract class:
- weka.classifiers.trees.DecisionStump
 - Overrides distributionForInstance()
 - Provides getRevision(), which simply returns the revision number of the classifier; it is used by Weka maintainers when diagnosing and debugging problems reported by users.
 - Provides globalInfo(), which returns a string describing the classifier; this string, along with the scheme's options, is displayed by the More button in the generic object editor
 - Provides toString(), which returns a textual representation of the classifier
 - Provides toSource(), which is used to obtain a source code representation of the learned classifier
 - Provides main(), which is called when you ask for a decision stump from the command line; it is effectively the entry point for running this class
 - Provides getCapabilities(), which is called by the generic object editor to provide information about the capabilities of a learning scheme
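The three core methods and the way a subclass such as DecisionStump can get by with overriding only distributionForInstance() can be sketched in plain Java. This is not Weka's code. SketchClassifier, SketchStump and StumpSketchDemo are hypothetical names, the method signatures are simplified to arrays, the training data is made up, and the split is chosen by a simple misclassification count, which is swapped in for brevity and is not DecisionStump's actual criterion.

```java
import java.util.Arrays;

// Demo entry point; plays the role that main() plays in a Weka classifier.
class StumpSketchDemo {
    public static void main(String[] args) {
        // Toy data (made up): one numeric attribute, classes 0 and 1.
        double[][] x = {{1.0}, {2.0}, {3.0}, {10.0}, {11.0}, {12.0}};
        int[] y = {0, 0, 0, 1, 1, 1};
        SketchStump stump = new SketchStump();
        stump.buildClassifier(x, y, 2);
        System.out.println(stump);
        System.out.println(Arrays.toString(stump.distributionForInstance(new double[] {2.5})));
        System.out.println(stump.classifyInstance(new double[] {11.5}));
    }
}

// Hypothetical stand-in for the abstract class: the general structure of a scheme.
abstract class SketchClassifier {
    abstract void buildClassifier(double[][] x, int[] y, int numClasses);

    abstract double[] distributionForInstance(double[] instance);

    // Default prediction: the class with the highest probability, so a
    // subclass only has to supply distributionForInstance().
    int classifyInstance(double[] instance) {
        double[] dist = distributionForInstance(instance);
        int best = 0;
        for (int c = 1; c < dist.length; c++) {
            if (dist[c] > dist[best]) best = c;
        }
        return best;
    }
}

// A one-level tree: a single test "attribute <= threshold" with a class
// distribution stored for each of the two branches.
class SketchStump extends SketchClassifier {
    private int attribute;
    private double threshold;
    private double[] leftDist;
    private double[] rightDist;

    @Override
    void buildClassifier(double[][] x, int[] y, int numClasses) {
        int bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < x[0].length; a++) {
            for (double[] candidate : x) {
                double t = candidate[a]; // try every observed value as a threshold
                double[] left = new double[numClasses];
                double[] right = new double[numClasses];
                for (int i = 0; i < x.length; i++) {
                    if (x[i][a] <= t) left[y[i]]++; else right[y[i]]++;
                }
                // Errors made if each branch predicts its majority class.
                int errors = x.length - (int) (max(left) + max(right));
                if (errors < bestErrors) {
                    bestErrors = errors;
                    attribute = a;
                    threshold = t;
                    leftDist = normalize(left);
                    rightDist = normalize(right);
                }
            }
        }
    }

    @Override
    double[] distributionForInstance(double[] instance) {
        double[] dist = instance[attribute] <= threshold ? leftDist : rightDist;
        return dist.clone();
    }

    @Override
    public String toString() {
        return "SketchStump: attribute " + attribute + " <= " + threshold;
    }

    private static double max(double[] counts) {
        double m = 0;
        for (double c : counts) m = Math.max(m, c);
        return m;
    }

    // Turns counts into probabilities; an empty branch gets a uniform distribution.
    private static double[] normalize(double[] counts) {
        double sum = 0;
        for (double c : counts) sum += c;
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = sum == 0 ? 1.0 / counts.length : counts[i] / sum;
        }
        return out;
    }
}
```

Putting the argmax logic in the base class is the design choice worth noting: any scheme that can produce a class distribution gets classifyInstance() for free.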
Other important packages
weka.associations
: contains association-rule learners
weka.clusterers
: contains methods for unsupervised learning
weka.datagenerators
: generates artificial data
weka.estimators
: computes different types of probability distributions
weka.filters
: provides methods for data cleaning