
java weka.classifiers.trees.J48 -t data/weather.arff

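The command has the form: java, followed by the fully qualified class name. The -t option indicates that the next argument is the name of the training dataset.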

java weka.classifiers.trees.J48 -h


-h or -help
    Output help information.
-synopsis or -info
    Output synopsis for classifier (use in conjunction  with -h)
-t <name of training file>
    Sets training file.
-T <name of test file>
    Sets test file. If missing, a cross-validation will be performed
    on the training data.
-c <class index>
    Sets index of class attribute (default: last).
-x <number of folds>
    Sets number of folds for cross-validation (default: 10).
-no-cv
    Do not perform any cross validation.
-force-batch-training
    Always train classifier in batch mode, never incrementally.
-split-percentage <percentage>
    Sets the percentage for the train/test set split, e.g., 66.
-preserve-order
    Preserves the order in the percentage split.
-s <random number seed>
    Sets random number seed for cross-validation or percentage split
    (default: 1).
-m <name of file with cost matrix>
    Sets file with cost matrix.
-disable <comma-separated list of evaluation metric names>
    Comma separated list of metric names not to print to the output.
    Available metrics:
    Correct,Incorrect,Kappa,Total cost,Average cost,KB relative,KB information,
    Correlation,Complexity 0,Complexity scheme,Complexity improvement,
    MAE,RMSE,RAE,RRSE,Coverage,Region size,TP rate,FP rate,Precision,Recall,
    F-measure,MCC,ROC area,PRC area
-l <name of input file>
    Sets model input file. In case the filename ends with '.xml',
    a PMML file is loaded or, if that fails, options are loaded
    from the XML file.
-d <name of output file>
    Sets model output file. In case the filename ends with '.xml',
    only the options are saved to the XML file, not the model.
-v
    Outputs no statistics for training data.
-o
    Outputs statistics only, not the classifier.
-i
    Outputs detailed information-retrieval statistics for each class.
-k
    Outputs information-theoretic statistics.
-classifications "weka.classifiers.evaluation.output.prediction.AbstractOutput + options"
    Uses the specified class for generating the classification output.
    E.g.: weka.classifiers.evaluation.output.prediction.PlainText
-p range
    Outputs predictions for test instances (or the train instances if
    no test instances provided and -no-cv is used), along with the 
    attributes in the specified range (and nothing else). 
    Use '-p 0' if no attributes are desired.
    Deprecated: use "-classifications ..." instead.
-distribution
    Outputs the distribution instead of only the prediction
    in conjunction with the '-p' option (only nominal classes).
    Deprecated: use "-classifications ..." instead.
-r
    Only outputs cumulative margin distribution.
-z <class name>
    Only outputs the source representation of the classifier,
    giving it the supplied name.
-g
    Only outputs the graph representation of the classifier.
-xml filename | xml-string
    Retrieves the options from the XML-data instead of the command line.
-threshold-file <file>
    The file to save the threshold data to.
    The format is determined by the extensions, e.g., '.arff' for ARFF 
    format or '.csv' for CSV.
-threshold-label <label>
    The class label to determine the threshold data for
    (default is the first label)

Options specific to weka.classifiers.trees.J48:

-U
    Use unpruned tree.
-O
    Do not collapse tree.
-C <pruning confidence>
    Set confidence threshold for pruning.
    (default 0.25)
-M <minimum number of instances>
    Set minimum number of instances per leaf.
    (default 2)
-R
    Use reduced error pruning.
-N <number of folds>
    Set number of folds for reduced error
    pruning. One fold is used as pruning set.
    (default 3)
-B
    Use binary splits only.
-S
    Don't perform subtree raising.
-L
    Do not clean up after the tree has been built.
-A
    Laplace smoothing for predicted probabilities.
-J
    Do not use MDL correction for info gain on numeric attributes.
-Q <seed>
    Seed for random data shuffling (default 1).




weka.core package

Key classes in the core package:

Attribute: contains the attribute's name, its type, and, in the case of a nominal or string attribute, its possible values

Instance: contains the attribute values of a particular instance

Instances: holds an ordered set of instances; in other words, a dataset
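
To make the relationship between these three classes concrete, here is a minimal sketch in plain Java. The classes SketchAttribute, SketchInstance and SketchInstances are hypothetical stand-ins written for these notes, not Weka's real classes, and they omit almost everything the real ones do. The sketch assumes that a nominal value is stored as a numeric index into the attribute's list of possible values, which is how I understand Weka's internal representation to work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: an attribute has a name and, if nominal, a list of possible values.
class SketchAttribute {
    final String name;
    final List<String> nominalValues; // null means the attribute is numeric

    SketchAttribute(String name, List<String> nominalValues) {
        this.name = name;
        this.nominalValues = nominalValues;
    }

    boolean isNominal() {
        return nominalValues != null;
    }
}

// Hypothetical stand-in: one row of data, one double per attribute.
// For a nominal attribute the double is an index into nominalValues (an assumption of this sketch).
class SketchInstance {
    final double[] values;

    SketchInstance(double... values) {
        this.values = values;
    }
}

// Hypothetical stand-in: an ordered set of instances that share one attribute list, i.e. a dataset.
class SketchInstances {
    final List<SketchAttribute> attributes;
    final List<SketchInstance> rows = new ArrayList<>();

    SketchInstances(List<SketchAttribute> attributes) {
        this.attributes = attributes;
    }

    void add(SketchInstance row) {
        if (row.values.length != attributes.size()) {
            throw new IllegalArgumentException("wrong number of values");
        }
        rows.add(row);
    }

    // Translates a stored value back to text using the attribute definition.
    String valueAsString(int rowIndex, int attIndex) {
        SketchAttribute att = attributes.get(attIndex);
        double v = rows.get(rowIndex).values[attIndex];
        return att.isNominal() ? att.nominalValues.get((int) v) : String.valueOf(v);
    }
}
```

The point of the sketch is that the dataset owns the attribute definitions, while each instance only holds raw values that are interpreted through those definitions.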



weka.classifiers package

Contents: contains implementations of most of the algorithms for classification and numeric prediction

Key abstract class: Classifier, which defines the general structure of any scheme for classification or numeric prediction

It includes three core methods: buildClassifier(), classifyInstance(), distributionForInstance()
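
How the three methods fit together can be sketched in plain Java. SketchClassifier and MajorityClassSketch below are hypothetical names invented for these notes and are not Weka's source; the data layout (a double[][] with the class index in the last column) is also an assumption made to keep the example self-contained. The idea shown is that a learner only has to build a model and produce a class distribution, and the predicted class can then be derived as the most probable one.

```java
import java.util.Arrays;

// Hypothetical stand-in for the general structure of a classification scheme.
abstract class SketchClassifier {
    // Learns a model from training rows; the last column of each row is the class index.
    abstract void buildClassifier(double[][] rows, int numClasses);

    // Returns one probability per class for the given instance.
    abstract double[] distributionForInstance(double[] instance);

    // Default prediction: the index of the most probable class, returned as a double.
    double classifyInstance(double[] instance) {
        double[] dist = distributionForInstance(instance);
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }
}

// Hypothetical trivial learner: ignores the attributes and predicts the class frequencies.
class MajorityClassSketch extends SketchClassifier {
    private double[] classProbs;

    @Override
    void buildClassifier(double[][] rows, int numClasses) {
        classProbs = new double[numClasses];
        for (double[] row : rows) {
            classProbs[(int) row[row.length - 1]] += 1.0; // count each class
        }
        for (int i = 0; i < numClasses; i++) {
            classProbs[i] /= rows.length; // turn counts into relative frequencies
        }
    }

    @Override
    double[] distributionForInstance(double[] instance) {
        return Arrays.copyOf(classProbs, classProbs.length);
    }
}
```

With this split, a concrete scheme only needs to override the distribution method to get class predictions as well.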


  • weka.classifiers.trees.DecisionStump
  • Overrides distributionForInstance()
  • Includes getRevision(), which simply returns the revision number of the classifier; Weka maintainers use it when diagnosing and debugging problems reported by users
  • Includes globalInfo(), which returns a string describing the classifier; this string, along with the scheme's options, is displayed by the More button in the generic object editor
  • Includes toString(), which returns a textual representation of the classifier
  • Includes toSource(), which is used to obtain a source code representation of the learned classifier
  • Includes main(), which is called when you ask for a decision stump from the command line; in effect it is the entry point for running this class
  • Includes getCapabilities(), which is called by the generic object editor to provide information about the capabilities of a learning scheme
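
As a rough illustration of what a one-level tree that overrides distributionForInstance() and toString() might look like, here is a simplified sketch. StumpSketch is a hypothetical class written for these notes, not Weka's DecisionStump source: it handles numeric attributes only, picks its split by counting training errors rather than by whatever criterion Weka actually uses, and assumes rows are double arrays with the class index in the last column.

```java
import java.util.Arrays;

// Hypothetical one-level decision tree on a single numeric attribute.
class StumpSketch {
    private int attIndex;
    private double threshold;
    private double[] leftDist;  // class distribution where value <= threshold
    private double[] rightDist; // class distribution where value > threshold

    // Tries every attribute and every observed value as a threshold,
    // keeping the split with the fewest training errors (a simplification).
    void buildClassifier(double[][] rows, int numClasses) {
        int numAtts = rows[0].length - 1;
        int bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < numAtts; a++) {
            for (double[] candidate : rows) {
                double t = candidate[a];
                double[] left = new double[numClasses];
                double[] right = new double[numClasses];
                for (double[] r : rows) {
                    int cls = (int) r[numAtts];
                    if (r[a] <= t) left[cls]++; else right[cls]++;
                }
                int errors = rows.length - (int) (max(left) + max(right));
                if (errors < bestErrors) {
                    bestErrors = errors;
                    attIndex = a;
                    threshold = t;
                    leftDist = normalize(left);
                    rightDist = normalize(right);
                }
            }
        }
    }

    // Returns the class distribution of the branch the instance falls into.
    double[] distributionForInstance(double[] instance) {
        return instance[attIndex] <= threshold ? leftDist.clone() : rightDist.clone();
    }

    // Textual representation of the learned stump.
    @Override
    public String toString() {
        return "att" + attIndex + " <= " + threshold + " : " + Arrays.toString(leftDist)
            + "\natt" + attIndex + " > " + threshold + " : " + Arrays.toString(rightDist);
    }

    private static double max(double[] counts) {
        double m = 0;
        for (double c : counts) m = Math.max(m, c);
        return m;
    }

    // Converts counts to probabilities; an empty branch gets a uniform distribution.
    private static double[] normalize(double[] counts) {
        double sum = 0;
        for (double c : counts) sum += c;
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = sum == 0 ? 1.0 / counts.length : counts[i] / sum;
        }
        return out;
    }
}
```

The remaining methods in the list (getRevision(), globalInfo(), toSource(), main(), getCapabilities()) are tied to Weka's own infrastructure, so they are left out of this sketch.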




weka.associations package

Contains association-rule learners.


weka.clusterers package

Contains methods for unsupervised learning.



weka.estimators package

Computes different types of probability distributions.




