[Scikit-learn] 1.1 Generalized Linear Models - Comparing various online solvers

Dataset splitting

1. Online learning for handwritten digit recognition

From: Comparing various online solvers 

An example showing how different online solvers perform on the hand-written digits dataset.

 

Ref: Online machine learning algorithms and their pseudocode

PA, CW, AROW, and NHerd are all algorithms provided by the Jubatus distributed online machine learning framework.

 

Perceptron: linear_model.Perceptron

Multi-layer perceptron: Multi-Layer Perceptron (MLP)

Passive-aggressive algorithm: Passive-Aggressive Perceptron

 

When updating the weights, a step-size parameter τt is introduced: if the prediction is correct, the weights are left unchanged (passive); if the prediction is wrong, the weights are adjusted aggressively. A slack-variable term can also be added, yielding variants of the algorithm.

Advantages: it reduces the number of misclassifications and remains applicable to non-separable, noisy data.

 

τt can be computed in three ways, where ℓt is the hinge loss suffered on example xt at step t:

a. τt = ℓt / ‖xt‖²

b. τt = min{ C, ℓt / ‖xt‖² }

c. τt = ℓt / (‖xt‖² + 1/(2C))

These correspond to the PA, PA-I, and PA-II variants respectively.
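
To make the update concrete, here is a minimal NumPy sketch of a single binary PA-I step, assuming labels in {-1, +1} and hinge loss ℓt = max(0, 1 - yt·(w·xt)); the function name and toy data are illustrative, not taken from scikit-learn.

import numpy as np

def pa1_update(w, x_t, y_t, C=1.0):
    """One binary Passive-Aggressive (PA-I) update; y_t in {-1, +1}."""
    loss = max(0.0, 1.0 - y_t * np.dot(w, x_t))  # hinge loss l_t
    if loss == 0.0:
        return w                                 # correct with margin: stay passive
    tau = min(C, loss / np.dot(x_t, x_t))        # PA-I step size: min{C, l_t / ||x_t||^2}
    return w + tau * y_t * x_t                   # aggressive correction toward y_t

# toy usage: one misclassified example moves the weights
w = np.zeros(3)
w = pa1_update(w, np.array([1.0, 2.0, 0.5]), +1)
print(w)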

 

 

2. Code demo

 

In [58]:
# Author: Rob Zinkov <rob at zinkov dot com>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# sklearn.cross_validation is deprecated; train_test_split now lives in model_selection
# from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier, Perceptron
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.linear_model import LogisticRegression
 

Prepare the data

In [59]:
digits  = datasets.load_digits()
X, y    = digits.data, digits.target

print(X)
print(X.shape)
print()
print(y)
print(y.shape)
 
[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 ...
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]
(1797, 64)

[0 1 2 ... 8 9 8]
(1797,)
 

Classifiers

In [60]:
classifiers = [
    ("SGD",                   SGDClassifier()),              # plain stochastic gradient descent
    ("ASGD",                  SGDClassifier(average=True)),  # averaged SGD
    ("Perceptron",            Perceptron()),
    ("Passive-Aggressive I",  PassiveAggressiveClassifier(loss='hinge', C=1.0)),
    ("Passive-Aggressive II", PassiveAggressiveClassifier(loss='squared_hinge', C=1.0)),
    # SAG: stochastic average gradient solver for logistic regression,
    # with C scaled by the number of samples
    ("SAG",                   LogisticRegression(solver='sag', tol=1e-1, C=1.e4 / X.shape[0], multi_class='auto'))
]
 

Train and test

In [61]:
# For each heldout fraction, average performance over 20 random splits
rounds  = 20
# fractions of the data held out as the test set
heldout = [0.95, 0.90, 0.75, 0.50, 0.01]

xx = 1. - np.array(heldout)
print(xx)
 
[0.05 0.1  0.25 0.5  0.99]
In [62]:
for name, clf in classifiers:
    print("training %s" % name)
    rng = np.random.RandomState(42)
    yy = []

    for i in heldout:
        yy_ = []
        for r in range(rounds):
            # data, train, predict
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=i, random_state=rng)
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)
            
            # test error rate for this round; averaged over all rounds below
            yy_.append(1 - np.mean(y_pred == y_test))
            
        yy.append(np.mean(yy_))
        
    plt.plot(xx, yy, label=name)

plt.legend(loc="upper right")
plt.xlabel("Proportion train")
plt.ylabel("Test Error Rate")
plt.show()
 
training SGD
training ASGD
training Perceptron
training Passive-Aggressive I
training Passive-Aggressive II
training SAG
 
Minibatch training

1. What it is and why

For an ordinary backpropagation network trained without mini-batches, every update of w and b follows the gradient of the entire dataset. The distribution of the full dataset does not change from one update to the next, so there is no randomness in the steps, and training can easily stall after "just a couple of steps": the cost function gets stuck at a saddle point or in a local minimum. With mini-batches, the data used for each update differ, which injects some randomness (a point that is a saddle point on one update may not be on the next). So as long as the saddle points are not particularly troublesome and the local minima are not very deep, the model can escape them relatively easily.
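
A minimal sketch of the contrast, using plain linear regression with squared loss (all names and numbers here are illustrative, not from the scikit-learn examples):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.randn(1000)

def grad(w, Xb, yb):
    # gradient of the mean squared error on the batch (Xb, yb)
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(5)
lr, batch_size = 0.05, 32
for epoch in range(20):
    perm = rng.permutation(len(X))  # reshuffle, so every update sees different data
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        # noisy step: what stalls one mini-batch need not stall the next
        w -= lr * grad(w, X[idx], y[idx])
print(w)  # close to w_true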

 

From: Out-of-core classification of text documents

 

This is an example showing how scikit-learn can be used for classification using an out-of-core approach: learning from data that doesn’t fit into main memory.

We make use of an online classifier, i.e., one that supports the partial_fit method, that will be fed with batches of examples.

To guarantee that the feature space remains the same over time, we leverage a HashingVectorizer that will project each example into the same feature space. This is especially useful for text classification, where new features (words) may appear in each batch.
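
A quick sketch of why hashing keeps the feature space fixed across batches; the two toy batches below are made up for illustration:

from sklearn.feature_extraction.text import HashingVectorizer

vectorizer = HashingVectorizer(n_features=2 ** 18)

batch1 = ["oil prices rise", "grain exports fall"]
batch2 = ["entirely new words appear in later batches"]

# transform() needs no fitting, so unseen words in batch2 still hash into
# the same 2**18-dimensional space as batch1
X1 = vectorizer.transform(batch1)
X2 = vectorizer.transform(batch2)
print(X1.shape, X2.shape)  # (2, 262144) (1, 262144)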

The dataset used in this example is Reuters-21578 as provided by the UCI ML repository. It will be automatically downloaded and uncompressed on first run.

The plot represents the learning curve of the classifier: the evolution of classification accuracy over the course of the mini-batches. Accuracy is measured on the first 1000 samples, held out as a validation set.

To limit the memory consumption, we queue examples up to a fixed amount before feeding them to the learner: the documents arrive a little at a time and join the training, as sketched below.
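
A minimal sketch of such a queue, written from scratch as a generator over (text, label) pairs; it mirrors the role of the example's iter_minibatches helper but is not its actual implementation:

from itertools import islice

def iter_minibatches(doc_stream, minibatch_size):
    """Yield (texts, labels) lists of at most `minibatch_size` documents."""
    batch = list(islice(doc_stream, minibatch_size))
    while batch:
        texts, labels = zip(*batch)
        yield list(texts), list(labels)
        batch = list(islice(doc_stream, minibatch_size))

# toy usage with a fake stream of (text, label) pairs
stream = iter([("doc %d" % i, i % 2) for i in range(5)])
for texts, labels in iter_minibatches(stream, 2):
    print(len(texts), labels)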

 

2. Run output

runfile('/home/unsw/Programmer/1-python/Scikit-learn/1.9_code/plot_out_of_core_classification.py', wdir='/home/unsw/Programmer/1-python/Scikit-learn/1.9_code')
downloading dataset (once and for all) into /home/unsw/scikit_learn_data/reuters
untarring Reuters dataset...
done.
Test set is 975 documents (114 positive)
  Passive-Aggressive classifier :          931 train docs (   123 positive)    975 test docs (   114 positive) accuracy: 0.911 in 1.16s (  804 docs/s)
          Perceptron classifier :          931 train docs (   123 positive)    975 test docs (   114 positive) accuracy: 0.928 in 1.16s (  802 docs/s)
                 SGD classifier :          931 train docs (   123 positive)    975 test docs (   114 positive) accuracy: 0.924 in 1.16s (  800 docs/s)
      NB Multinomial classifier :          931 train docs (   123 positive)    975 test docs (   114 positive) accuracy: 0.884 in 1.19s (  784 docs/s)


  Passive-Aggressive classifier :         2852 train docs (   346 positive)    975 test docs (   114 positive) accuracy: 0.964 in 3.03s (  939 docs/s)
          Perceptron classifier :         2852 train docs (   346 positive)    975 test docs (   114 positive) accuracy: 0.923 in 3.04s (  939 docs/s)
                 SGD classifier :         2852 train docs (   346 positive)    975 test docs (   114 positive) accuracy: 0.957 in 3.04s (  937 docs/s)
      NB Multinomial classifier :         2852 train docs (   346 positive)    975 test docs (   114 positive) accuracy: 0.892 in 3.06s (  930 docs/s)


  Passive-Aggressive classifier :         5778 train docs (   721 positive)    975 test docs (   114 positive) accuracy: 0.947 in 5.18s ( 1115 docs/s)
          Perceptron classifier :         5778 train docs (   721 positive)    975 test docs (   114 positive) accuracy: 0.929 in 5.18s ( 1114 docs/s)
                 SGD classifier :         5778 train docs (   721 positive)    975 test docs (   114 positive) accuracy: 0.962 in 5.19s ( 1114 docs/s)
      NB Multinomial classifier :         5778 train docs (   721 positive)    975 test docs (   114 positive) accuracy: 0.911 in 5.21s ( 1109 docs/s)


  Passive-Aggressive classifier :         8699 train docs (  1108 positive)    975 test docs (   114 positive) accuracy: 0.972 in 7.32s ( 1188 docs/s)
          Perceptron classifier :         8699 train docs (  1108 positive)    975 test docs (   114 positive) accuracy: 0.961 in 7.32s ( 1187 docs/s)
                 SGD classifier :         8699 train docs (  1108 positive)    975 test docs (   114 positive) accuracy: 0.964 in 7.33s ( 1187 docs/s)
      NB Multinomial classifier :         8699 train docs (  1108 positive)    975 test docs (   114 positive) accuracy: 0.926 in 7.35s ( 1183 docs/s)


  Passive-Aggressive classifier :        11515 train docs (  1439 positive)    975 test docs (   114 positive) accuracy: 0.970 in 9.44s ( 1219 docs/s)
          Perceptron classifier :        11515 train docs (  1439 positive)    975 test docs (   114 positive) accuracy: 0.958 in 9.45s ( 1219 docs/s)
                 SGD classifier :        11515 train docs (  1439 positive)    975 test docs (   114 positive) accuracy: 0.973 in 9.45s ( 1218 docs/s)
      NB Multinomial classifier :        11515 train docs (  1439 positive)    975 test docs (   114 positive) accuracy: 0.929 in 9.47s ( 1215 docs/s)


  Passive-Aggressive classifier :        14376 train docs (  1898 positive)    975 test docs (   114 positive) accuracy: 0.970 in 11.61s ( 1238 docs/s)
          Perceptron classifier :        14376 train docs (  1898 positive)    975 test docs (   114 positive) accuracy: 0.872 in 11.61s ( 1237 docs/s)
                 SGD classifier :        14376 train docs (  1898 positive)    975 test docs (   114 positive) accuracy: 0.972 in 11.62s ( 1237 docs/s)
      NB Multinomial classifier :        14376 train docs (  1898 positive)    975 test docs (   114 positive) accuracy: 0.937 in 11.64s ( 1235 docs/s)


  Passive-Aggressive classifier :        17314 train docs (  2203 positive)    975 test docs (   114 positive) accuracy: 0.969 in 13.85s ( 1249 docs/s)
          Perceptron classifier :        17314 train docs (  2203 positive)    975 test docs (   114 positive) accuracy: 0.964 in 13.85s ( 1249 docs/s)
                 SGD classifier :        17314 train docs (  2203 positive)    975 test docs (   114 positive) accuracy: 0.976 in 13.86s ( 1249 docs/s)
      NB Multinomial classifier :        17314 train docs (  2203 positive)    975 test docs (   114 positive) accuracy: 0.939 in 13.88s ( 1247 docs/s)

 

 

3. Minibatch training loop

# Here are some classifiers that support the `partial_fit` method
partial_fit_classifiers = {
    'SGD': SGDClassifier(),
    'Perceptron': Perceptron(),
    'NB Multinomial': MultinomialNB(alpha=0.01),
    'Passive-Aggressive': PassiveAggressiveClassifier(),
}

# We will feed the classifier with mini-batches of 1000 documents; this means
# we have at most 1000 docs in memory at any time. The smaller the document
# batch, the bigger the relative overhead of the partial fit methods.
minibatch_size = 1000

# Create the data_stream that parses Reuters SGML files and iterates on
# documents as a stream.
minibatch_iterators = iter_minibatches(data_stream, minibatch_size)
total_vect_time = 0.0

# Main loop : iterate on mini-batches of examples
for i, (X_train_text, y_train) in enumerate(minibatch_iterators):

    tick = time.time()
    X_train = vectorizer.transform(X_train_text)
    total_vect_time += time.time() - tick

    # A batch arrives and every classifier trains on it once; the next batch
    # arrives and every classifier trains again.
    for cls_name, cls in partial_fit_classifiers.items():
        tick = time.time()

        # update estimator with examples in the current mini-batch
        cls.partial_fit(X_train, y_train, classes=all_classes)

        # accumulate test accuracy stats
        cls_stats[cls_name]['total_fit_time'] += time.time() - tick
        cls_stats[cls_name]['n_train'] += X_train.shape[0]
        cls_stats[cls_name]['n_train_pos'] += sum(y_train)
        tick = time.time()
        cls_stats[cls_name]['accuracy'] = cls.score(X_test, y_test)
        cls_stats[cls_name]['prediction_time'] = time.time() - tick
        acc_history = (cls_stats[cls_name]['accuracy'],
                       cls_stats[cls_name]['n_train'])
        cls_stats[cls_name]['accuracy_history'].append(acc_history)
        run_history = (cls_stats[cls_name]['accuracy'],
                       total_vect_time + cls_stats[cls_name]['total_fit_time'])
        cls_stats[cls_name]['runtime_history'].append(run_history)

        if i % 3 == 0:
            print(progress(cls_name, cls_stats[cls_name]))
    if i % 3 == 0:
        print('\n')

 

End.
