Weights and Regularization
29 posts by 4 authors
|
b_m...@live.com |
13-10-10
|
Here is the basic script I am using (I am using a data set from UCI regarding wine ratings).
####CODE#####################################################################
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import EpochCounter

# ds (not shown here) is the training dataset built from the wine data
# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)
# create second hidden layer with 2 nodes, init weights in range -0.1 to 0.1
# and add a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)
# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)
# create Stochastic Gradient Descent trainer that runs for 200 epochs
trainer = sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200))
# according to the code, the last layer will be considered the output
layers = [hidden_layer, hidden_layer2, output_layer]
# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, ds)
# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=ds)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break
####END CODE####################################################################
My questions:
I. Weights. How do I see the weights from the trained model? I *think* I am adding a second hidden layer above, but when I look at ann.get_weights(), the dimensions of the resulting object do not change if I remove the second hidden layer, so I doubt I am looking at the right thing. Ultimately I want to see the final weights so that I can visualize the network outside pylearn2.
II. Regularization. How do I use regularization? Specifically, how should I adjust the above code to use 1) dropout and then 2) an L2 norm penalty?
Thanks!
Brian
b_m...@live.com |
13-10-12
|
Through the ann.get_param_values() call I am now able to see the weight and bias values and through knowledge of the net architecture, accomplish question #1.
I would still like some quick help on how to use regularization (especially dropout) and then how to predict new cases with such a model. Does the ann.fprop(theano.shared(testMatrix, name='test')).eval() call still work?
Thanks!
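For anyone following question I, the shapes alone are usually enough to tell which array belongs to which layer. The sketch below is plain Python with no pylearn2 dependency. `expected_param_shapes` and `count_parameters` are hypothetical helpers, and the assumption that every layer contributes one weight matrix of shape (inputs, units) plus one bias vector is mine, not something confirmed in this thread. The order in which ann.get_param_values() returns the arrays may differ between layer types, so it seems safer to match on shape than on position.

```python
def expected_param_shapes(nvis, layer_dims):
    """Return (layer_index, weight_shape, bias_shape) for a fully connected net.

    nvis       -- number of input features (11 in the wine example)
    layer_dims -- units per layer, output layer last (e.g. [5, 2, 2])
    """
    shapes = []
    n_in = nvis
    for i, n_out in enumerate(layer_dims):
        shapes.append((i, (n_in, n_out), (n_out,)))
        n_in = n_out  # this layer's output feeds the next layer
    return shapes


def count_parameters(nvis, layer_dims):
    """Total number of scalar weights and biases in the network."""
    total = 0
    for _, (rows, cols), (n_bias,) in expected_param_shapes(nvis, layer_dims):
        total += rows * cols + n_bias
    return total


# The network from the first post, with and without the second hidden layer.
two_hidden = expected_param_shapes(11, [5, 2, 2])
one_hidden = expected_param_shapes(11, [5, 2])
```

Under these assumptions, removing the second hidden layer changes the number of arrays from six to four. If the object being inspected does not change at all, it probably covers only one layer.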
Kyle Kastner |
13-10-12
|
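I am doing something similar, and had to enable the one_hot=True to recreate the MNIST yaml results in python. What is the error you are getting?
Kyle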
--
You received this message because you are subscribed to the Google Groups "pylearn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylearn-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
b_m...@live.com |
13-10-13
|
> I am doing something similar, and had to enable the one_hot=True to recreate the MNIST yaml results in python. What is the error you are getting?
>
> Kyle
>
Kyle,
I am not getting an error. Instead, I am looking to learn/confirm the proper method to train an MLP using regularization (L2 as well as dropout) and then get predictions on a new data set. I am not using yaml, though; I want a way to use pylearn2 directly in Python through its functions.
I referenced a blog that showed how to train an MLP without regularization (only a number of epochs) and then predict new data using ann.fprop, where ann is the trained MLP. I *think* I can use dropout simply by adding the cost to the SGD call like this:
sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200), cost=Dropout())
and then to predict new data I *think* I just need to call dropout_fprop instead of fprop, like this (where X_s is the new test set):
test_preds=ann.dropout_fprop(theano.shared(X_s, name='test')).eval()
But I am hoping one of the developers will confirm this is correct and explain how to add a L2 penalty, as that is escaping me currently. I am not very experienced with Python yet so following the code is a challenge.
Kyle Kastner |
13-10-14
|
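I don't know that you need to call dropout_fprop for the predictions - once the network is trained, a regular fprop should be all you need, as the model averaging is done during training - the fprop output of a dropout net *should* represent the bagged estimate of many neural nets. I am having trouble finding the reference, but I am recalling that from somewhere. Maybe someone else can help/contradict me here?
In my code, I have called Dropout with an additional dictionary of parameters, so that the dropout from the visible layer is .8, while the others remain .5, as is recommended in some of the literature. The default value of .5 dropout should be OK though, so cost=Dropout() seems ok to me.
I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?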
There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.
b_m...@live.com |
13-10-14
|
> I don't know that you need to call dropout_fprop for the predictions - once the network is trained, a regular fprop should be all you need, as the model averaging is done during training - the fprop output of a dropout net *should* represent the bagged estimate of many neural nets. I am having trouble finding the reference, but I am recalling that from somewhere. Maybe some one else can help/contradict me here?
>
> In my code, I have called Dropout with an additional dictionary of parameters, so that the dropout from the visible layer is .8, while the others remain .5, as is recommended in some of the literature. The default value of .5 dropout should be OK though, so cost=Dropout() seems ok to me.
>
> I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?
>
> There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.
>
> Kyle
Hey Kyle,
I. I see this description of dropout_fprop from models/mlp.py so I am not sure:
def dropout_fprop(self, state_below, default_input_include_prob=0.5,
                  input_include_probs=None, default_input_scale=2.,
                  input_scales=None, per_example=True):
    """
    state_below: The input to the MLP
    Returns the output of the MLP, when applying dropout to the input and intermediate layers.
II. Regarding L2, I would not be using both; I just want to see how to do it as another option.
I saw that class. I am also thinking that here, in https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py
there is this:
class WeightDecay(Cost):
    """
    coeff * sum(sqr(weights))
    for each set of weights.
    """
    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
        to multiply with the cost defined by the squared L2 norm of the weights
        for each layer.
and this
class L1WeightDecay(Cost):
    """
    coeff * sum(abs(weights))
    for each set of weights.
    """
    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
        to multiply with the cost defined by the L1 norm of the
        weights(lasso) for each layer.
which might be the way to go for L1 and L2 reg.
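Written out, the two docstrings above appear to describe the following penalty terms. The notation is mine, not the library's: $W^{(l)}$ stands for the weight matrix of layer $l$ and $c_l$ for the matching entry of `coeffs`. I am also assuming that the per-layer terms are simply added together.

```latex
\Omega_{\mathrm{L2}} = \sum_{l} c_l \sum_{i,j} \bigl( W^{(l)}_{ij} \bigr)^{2}
\qquad
\Omega_{\mathrm{L1}} = \sum_{l} c_l \sum_{i,j} \bigl| W^{(l)}_{ij} \bigr|
```

Neither term involves the training data, so on its own each one is minimized by setting every weight to zero.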
Kyle Kastner |
13-10-14
|
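As far as dropout_fprop goes, I think that description matches my thoughts. You use dropout_fprop during the training stage, to apply dropout at each layer, which *effectively* creates many separate neural networks, each trained on one example. Then, once the training is all done, you can use a regular fprop, which will *effectively* give you the bagged decision result from all of these networks, by making a decision using all of the weights (see http://arxiv.org/pdf/1207.0580.pdf)
In short, I think that dropout_fprop is largely internal/used during training - while fprop is used for predictions with a trained net.
I did not see the WeightDecay/L1WeightDecay classes - I agree that those seem like the way to go. If I can get those working in my own code I will let you know.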
b_m...@live.com |
13-10-14
|
I could not figure out how to use the WeightDecay class in my code (I am not using yaml). I tried this with no success:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100, monitoring_dataset={'test': dt_test}, termination_criterion=EpochCounter(5000), cost=WeightDecay(coeffs=[0.005, 0.005, 0.005]))
With yaml, are you able to make predictions on a new dataset (and get the probabilities, not just the predicted class)?
On Monday, October 14, 2013 9:28:46 AM UTC-4, Kyle Kastner wrote:
> As far as dropout_fprop goes, I think that description matches my thoughts. You use dropout_fprop during the training stage, to apply dropout at each layer, which *effectively* creates many separate neural networks, each trained on one example. Then, once the training is all done, you can use a regular fprop, which will *effectively* give you the bagged decision result from all of these networks, by making a decision using all of the weights (see http://arxiv.org/pdf/1207.0580.pdf)
>
> In short, I think that dropout_fprop is largely internal/used during training - while fprop is used for predictions with a trained net.
>
> I did not see the WeightDecay/L1WeightDecay classes - I agree that those seem like the way to go. If I can get those working in my own code I will let you know.
>
> Kyle
Ian Goodfellow |
13-10-15
|
You need to use a SumOfCosts class that adds together a Dropout cost
and a WeightDecay cost. If you train with only a WeightDecay cost it
will just make the weights go to 0.
You indeed need to use dropout_fprop at train time and regular fprop
at test time. Training using the Dropout cost will handle the calls to
dropout_fprop for you.
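To make the train-time/test-time distinction concrete, here is a minimal sketch in plain Python (no pylearn2 or Theano). It is not the library's implementation: `dense_forward` and `dropout_forward` are hypothetical helpers, and the only things taken from the thread are the default include probability of 0.5 and input scale of 2 shown in the dropout_fprop signature. For a single linear layer, the average of many dropout passes approaches the ordinary forward pass; with nonlinear layers this holds only approximately.

```python
import random


def dense_forward(x, weights):
    """Plain linear layer: weights[j][i] connects input i to unit j."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def dropout_forward(x, weights, include_prob=0.5, scale=2.0, rng=random):
    """Training-time pass: keep each input with probability include_prob,
    multiply the survivors by scale, then apply the layer."""
    masked = [x_i * scale if rng.random() < include_prob else 0.0 for x_i in x]
    return dense_forward(masked, weights)


weights = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
x = [1.0, 2.0, 3.0]

# What a regular forward pass computes at test time.
test_time = dense_forward(x, weights)

# Average many stochastic training-time passes for comparison.
rng = random.Random(0)
n_samples = 20000
sums = [0.0, 0.0]
for _ in range(n_samples):
    out = dropout_forward(x, weights, rng=rng)
    sums = [s + o for s, o in zip(sums, out)]
train_average = [s / n_samples for s in sums]
```

Comparing `train_average` with `test_time` shows why a regular forward pass is the sensible choice for predictions once training is done.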
b_m...@live.com |
13-10-15
|
> You need to use a SumOfCosts class that adds together a Dropout cost
> and a WeightDecay cost. If you train with only a WeightDecay cost it
> will just make the weights go to 0.
>
> You indeed need to use dropout_fprop at train time and regular fprop
> at test time. Training using the Dropout cost will handle the calls to
> dropout_fprop for you.
Ian,
I. So, cost=Dropout() in the sgd call takes care of dropout, and then I just use fprop to predict the new test set?
II. How do you use just L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log-likelihood?
b_m...@live.com |
13-10-15
|
>
>
> Ian,
>
> I. So, cost=Dropout() in the sgd call takes care of dropout and then using just fprop in the prediction of new the test set?
>
> II. How do you just use L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log lik?
For I., I mean that a user never calls dropout_fprop directly, correct? They just add the cost=Dropout() argument to the sgd call?
Ian Goodfellow |
13-10-15
|
I. Yes. Yes to your follow up question 2.
II. Yes, the SumOfCosts class does the addition.
b_m...@live.com |
13-10-15
|
> I. Yes. Yes to your follow up question 2.
>
> II. Yes, the SumOfCosts class does the addition.
>
Thanks! Last follow-up, how would I actually accomplish II?
I tried this but received a NotImplementedError.
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
Ian Goodfellow |
13-10-15
|
b_m...@live.com |
13-10-15
|
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\training_algorithms\sgd.py", line 314, in train
    "data_specs: %s" % str(data_specs))
NotImplementedError: Unable to train with SGD, because the cost does not actually use data from the data set. data_specs: (CompositeSpace(), ())
I can post the entire script (it is a simple 2 hidden layer mlp) if need be.
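The empty data_specs in that error message suggests what is going on: a sum of costs asks the dataset for whatever its parts ask for, and a weight penalty asks for nothing. The sketch below imitates that mechanism in plain Python. Every class and function in it (`WeightPenalty`, `SquaredError`, `SumCost`, `check_trainable`) is a hypothetical stand-in invented for illustration, not pylearn2's API, and the squared-error cost is just a convenient data-dependent example.

```python
class WeightPenalty(object):
    """Hypothetical stand-in for a cost that depends only on the weights."""

    data_sources = ()  # asks for nothing from the dataset

    def __init__(self, coeff):
        self.coeff = coeff

    def expr(self, weights, batch):
        return self.coeff * sum(w * w for w in weights)


class SquaredError(object):
    """Hypothetical stand-in for a cost that depends on the data."""

    data_sources = ("features", "targets")

    def expr(self, weights, batch):
        total = 0.0
        for features, target in zip(batch["features"], batch["targets"]):
            prediction = sum(w * f for w, f in zip(weights, features))
            total += (prediction - target) ** 2
        return total / len(batch["targets"])


class SumCost(object):
    """Adds several costs and merges what they request from the dataset."""

    def __init__(self, costs):
        self.costs = list(costs)

    @property
    def data_sources(self):
        merged = []
        for cost in self.costs:
            for name in cost.data_sources:
                if name not in merged:
                    merged.append(name)
        return tuple(merged)

    def expr(self, weights, batch):
        return sum(cost.expr(weights, batch) for cost in self.costs)


def check_trainable(cost):
    """Mimic a trainer refusing to run when the cost requests no data."""
    if not cost.data_sources:
        raise NotImplementedError("the cost does not use data from the data set")
    return True


penalty_only = SumCost([WeightPenalty(0.005)])
combined = SumCost([SquaredError(), WeightPenalty(0.005)])
```

In this toy version, `check_trainable(penalty_only)` raises while `check_trainable(combined)` passes, which mirrors the behaviour reported above.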
Ian Goodfellow |
13-10-15
|
b_m...@live.com |
13-10-15
|
import theano
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import MonitorBased, EpochCounter
from pylearn2.costs.mlp.dropout import Dropout
from pylearn2.costs.cost import SumOfCosts, MethodCost
from pylearn2.models.mlp import WeightDecay, L1WeightDecay
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
import numpy as np
from random import randint
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Binarizer
import pandas as pd
X=np.loadtxt(open(r"C:\Users\Desktop\pylearn2\wine.csv"), delimiter=';',usecols=range(0, 11), skiprows=1) #first 11 cols
X=np.array(X)
y=np.loadtxt(open(r"C:\Users\Desktop\pylearn2\wine.csv"), delimiter=';',usecols=(11,12), skiprows=1)
y=np.array(y)
#train
X_t=X[:3000,:]
y_t=y[:3000,:]
#valid
X_v=X[2500:3000,:]
y_v=y[2500:3000,:]
#test
X_s=X[3000:,:]
y_s=y[3000:,:]
#center and scale inputs
scaler=StandardScaler()
scaler.fit(X_t)
X_t=scaler.transform(X_t)
X_v=scaler.transform(X_v)
X_s=scaler.transform(X_s)
class datMake(DenseDesignMatrix): #inherits from DenseDesignMatrix
    def __init__(self,X,y):
        super(datMake, self).__init__(X=X, y=y)
dt_train=datMake(X_t,y_t)
dt_valid=datMake(X_v,y_v)
dt_test=datMake(X_s,y_s)
# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)
# create hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)
# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)
# an epoch is one complete pass through the data: if the training set is 2000
# records and the batch size is 100, there are 20 batches in an epoch
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,
                  monitoring_dataset={'test': dt_test},
                  termination_criterion=EpochCounter(5000),
                  cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005, 0.005, 0.005])]))
layers = [hidden_layer, hidden_layer2, output_layer]
# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, dt_train)
# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=dt_train)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break
ann.get_params()
ann.get_param_values()
#predict the test set
test_preds=ann.fprop(theano.shared(X_s, name='test')).eval()
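On the earlier question about getting probabilities rather than just the predicted class: assuming test_preds holds one row per test example and one column per class (which is what I would expect from a softmax output layer, though it is not confirmed in this thread), the class labels can be derived from the probabilities afterwards. The helpers `predicted_classes` and `accuracy` below are hypothetical, and the numbers are made up for illustration.

```python
def predicted_classes(prob_rows):
    """Pick the most probable class index for each example."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in prob_rows]


def accuracy(prob_rows, one_hot_targets):
    """Fraction of examples whose most probable class matches the target."""
    predicted = predicted_classes(prob_rows)
    actual = predicted_classes(one_hot_targets)
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / float(len(predicted))


# Made-up probabilities standing in for three rows of softmax output.
example_probs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
# Made-up one-hot targets in the same two-column layout as y in the script.
example_targets = [[1, 0], [0, 1], [0, 1]]

classes = predicted_classes(example_probs)
score = accuracy(example_probs, example_targets)
```

Keeping the probabilities and deriving the classes separately also leaves room for a decision threshold other than "largest column wins".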
Ian Goodfellow |
13-10-15
|
b_m...@live.com |
13-10-15
|
I placed the file here: https://docs.google.com/file/d/0B9dsnio60wRoRHptdHlTZjk2RU0/edit?usp=sharing
thanks Ian!
Pascal Lamblin |
13-10-15
|
There is only "WeightDecay" in the SumOfCost. As Ian said, this would
simply put all weights to zero.
You need to have at least a cost that actually depends on the data, such
as DropoutCost, CrossEntropy or NegativeLogLikelihood. You can also
use a MethodCost to specify a method of the model to call, and use the
return expression as the cost.
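A small numerical illustration of that point, in plain Python: gradient descent on a weight-decay term alone shrinks the weight toward zero, while adding a data-dependent term gives the weight something to fit. The one-weight linear model, the squared-error data cost, and the `train` helper are all assumptions chosen to keep the sketch short; they are not what pylearn2 computes for an MLP.

```python
def train(xs, ys, decay_coeff, use_data_cost, lr=0.05, steps=2000):
    """Gradient descent on a one-weight linear model y ~ w * x.

    cost = [mean squared error, if use_data_cost] + decay_coeff * w**2
    """
    w = 1.0  # arbitrary non-zero starting weight
    n = len(xs)
    for _ in range(steps):
        grad = 2.0 * decay_coeff * w  # gradient of the weight-decay term
        if use_data_cost:
            # gradient of mean((w * x - y)**2) with respect to w
            grad += sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]  # generated by y = 2 * x

w_decay_only = train(xs, ys, decay_coeff=0.1, use_data_cost=False)
w_combined = train(xs, ys, decay_coeff=0.1, use_data_cost=True)
```

With the decay term alone the weight collapses to essentially zero; with both terms it settles slightly below the true slope of 2, which is the usual shrinkage effect of an L2 penalty.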
--
Pascal
Ian Goodfellow |
13-10-15
|
Brian Miner |
13-10-15
|
Can you give an example? How to change this:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
to incorporate the standard loss function used by the output layer?
Thanks!
Ian Goodfellow |
13-10-16
|
Just put a second cost in the list. Like Dropout() or something.
b_m...@live.com |
13-10-16
|
> Just put a second cost in the list. Like Dropout() or something.
>
>
What I am struggling with, and perhaps did not explain well enough, is how to add the weight decay to the default cost that results from a call to sgd without any cost parameter. I don't want to combine weight decay with dropout. I want the output layer to dictate the cost, and then add the weight decay term to it.
For example, this call
sgd.SGD(learning_rate=0.005,batch_size=100,termination_criterion=EpochCounter(5000))
has some default cost. I expect it is the negative log-likelihood derived from the choice of output layer.
So, my question is simply what to do to add this cost to the weight decay (within SumOfCosts). There is a NegativeLogLikelihood in supervised_cost, but that seems to be deprecated.
Thanks for the time!
Ian Goodfellow |
13-10-16
|
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/dropout.py#L62
All it does is compute that cost with the hidden states multiplied by
2 * dropout mask.
If you don't want dropout, then use costs.mlp.Default:
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py#L11
That will also make the last layer drive the cost.
Most of the layers implement some kind of negative log likelihood as their cost.
The NegativeLogLikelihood cost has been deprecated because it's only
the negative log likelihood for a specific model (maybe softmax? I
haven't looked at it recently) so it doesn't make sense to apply it to
other models.
Pascal Lamblin |
13-10-16
|
On Tue, Oct 15, 2013, Brian Miner wrote:
> Can you give an example? How to change this:
>
> trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
>
> to incorporate the standard loss function used by the output layer?
For an MLP, I think you can use:
SumOfCosts(costs=[
    MethodCost("cost_from_X"),
    WeightDecay(coeffs=[...])])
MethodCost is defined in costs/cost.py
>
> Thanks!
>
> On 10/15/2013 10:44 AM, Pascal Lamblin wrote:
> >On Mon, Oct 14, 2013, b_m...@live.com wrote:
> >>On Monday, October 14, 2013 8:35:11 PM UTC-4, Ian Goodfellow wrote:
> >>>I. Yes. Yes to your follow up question 2.
> >>>
> >>>II. Yes, the SumOfCosts class does the addition.
> >>Thanks! Last follow-up, how would I actually accomplish II?
> >>
> >>I tried this but receive an NotImplementedError.
> >>
> >>trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
> >There is only "WeightDecay" in the SumOfCost. As Ian said, this would
> >simply put all weights to zero.
> >
> >You need to have at least a cost that actually depends on the data, such
> >as DropoutCost, CrossEntropy or NegativeLogLikelihood. You can also
> >use a MethodCost to specify a method of the model to call, and use the
> >return expression as the cost.
> >
--
Pascal
Ian Goodfellow |
13-10-16
|
thing, without needing to write cost_from_X in the base script.
Brian |
13-10-16
|
way of using the default (outer-layer dependent) cost function (without
assuming MLP)?
Ian Goodfellow |
13-10-16
|
based on calling a method that you name, so if you use MethodCost and
tell it to call cost_from_X it does the same thing as Default.