https://groups.google.com/forum/#!msg/pylearn-users/FYtpaQKoC4c/ubitO_JUC1kJ

Weights and Regularization (29 posts by 4 authors)

b_m...@live.com

13-10-10

I am trying to experiment with pylearn2 directly using python (as I am unfamiliar with yaml). I am stealing much from a very nice blog post (http://www.arngarden.com/2013/07/29/neural-network-example-using-pylearn2/).

Here is the basic script I am using (I am using a data set from UCI regarding wine ratings).

####CODE#####################################################################

# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)

# create hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)

# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)

# create Stochastic Gradient Descent trainer that runs for x epochs
trainer = sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200))
layers = [hidden_layer,hidden_layer2,output_layer] #according to the code, the last layer will be considered the output

# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, ds)

# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=ds)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break

####END CODE####################################################################

My questions:

I. Weights. How do I see the weights from the trained model? I *think* I am adding a second hidden layer above but if I looked at ann.get_weights() the dimension of this resulting object does not change if I remove the second hidden layer. So I question if I am looking at the right thing. Ultimately I want to see the finished weights so (outside pylearn) I can visualize the network.

II. Regularization. How to use regularization? Specifically, how to adjust the above code to use 1) drop out and then 2) L2 norm?

Thanks!

Brian

b_m...@live.com

13-10-12

Through the ann.get_param_values() call I am now able to see the weight and bias values and through knowledge of the net architecture, accomplish question #1.
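
To make the bookkeeping concrete, here is a small sketch of the array shapes one would expect for the network above (11 inputs, hidden layers of 5 and 2 units, 2 outputs). The helper `layer_shapes` is hypothetical and not part of pylearn2; it only assumes that each layer holds one weight matrix of shape (inputs, units) and one bias vector of length units. The order in which get_param_values() returns these arrays is not stated in this thread, so it is safer to match arrays by shape than by position.

```python
def layer_shapes(nvis, dims):
    """Return (weight_shape, bias_shape) per layer for a fully connected net.

    nvis: number of input features; dims: units in each layer, in order.
    Hypothetical helper for illustration; not a pylearn2 function.
    """
    shapes = []
    n_in = nvis
    for n_out in dims:
        shapes.append(((n_in, n_out), (n_out,)))
        n_in = n_out
    return shapes


# Network from the post: 11 inputs -> 5 sigmoid -> 2 sigmoid -> 2 softmax
with_hidden2 = layer_shapes(11, [5, 2, 2])
# Same network with the second hidden layer removed
without_hidden2 = layer_shapes(11, [5, 2])

for i, (w_shape, b_shape) in enumerate(with_hidden2):
    print("layer %d: W %s, b %s" % (i, w_shape, b_shape))

# The first layer's weight matrix has the same shape in both networks.
print(with_hidden2[0][0] == without_hidden2[0][0])
```

The first layer's weight matrix is (11, 5) with or without hidden2. That would explain the unchanged dimensions seen in the original question if ann.get_weights() reports only the first layer, which is an assumption worth checking against the pylearn2 source.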

I would still like some quick help on how to use regularization (especially dropout), and then on how to predict new cases with such a model (does the ann.fprop(theano.shared(testMatrix, name='test')).eval() call still work?).

Thanks!

Kyle Kastner

13-10-12

Re: [pylearn-users] Re: Weights and Regularization

I am doing something similar, and had to enable the one_hot=True to recreate the MNIST yaml results in python. What is the error you are getting?

Kyle

--
You received this message because you are subscribed to the Google Groups "pylearn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylearn-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

b_m...@live.com

13-10-13

Re: [pylearn-users] Re: Weights and Regularization

On Friday, October 11, 2013 11:39:02 PM UTC-4, Kyle Kastner wrote:
> I am doing something similar, and had to enable the one_hot=True to recreate the MNIST yaml results in python. What is the error you are getting?
>
> Kyle
>

Kyle,

I am not getting an error; instead I am looking to learn/confirm the proper method to train an MLP using regularization (L2 as well as dropout) and then get predictions on a new data set. I am not using yaml, though; I want a way to use pylearn2 directly in Python using its functions.

I referenced a blog that showed how to train an MLP w/o regularization (only a number of epochs) and then predict new data using ann.fprop, where ann is the trained MLP. I *think* I can use dropout simply by adding the cost to the SGD call like this:
sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200), cost=Dropout())

and then to predict new data I *think* I just need to call dropout_fprop instead of fprop, like this (where X_s is the new test set):

test_preds=ann.dropout_fprop(theano.shared(X_s, name='test')).eval()

But I am hoping one of the developers will confirm this is correct and explain how to add an L2 penalty, as that is escaping me currently. I am not very experienced with Python yet, so following the code is a challenge.

Kyle Kastner

13-10-14

Re: [pylearn-users] Re: Weights and Regularization

I don't know that you need to call dropout_fprop for the predictions - once the network is trained, a regular fprop should be all you need, as the model averaging is done during training - the fprop output of a dropout net *should* represent the bagged estimate of many neural nets. I am having trouble finding the reference, but I am recalling that from somewhere. Maybe some one else can help/contradict me here?

In my code, I have called Dropout with an additional dictionary of parameters, so that the dropout from the visible layer is .8, while the others remain .5, as is recommended in some of the literature. The default value of .5 dropout should be OK though, so cost=Dropout() seems ok to me.

I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?

There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.

Kyle

b_m...@live.com

13-10-14

Re: [pylearn-users] Re: Weights and Regularization

On Sunday, October 13, 2013 7:38:25 PM UTC-4, Kyle Kastner wrote:
> I don't know that you need to call dropout_fprop for the predictions - once the network is trained, a regular fprop should be all you need, as the model averaging is done during training - the fprop output of a dropout net *should* represent the bagged estimate of many neural nets. I am having trouble finding the reference, but I am recalling that from somewhere. Maybe some one else can help/contradict me here?
>
> In my code, I have called Dropout with an additional dictionary of parameters, so that the dropout from the visible layer is .8, while the others remain .5, as is recommended in some of the literature. The default value of .5 dropout should be OK though, so cost=Dropout() seems ok to me.
>
> I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?
>
> There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.
>
> Kyle
>

Hey Kyle,

I. I see this description of dropout_fprop from models/mlp.py so I am not sure:

def dropout_fprop(self, state_below, default_input_include_prob=0.5,
                  input_include_probs=None, default_input_scale=2.,
                  input_scales=None, per_example=True):
    """
    state_below: The input to the MLP

    Returns the output of the MLP, when applying dropout to the input and intermediate layers.

II. regarding L2, I would not be using both, just want to see how to do it as another option.

I saw that class. I also am thinking that here https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py

there is this:

class WeightDecay(Cost):
    """
    coeff * sum(sqr(weights))

    for each set of weights.

    """

    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
                to multiply with the cost defined by the squared L2 norm of the weights
                for each layer.

and this

class L1WeightDecay(Cost):
    """
    coeff * sum(abs(weights))

    for each set of weights.

    """

    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
                to multiply with the cost defined by the L1 norm of the
                weights(lasso) for each layer.

which might be the way to go for L1 and L2 reg.

Kyle Kastner

13-10-14

Re: [pylearn-users] Re: Weights and Regularization

As far as dropout_fprop goes, I think that description matches my thoughts. You use dropout_fprop during the training stage, to apply dropout at each layer, which *effectively* creates many separate neural networks, each trained on one example. Then, once the training is all done, you can use a regular fprop, which will *effectively* give you the bagged decision result from all of these networks, by making a decision using all of the weights (see http://arxiv.org/pdf/1207.0580.pdf).

In short, I think that dropout_fprop is largely internal/used during training - while fprop is used for predictions with a trained net.

I did not see the WeightDecay/L1WeightDecay classes - I agree that those seem like the way to go. If I can get those working in my own code I will let you know.

Kyle

b_m...@live.com

13-10-14

Re: [pylearn-users] Re: Weights and Regularization

You may be right. I knew that with dropout, during test set prediction, all the weights were used but were (in Hinton's original paper) divided by 2 to effectively replicate the 0.5 inclusion probability. So, perhaps if the weights of the trained model are adjusted in that way, then fprop works. Apparently this site is not monitored regularly by developers.

I could not figure out how to use the WeightDecay class in my code (I am not using yaml). I tried this with no success:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=WeightDecay(coeffs=[0.005,0.005,0.005]))

With yaml, are you able to make predictions on a new dataset (and get the probabilities and not just the predicted class)?

Ian Goodfellow

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

You need to use a SumOfCosts class that adds together a Dropout cost
and a WeightDecay cost. If you train with only a WeightDecay cost it
will just make the weights go to 0.

You indeed need to use dropout_fprop at train time and regular fprop
at test time. Training using the Dropout cost will handle the calls to
dropout_fprop for you.
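
The train-time versus test-time distinction can be checked on a toy case. The sketch below uses a single linear unit and hypothetical helper names (nothing here is pylearn2 code). Each input is multiplied by a 0/1 mask bit and by a scale of 1 / include_prob, which is 2 for the default include probability of 0.5 shown in the dropout_fprop signature quoted earlier in the thread. Averaging exactly over every possible mask gives back the plain forward pass. For a real multi-layer network with nonlinearities this equality is only approximate.

```python
from itertools import product


def linear(weights, inputs):
    """Plain forward pass of one linear unit (no dropout)."""
    return sum(w * x for w, x in zip(weights, inputs))


def dropout_linear(weights, inputs, mask, scale):
    """Forward pass with each input multiplied by scale * mask bit."""
    return sum(w * x * scale * m for w, x, m in zip(weights, inputs, mask))


def expected_dropout_output(weights, inputs, include_prob):
    """Exact average of dropout_linear over every possible mask."""
    scale = 1.0 / include_prob  # 2.0 when include_prob is 0.5
    total = 0.0
    for mask in product([0, 1], repeat=len(inputs)):
        kept = sum(mask)
        dropped = len(mask) - kept
        # probability of drawing exactly this mask
        prob = include_prob ** kept * (1.0 - include_prob) ** dropped
        total += prob * dropout_linear(weights, inputs, mask, scale)
    return total


# Made-up weights and inputs for illustration
weights = [0.4, -0.2, 0.1]
inputs = [1.0, 2.0, 3.0]

plain = linear(weights, inputs)
averaged = expected_dropout_output(weights, inputs, include_prob=0.5)

# The mask-averaged output matches the plain forward pass.
print(abs(plain - averaged) < 1e-9)
```

The same check holds for other include probabilities (for example 0.8 on the inputs) as long as the scale is the reciprocal of the include probability.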

b_m...@live.com

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

On Monday, October 14, 2013 8:22:50 PM UTC-4, Ian Goodfellow wrote:
> You need to use a SumOfCosts class that adds together a Dropout cost
> and a WeightDecay cost. If you train with only a WeightDecay cost it
> will just make the weights go to 0.
>
> You indeed need to use dropout_fprop at train time and regular fprop
> at test time. Training using the Dropout cost will handle the calls to
> dropout_fprop for you.
>

Ian,

I. So, cost=Dropout() in the sgd call takes care of dropout, and then I use just fprop to predict the new test set?

II. How do you use just L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log-likelihood?

b_m...@live.com

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

>
>
> Ian,
>
> I. So, cost=Dropout() in the sgd call takes care of dropout and then using just fprop in the prediction of new the test set?
>
> II. How do you just use L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log lik?

I mean for I. that a user never calls dropout_fprop directly, correct? They just add the cost=Dropout() argument to the sgd call?

Ian Goodfellow

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

I. Yes. Yes to your follow up question 2.
II. Yes, the SumOfCosts class does the addition.

b_m...@live.com

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

On Monday, October 14, 2013 8:35:11 PM UTC-4, Ian Goodfellow wrote:
> I. Yes. Yes to your follow up question 2.
>
> II. Yes, the SumOfCosts class does the addition.
>

Thanks! Last follow-up, how would I actually accomplish II?

I tried this but received a NotImplementedError.

trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))

Ian Goodfellow

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

Can you post the full trace and error message?

b_m...@live.com

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\training_algorithms\sgd.py", line 314, in train
    "data_specs: %s" % str(data_specs))
NotImplementedError: Unable to train with SGD, because the cost does not actually use data from the data set. data_specs: (CompositeSpace(), ())

I can post the entire script (it is a simple 2 hidden layer mlp) if need be.

Ian Goodfellow

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

Yeah, post the whole script I guess.

b_m...@live.com

13-10-15

Here is the code. The dataset is a small toy example. Not sure I can upload it here though....

import theano
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import MonitorBased, EpochCounter
from pylearn2.costs.mlp.dropout import Dropout
from pylearn2.costs.cost import SumOfCosts, MethodCost
from pylearn2.models.mlp import WeightDecay, L1WeightDecay
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
import numpy as np
from random import randint
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Binarizer
import pandas as pd

# raw strings keep the backslashes in the Windows path from being read as escapes
X=np.loadtxt(open(r"C:\Users\Desktop\pylearn2\wine.csv"), delimiter=';',usecols=range(0, 11), skiprows=1) #first 11 cols
X=np.array(X)
y=np.loadtxt(open(r"C:\Users\Desktop\pylearn2\wine.csv"), delimiter=';',usecols=(11,12), skiprows=1)
y=np.array(y)

#train (rows 0-2499, so the training rows do not overlap the validation rows)
X_t=X[:2500,:]
y_t=y[:2500,:]

#valid
X_v=X[2500:3000,:]
y_v=y[2500:3000,:]

#test
X_s=X[3000:,:]
y_s=y[3000:,:]

#center and scale inputs
scaler=StandardScaler()
scaler.fit(X_t)
X_t=scaler.transform(X_t)
X_v=scaler.transform(X_v)
X_s=scaler.transform(X_s)

class datMake(DenseDesignMatrix): #inherits from DenseDesignMatrix
    def __init__(self,X,y):
        super(datMake, self).__init__(X=X, y=y)

dt_train=datMake(X_t,y_t)
dt_valid=datMake(X_v,y_v)
dt_test=datMake(X_s,y_s)

# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)

# create hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)

# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)

trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])])) #an epoch is one complete run through the data. if the training set is 2000 records and the batch size is 100, there are 20 batches in an epoch

layers = [hidden_layer,hidden_layer2,output_layer] #according to the code, the last layer will be considered the output

# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, dt_train)

# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=dt_train)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break

ann.get_params()
ann.get_param_values()

#predict the test set
test_preds=ann.fprop(theano.shared(X_s, name='test')).eval()

Ian Goodfellow

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

Can you just make it a public file on Google drive or something?

b_m...@live.com

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

I placed the file here: https://docs.google.com/file/d/0B9dsnio60wRoRHptdHlTZjk2RU0/edit?usp=sharing

thanks Ian!

Pascal Lamblin

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

There is only "WeightDecay" in the SumOfCosts. As Ian said, this would
simply drive all weights to zero.

You need to have at least one cost that actually depends on the data, such
as DropoutCost, CrossEntropy or NegativeLogLikelihood. You can also
use a MethodCost to specify a method of the model to call, and use the
return expression as the cost.

--
Pascal
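
The point about weight decay on its own can be seen numerically. The sketch below is plain Python, not pylearn2: it fits a single weight w in the model y = w * x by gradient descent, once with only a weight decay term and once with a data-dependent cost (mean squared error) added to it. The toy data, the coefficient 0.1, the learning rate, and the function name `train` are all made up for illustration. In pylearn2 the weight-decay-only case is rejected earlier with the NotImplementedError shown above; this only shows what would happen to the weight if it were allowed to run.

```python
def train(data, coeff, use_data_cost, steps=2000, lr=0.05):
    """Gradient descent on one weight w for the model y = w * x.

    cost = [mean squared error, if use_data_cost] + coeff * w ** 2
    """
    w = 1.0
    for _ in range(steps):
        grad = 2.0 * coeff * w  # gradient of the weight decay term
        if use_data_cost:
            # gradient of mean((w * x - y) ** 2) with respect to w
            grad += sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Toy data generated by y = 3 * x (made up for this sketch)
data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]

w_decay_only = train(data, coeff=0.1, use_data_cost=False)
w_combined = train(data, coeff=0.1, use_data_cost=True)

print("weight decay only: %.6f" % w_decay_only)
print("data cost + weight decay: %.6f" % w_combined)
```

With only the penalty, nothing pulls the weight toward the data, so it shrinks toward zero. With the data term included, the weight settles a little below 3, where the pull from the data balances the penalty.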

Ian Goodfellow

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

Pascal is correct. I hadn't read closely enough.

Brian Miner

13-10-15

Re: [pylearn-users] Re: Weights and Regularization

Hi Pascal,

Can you give an example? How to change this:

trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))

to incorporate the standard loss function used by the output layer?

Thanks!

Ian Goodfellow

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

Just put a second cost in the list. Like Dropout() or something.

b_m...@live.com

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

On Tuesday, October 15, 2013 1:02:13 PM UTC-4, Ian Goodfellow wrote:
> Just put a second cost in the list. Like Dropout() or something.
>
>

What I am struggling with, and perhaps just did not explain well enough, is how to add the weight decay to the default cost that results from a call to sgd without the cost parameter at all. I don't want to combine weight decay with dropout. I want the output layer to dictate the cost, and then add the weight decay term to it.

For example, this call
sgd.SGD(learning_rate=0.005,batch_size=100,termination_criterion=EpochCounter(5000))

has some default cost. I expect it is the negative log-likelihood derived from the choice of output layer.

So, my question is simply what to do to add this cost to the weight decay (within SumOfCosts). There is a NegativeLogLikelihood in supervised_cost but that seems to be deprecated.

Thanks for the time!

Ian Goodfellow

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

Dropout does use the last layer to drive the base cost:
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/dropout.py#L62
All it does is compute that cost with the hidden states multiplied by
2 * dropout mask.

If you don't want dropout, then use costs.mlp.Default:
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py#L11
That will also make the last layer drive the cost.

Most of the layers implement some kind of negative log likelihood as their cost.

The NegativeLogLikelihood cost has been deprecated because it's only
the negative log likelihood for a specific model (maybe softmax? I
haven't looked at it recently) so it doesn't make sense to apply it to
other models.

Pascal Lamblin

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

Hi Brian,

On Tue, Oct 15, 2013, Brian Miner wrote:
> Can you give an example? How to change this:
>
> trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
>
> to incorporate the standard loss function used by the output layer?

For an MLP, I think you can use:

SumOfCosts(costs=[
    MethodCost("cost_from_X"),
    WeightDecay(coeffs=[...])])

MethodCost is defined in costs/cost.py

--
Pascal

Ian Goodfellow

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

MethodCost works too. costs.mlp.Default should do exactly the same
thing, without needing to write cost_from_X in the base script.

Brian

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

Thank you Ian and Pascal! These did the trick. Is "cost_from_X" another
way of using the default (outer-layer dependent) cost function (without
assuming MLP)?

Ian Goodfellow

13-10-16

Re: [pylearn-users] Re: Weights and Regularization

cost_from_X is the method that Default calls. MethodCost is a cost
based on calling a method that you name, so if you use MethodCost and
tell it to call cost_from_X it does the same thing as Default.
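
The relationship described here can be mimicked in a few lines of plain Python. Everything below is a toy stand-in with hypothetical names (ToyModel, ToyMethodCost, ToyDefault, ToyWeightDecay, ToySumOfCosts); none of it is pylearn2 code, and the real classes do considerably more. The sketch only shows the dispatch idea: a method-based cost looks up a method by name on the model, a default cost calls cost_from_X directly, and a sum of costs adds whatever its members return.

```python
class ToyModel(object):
    """Stand-in for a model whose output layer defines the data cost."""

    def __init__(self, weight):
        self.weight = weight

    def cost_from_X(self, data):
        """Mean squared error of y = weight * x over (x, y) pairs."""
        errors = [(self.weight * x - y) ** 2 for x, y in data]
        return sum(errors) / float(len(errors))


class ToyMethodCost(object):
    """Cost that calls whichever model method it is told to call."""

    def __init__(self, method_name):
        self.method_name = method_name

    def expr(self, model, data):
        return getattr(model, self.method_name)(data)


class ToyDefault(object):
    """Cost that always calls the model's cost_from_X method."""

    def expr(self, model, data):
        return model.cost_from_X(data)


class ToyWeightDecay(object):
    """coeff * weight ** 2; ignores the data entirely."""

    def __init__(self, coeff):
        self.coeff = coeff

    def expr(self, model, data):
        return self.coeff * model.weight ** 2


class ToySumOfCosts(object):
    """Adds the expressions of all the costs it was given."""

    def __init__(self, costs):
        self.costs = costs

    def expr(self, model, data):
        return sum(cost.expr(model, data) for cost in self.costs)


model = ToyModel(weight=2.0)
data = [(1.0, 3.0), (2.0, 6.0)]  # made-up (x, y) pairs

via_method = ToySumOfCosts([ToyMethodCost("cost_from_X"), ToyWeightDecay(0.005)])
via_default = ToySumOfCosts([ToyDefault(), ToyWeightDecay(0.005)])

# Both routes end up calling the same method, so the totals agree.
print(via_method.expr(model, data) == via_default.expr(model, data))
```

In this toy version the only difference between the two routes is whether the method name is spelled out by the caller or fixed inside the cost class.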

posted on 2014-05-07 22:54 huashiyiqike