
Comments (11)

JohnLangford commented on May 28, 2024

This is essentially unexpected behavior that I'm not sure how to deal with.

Consider this sequence of commands:
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model --save_resume -f final.model
echo "" | vw --loss_function=logistic -i final.model -f finaler.model
cat set02 | vw --loss_function=logistic -i finaler.model -t

average loss = 0.225275

So, save_resume isn't doing anything bad to the model state. Instead, it's
the loss accumulator which differs: the resumed state carries the running
loss sum and example count from training, so a basic question when you run
'-t' is: do you want the average loss over just the test examples, or the
average over the whole sequence? You are expecting the first, but it's
reporting the second.

What is the correct behavior?

-John

On 03/26/2014 08:49 AM, Martin Popel wrote:

When you want to train N subsequent models on N data sets, you must
use the --save_resume flag in the first N-1 trainings, but you SHOULD NOT
use it in the last (N-th) training if you want to get the same
results as when training on all N data sets concatenated. John
Langford confirmed that "this looks bugly"
https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/3329.

I attach an example with N=2.
Not using --save_resume makes the final test loss (0.223781) only a
bit worse than the baseline (0.225275).
However, using --save_resume in both trainings makes the final test
loss much worse (0.267824).

### prepare data: train=set00,set01 test=set02
cd vowpal_wabbit/test/train-sets
split -dl 4000 rcv1_small.dat set

### train on concatenated training sets

cat set00 set01 | vw --loss_function=logistic -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t

average loss = 0.225275

### train separately, first with --save_resume, second without

cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t

average loss = 0.225275

### train separately, both models with --save_resume

cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model --save_resume -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t

average loss = 0.267824

### train separately, without --save_resume

cat set00 | vw --loss_function=logistic -f A.model
cat set01 | vw --loss_function=logistic -i A.model -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t

average loss = 0.223781




vrilleup commented on May 28, 2024

We observed similar behavior recently, but it only happens when zero-weight examples are involved:

  1. train a vw model with --save_resume on a dataset which contains some examples of zero weight (training in either daemon or non-daemon mode)
  2. start vw with the initial model (-i) obtained from step 1 and -t (--test_only), and test some examples
  3. in the predictions for the test examples, some predictions have the value 50.000000, which is the upper bound vw sets internally for logistic loss

Note: this bug is not triggered if --save_resume is not used in step 1, if --test_only is not used in step 2, or if there are no examples of zero weight in step 1.
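
A minimal reproduction sketch of those steps, assuming a file set03 that contains some lines with an explicit example weight of 0 (vw's "label weight | features" input format); set03, B.model, and preds.txt are illustrative names, not from this thread:

cat set03 | vw --loss_function=logistic --save_resume -f B.model
cat set02 | vw --loss_function=logistic -i B.model -t -p preds.txt
# look for raw scores pinned at vw's internal logistic bound of +/-50
awk '$1 >= 50 || $1 <= -50' preds.txt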


JohnLangford commented on May 28, 2024

Is your train set the same as your test set here (a), or different (b)?
If (b), can you try (a)?

-John


vrilleup commented on May 28, 2024

I just tried both cases:
(a) same train and test set: only the zero-weight examples got the prediction 50.000000, but not all of them did (about 40% of the zero-weight examples; I guess it depends on the feature distribution).
(b) different train and test sets: the prediction 50.000000 can be seen for both zero-weight and nonzero-weight examples (about 15% of all examples).
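
A quick way to measure the saturated fraction reported above, assuming preds.txt holds one raw prediction per line (as written by vw's -p option; the file name is illustrative):

awk '$1 >= 50 || $1 <= -50 { n++ } END { printf "%.1f%% saturated\n", 100 * n / NR }' preds.txt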


JohnLangford commented on May 28, 2024

This is not obviously a bug. Particularly when using LBFGS and/or
multi-pass learning, the predictor can become extremely certain about some
predictions. This is why those thresholds are in there.

-John


martinpopel commented on May 28, 2024

echo "" | vw --loss_function=logistic -i final.model -f finaler.model

OK, this is a clever workaround for this issue (but annoying with larger models).

A basic question when you run '-t' is:
do you want the average loss over just the test examples or the average
over the whole sequence?

Yes, I am expecting the first when using -t.
I think the second is counter-intuitive here.
I think save_resume's primary goal is to produce a model which behaves exactly the same as if trained in one step on all training sets.
(Of course, it must contain some extra info to allow one more training step.)
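
A minimal way to check that expectation, as a sketch: vw's --readable_model option dumps the weight table in text form, so the resumed model can be compared against the one-step model (file names are illustrative, and the tables should only agree if --save_resume preserves the full learning state):

cat set00 set01 | vw --loss_function=logistic -f one_step.model --readable_model one_step.txt
cat set00 | vw --loss_function=logistic --save_resume -f A.model
cat set01 | vw --loss_function=logistic -i A.model -f resumed.model --readable_model resumed.txt
diff one_step.txt resumed.txt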

What is the correct behavior?

First, what are the use cases for save_resume?
A) training in multiple steps
B) testing in multiple steps
I've always used only A, but maybe someone needs B as well.

I don't care what loss is reported in the steps which use --save_resume and don't use -t (i.e. in the training steps).
Probably it should be the average loss over all examples in all training steps.

I don't care what loss is reported in the steps which use --save_resume and -t (i.e. use case B).
Probably it should be the average loss over all test steps.

However, I suggest changing the behavior when -t is used and --save_resume is not used.
In this case, I think only the loss of the current step should be reported.


vrilleup commented on May 28, 2024

To clarify the buggy behavior: if the model is saved without --save_resume, all predictions look normal (none of them close to 50.000000). It was an online training setting (single pass). This is the exact command line we use:
--loss_function logistic -l 0.5 --initial_t 1e6 -b 27 --holdout_off --keep c --keep d --keep e --keep f --keep j --keep k --keep l --keep m --keep n --keep o --keep p --keep r --keep s --keep t --keep u --keep v --keep w -q ev -q ew -q fj -q fk -q fl -q fm -q fn -q fo -q fp -q st -q r:


JohnLangford commented on May 28, 2024

I changed the semantics of --save_resume so that when used with -t it
resets all accumulators. This addresses Martin's unexpected usage.
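
As a usage sketch of the changed semantics (assuming the reset applies when a model saved with --save_resume is loaded for testing), the reported loss should now cover only the test pass:

cat set02 | vw --loss_function=logistic -i final.model --save_resume -t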

For Li Pu: what happens if you turn off normalization via --adaptive --invariant?
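
A sketch of that suggestion, assuming vw's convention that explicitly listing update flags disables the unlisted ones (so naming only --adaptive and --invariant drops the normalized update); file names are illustrative:

cat set00 | vw --loss_function=logistic --adaptive --invariant --save_resume -f A.model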

-John


vrilleup commented on May 28, 2024

Hi John,
Thank you very much for your reply! I tried turning off normalization via --adaptive --invariant, and via --sgd, but there are still 50.000000 predictions in the result. What would be a possible cause of this? I suspect there are some new features in the -t dataset that were not present in the --save_resume model.
Best,
Li


arielf commented on May 28, 2024

A prediction of 50 means that vw is very, very certain the label is positive.
A prediction of -50 means that vw is very, very certain the label is negative.

You may pipe these predictions into utl/logistic to map them to the [-1, 1] range.

You may also use --max_prediction and --min_prediction for clipping, but this may be inappropriate for your needs (range clipping may lose significant accuracy).
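
For illustration, a stand-in for that mapping as a shell one-liner, assuming one raw score per line: 2/(1+exp(-s)) - 1 maps a raw score s into [-1, 1] (this mimics what utl/logistic does, but is not taken from that script):

cat set02 | vw --loss_function=logistic -i final.model -t -p /dev/stdout | awk '{ printf "%f\n", 2 / (1 + exp(-$1)) - 1 }'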


JohnLangford commented on May 28, 2024

A nonstationarity between train and test set could account for the
different behavior.

-John

