Comments (11)
This is essentially an unexpected behavior which I'm not sure how to
deal with.
Consider this sequence of commands:
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model --save_resume -f final.model
echo "" | vw --loss_function=logistic -i final.model -f finaler.model
cat set02 | vw --loss_function=logistic -i finaler.model -t
average loss = 0.225275
So, save_resume isn't doing anything bad to the state. Instead, it's
the accumulator which differs. A basic question when you run '-t' is:
do you want the average loss over just the test examples or the average
over the whole sequence? You are expecting the first, but it's
reporting the second.
What is the correct behavior?
-John
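To make the accumulator issue concrete, here is a small sketch (not vw's source, numbers made up) of how a loss accumulator carried over by --save_resume changes the number a -t run prints. vw reports sum_loss / example_count; if both values are restored from the saved model, the reported "average loss" covers the whole train+test sequence rather than just the test set:

```python
# Sketch of vw's progressive-loss reporting (illustrative only).
def report(sum_loss, n):
    return sum_loss / n

# Hypothetical state restored from a --save_resume model
# after training on 8000 examples:
train_sum, train_n = 2200.0, 8000

# Losses incurred on 2000 test examples:
test_losses = [0.2] * 2000
test_sum, test_n = sum(test_losses), len(test_losses)

test_only = report(test_sum, test_n)                        # loss over the test set only
whole_seq = report(train_sum + test_sum, train_n + test_n)  # loss over the whole sequence

print(round(test_only, 4))   # 0.2
print(round(whole_seq, 4))   # 0.26
```

The same final model produces both numbers; only the restored (sum, count) state differs, which is why the workaround of saving once more without the accumulated state changes the reported loss.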
On 03/26/2014 08:49 AM, Martin Popel wrote:
When you want to train N subsequent models on N data sets, you must
use --save_resume flag in the first N-1 trainings, but you SHOULD NOT
use it in the last (N-th) training, if you want to get the same
results as when training on all the N data sets concatenated. John
Langford confirmed that "this looks bugly"
https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/3329. I attach an example with N=2.
Not using --save_resume makes the final test loss (0.223781) only
slightly different from the baseline (0.225275).
However, using --save_resume in both trainings makes the final test
loss much worse (0.267824).

### prepare data train=set00,set01 test=set02
cd vowpal_wabbit/test/train-sets
split -dl 4000 rcv1_small.dat set

### train on concatenated training sets
cat set00 set01 | vw --loss_function=logistic -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.225275

### train separately, first with --save_resume, second without
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.225275

### train separately, both models with --save_resume
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model --save_resume -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.267824

### train separately, without --save_resume
cat set00 | vw --loss_function=logistic -f A.model
cat set01 | vw --loss_function=logistic -i A.model -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.223781
—
Reply to this email directly or view it on GitHub
#262.
from vowpal_wabbit.
We observed similar behavior recently, but it only happens when zero-weight examples are involved:
- train a vw model with --save_resume on a dataset which contains some examples of zero weight (in either daemon or non-daemon mode)
- start vw with the initial model (-i) obtained from step 1 and -t (--test_only), and test some examples
- among the predictions on the test examples, some have the value 50.000000, which is the upper bound vw sets internally for logistic loss
Note: this bug is not triggered if --save_resume was not used in step 1, or --test_only was not used in step 2, or there were no zero-weight examples in step 1.
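For context on why zero-weight examples being implicated is surprising: in a plain importance-weighted SGD update (a toy sketch below, not vw's actual update rule, which also involves adaptive and normalized scaling), a zero importance weight scales the gradient to zero, so such examples should leave the model weights unchanged:

```python
# Toy importance-weighted SGD step for logistic loss (illustrative sketch).
import math

def sgd_step(w, x, y, imp_weight, lr=0.5):
    # y in {-1, +1}; logistic loss is log(1 + exp(-y * <w, x>))
    margin = sum(wi * xi for wi, xi in zip(w, x))
    g = -y / (1.0 + math.exp(y * margin))  # d(loss)/d(margin)
    return [wi - lr * imp_weight * g * xi for wi, xi in zip(w, x)]

w0 = [0.1, -0.2]
w1 = sgd_step(w0, [1.0, 1.0], +1, imp_weight=0.0)  # zero-weight example
w2 = sgd_step(w0, [1.0, 1.0], +1, imp_weight=1.0)  # normal example
print(w1 == w0)   # True: a zero-weight example does not move the weights
print(w2 == w0)   # False: a unit-weight example does
```

If the model weights are untouched by zero-weight examples, the difference must come from the extra per-feature state (e.g. normalization or adaptive accumulators) that --save_resume stores, which is consistent with John's suggestion below to try disabling normalization.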
from vowpal_wabbit.
Is your train set (a) the same as your test set here, or (b) different? If (b), can you try (a)?
-John
I just tried both cases:
(a) same train and test set. Only the zero-weight examples got prediction 50.000000, but not all zero-weight examples got it (about 40% of zero-weight examples, which depends on the feature distribution, I guess).
(b) different train and test set. Prediction 50.000000 can be seen for both zero-weight and nonzero-weight examples (about 15% of all examples).
This is not obviously a bug. Particularly when using LBFGS and/or
multi-pass learning, the predictor can become extremely certain about some
predictions. This is why those thresholds are in there.
-John
echo "" | vw --loss_function=logistic -i final.model -f finaler.model
OK, this is a clever workaround for this issue (but annoying with larger models).
A basic question when you run '-t' is:
do you want the average loss over just the test examples or the average
over the whole sequence?
Yes, I am expecting the first when using -t.
I think the second is counter-intuitive here.
I think save_resume's primary goal is to produce a model which behaves exactly the same as if trained in one step on all training sets.
(Of course, it must contain some extra info to allow one more training step.)
What is the correct behavior?
First, what are the use cases for save_resume?
A) training in multiple steps
B) testing in multiple steps
I've always used only A, but maybe someone needs B as well.
I don't care what loss is reported in the steps which use --save_resume and don't use -t (i.e. in the training steps).
Probably it should be the average loss over all examples in all training steps.
I don't care what loss is reported in the steps which use --save_resume and -t (i.e. use case B).
Probably it should be the average loss over all test steps.
However, I suggest changing the behavior when -t is used and --save_resume is not used.
In this case, I think only the loss of the current step should be reported.
To clarify the buggy behavior: if the model is saved without --save_resume, all predictions seem normal (none of them close to 50.000000). It was an online-training setting (single pass). This is the exact command line we use:
--loss_function logistic -l 0.5 --initial_t 1e6 -b 27 --holdout_off --keep c --keep d --keep e --keep f --keep j --keep k --keep l --keep m --keep n --keep o --keep p --keep r --keep s --keep t --keep u --keep v --keep w -q ev -q ew -q fj -q fk -q fl -q fm -q fn -q fo -q fp -q st -q r:
I changed the semantics of --save_resume so that when used with -t it
resets all accumulators. This addresses Martin's unexpected usage.
For Li Pu: what happens if you turn off normalization via --adaptive
--invariant ?
-John
Hi John,
Thank you very much for your reply! I tried turning off normalization via --adaptive --invariant, and via --sgd, but there are still 50.000000 predictions in the result. What could be the cause of this? I suspect there are some new features in the -t dataset that were not present in the --save_resume model.
Best,
Li
A prediction of 50 means that vw is very, very certain that the label is positive.
A prediction of -50 means that vw is very, very certain that the label is negative.
You may pipe these predictions into utl/logistic to map them to the [-1, 1] range.
You may also use --max_prediction and --min_prediction for clipping, but this may be inappropriate for your needs (you may lose significant accuracy from range clipping).
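As a sketch of what such a mapping does (assuming a standard logistic link; check utl/logistic in the vw repo for the exact transform it applies), a raw margin of ±50 saturates to a near-certain probability:

```python
# Mapping raw margin predictions into bounded ranges (illustrative sketch).
import math

def to_prob(margin):
    # [0, 1] probability of the positive class via the sigmoid
    return 1.0 / (1.0 + math.exp(-margin))

def to_signed(margin):
    # [-1, 1] version: 2*sigmoid(margin) - 1, i.e. tanh(margin / 2)
    return 2.0 * to_prob(margin) - 1.0

print(to_prob(50.0))     # ~1.0: vw is (near-)certain the label is positive
print(to_signed(-50.0))  # ~-1.0: near-certain the label is negative
```

This is why a clipped raw prediction of 50.000000 is not necessarily wrong output; it just encodes extreme confidence on the margin scale.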
A nonstationarity between train and test set could account for the
different behavior.
-John