Comments (11)
This is essentially an unexpected behavior which I'm not sure how to
deal with.
Consider this sequence of commands:
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model --save_resume -f final.model
echo "" | vw --loss_function=logistic -i final.model -f finaler.model
cat set02 | vw --loss_function=logistic -i finaler.model -t
average loss = 0.225275
So, save_resume isn't doing anything bad to the state. Instead, it's
the accumulator which differs. A basic question when you run '-t' is:
do you want the average loss over just the test examples or the average
over the whole sequence? You are expecting the first, but it's
reporting the second.
What is the correct behavior?
-John
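To make the accumulator issue concrete, here is a small sketch (not vw's source, numbers made up) of how a loss accumulator carried over by --save_resume changes the number a -t run prints. vw reports sum_loss / example_count; if both values are restored from the saved model, the reported "average loss" covers the whole train+test sequence rather than just the test set:

```python
# Sketch of vw's progressive-loss reporting (illustrative only).
def report(sum_loss, n):
    return sum_loss / n

# Hypothetical state restored from a --save_resume model
# after training on 8000 examples:
train_sum, train_n = 2200.0, 8000

# Losses incurred on 2000 test examples:
test_losses = [0.2] * 2000
test_sum, test_n = sum(test_losses), len(test_losses)

test_only = report(test_sum, test_n)                        # loss over the test set only
whole_seq = report(train_sum + test_sum, train_n + test_n)  # loss over the whole sequence

print(round(test_only, 4))   # 0.2
print(round(whole_seq, 4))   # 0.26
```

The same final model produces both numbers; only the restored (sum, count) state differs, which is why the workaround of saving once more without the accumulated state changes the reported loss.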
On 03/26/2014 08:49 AM, Martin Popel wrote:
When you want to train N subsequent models on N data sets, you must
use --save_resume flag in the first N-1 trainings, but you SHOULD NOT
use it in the last (N-th) training, if you want to get the same
results as when training on all the N data sets concatenated. John
Langford confirmed that "this looks bugly"
https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/3329. I attach an example with N=2.
Not using --save_resume makes the final test loss (0.223781) only
slightly different from the baseline (0.225275).
However, using --save_resume in both trainings makes the final test
loss much worse (0.267824).

### prepare data train=set00,set01 test=set02
cd vowpal_wabbit/test/train-sets
split -dl 4000 rcv1_small.dat set

### train on concatenated training sets
cat set00 set01 | vw --loss_function=logistic -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.225275

### train separately, first with --save_resume, second without
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.225275

### train separately, both models with --save_resume
cat set00 | vw --loss_function=logistic -f A.model --save_resume
cat set01 | vw --loss_function=logistic -i A.model --save_resume -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.267824

### train separately, without --save_resume
cat set00 | vw --loss_function=logistic -f A.model
cat set01 | vw --loss_function=logistic -i A.model -f final.model
cat set02 | vw --loss_function=logistic -i final.model -t
average loss = 0.223781
—
Reply to this email directly or view it on GitHub
#262.
from vowpal_wabbit.
We observed similar behavior recently, but it only happens when zero-weight examples are involved:
- train a vw model with --save_resume on a dataset which contains some examples of zero weight (in either daemon or non-daemon mode)
- start vw with the initial model (-i) obtained from step 1 and -t (--test_only), and test some examples
- among the predictions on the test examples, some have the value 50.000000, which is the upper bound vw sets internally for logistic loss
Note: this bug is not triggered if --save_resume was not used in step 1, or --test_only was not used in step 2, or there were no zero-weight examples in step 1.
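For context on why zero-weight examples being implicated is surprising: in a plain importance-weighted SGD update (a toy sketch below, not vw's actual update rule, which also involves adaptive and normalized scaling), a zero importance weight scales the gradient to zero, so such examples should leave the model weights unchanged:

```python
# Toy importance-weighted SGD step for logistic loss (illustrative sketch).
import math

def sgd_step(w, x, y, imp_weight, lr=0.5):
    # y in {-1, +1}; logistic loss is log(1 + exp(-y * <w, x>))
    margin = sum(wi * xi for wi, xi in zip(w, x))
    g = -y / (1.0 + math.exp(y * margin))  # d(loss)/d(margin)
    return [wi - lr * imp_weight * g * xi for wi, xi in zip(w, x)]

w0 = [0.1, -0.2]
w1 = sgd_step(w0, [1.0, 1.0], +1, imp_weight=0.0)  # zero-weight example
w2 = sgd_step(w0, [1.0, 1.0], +1, imp_weight=1.0)  # normal example
print(w1 == w0)   # True: a zero-weight example does not move the weights
print(w2 == w0)   # False: a unit-weight example does
```

If the model weights are untouched by zero-weight examples, the difference must come from the extra per-feature state (e.g. normalization or adaptive accumulators) that --save_resume stores, which is consistent with John's suggestion below to try disabling normalization.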
from vowpal_wabbit.
Is your train set (a) the same as your test set here, or (b) different? If (b), can you try (a)?
-John
I just tried both cases:
(a) same train and test set. Only the zero-weight examples got prediction 50.000000, but not all zero-weight examples got it (about 40% of zero-weight examples, which depends on the feature distribution, I guess).
(b) different train and test set. Prediction 50.000000 can be seen for both zero-weight and nonzero-weight examples (about 15% of all examples).
This is not obviously a bug. Particularly when using LBFGS and/or
multi-pass learning, the predictor can become extremely certain about some
predictions. This is why those thresholds are in there.
-John
echo "" | vw --loss_function=logistic -i final.model -f finaler.model
OK, this is a clever workaround for this issue (but annoying with larger models).
A basic question when you run '-t' is:
do you want the average loss over just the test examples or the average
over the whole sequence?
Yes, I am expecting the first when using -t.
I think the second is counter-intuitive here.
I think save_resume's primary goal is to produce a model which behaves exactly the same as if trained in one step on all training sets.
(Of course, it must contain some extra info to allow one more training step.)
What is the correct behavior?
First, what are the use cases for save_resume?
A) training in multiple steps
B) testing in multiple steps
I've always used only A, but maybe someone needs B as well.
I don't care what loss is reported in the steps which use --save_resume and don't use -t (i.e. in the training steps).
Probably it should be the average loss over all examples in all training steps.
I don't care what loss is reported in the steps which use --save_resume and -t (i.e. use case B).
Probably it should be the average loss over all test steps.
However, I suggest changing the behavior when -t is used and --save_resume is not used.
In this case, I think only the loss of the current step should be reported.
To clarify the buggy behavior: if the model is saved without --save_resume, all predictions seem normal (none of them close to 50.000000). It was an online-training setting (single pass). This is the exact command line we use:
--loss_function logistic -l 0.5 --initial_t 1e6 -b 27 --holdout_off --keep c --keep d --keep e --keep f --keep j --keep k --keep l --keep m --keep n --keep o --keep p --keep r --keep s --keep t --keep u --keep v --keep w -q ev -q ew -q fj -q fk -q fl -q fm -q fn -q fo -q fp -q st -q r:
I changed the semantics of --save_resume so that when used with -t it
resets all accumulators. This addresses Martin's unexpected usage.
For Li Pu: what happens if you turn off normalization via --adaptive
--invariant ?
-John
Hi John,
Thank you very much for your reply! I tried turning off normalization via --adaptive --invariant, and via --sgd, but there are still 50.000000 predictions in the result. What could be the cause of this? I suspect there are some new features in the -t dataset that were not present in the --save_resume model.
Best,
Li
A prediction of 50 means that vw is very, very certain that the label is positive.
A prediction of -50 means that vw is very, very certain that the label is negative.
You may pipe these predictions into utl/logistic to map them to the [-1, 1] range.
You may also use --max_prediction and --min_prediction for clipping, but this may be inappropriate for your needs (you may lose significant accuracy from range clipping).
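As a sketch of what such a mapping does (assuming a standard logistic link; check utl/logistic in the vw repo for the exact transform it applies), a raw margin of ±50 saturates to a near-certain probability:

```python
# Mapping raw margin predictions into bounded ranges (illustrative sketch).
import math

def to_prob(margin):
    # [0, 1] probability of the positive class via the sigmoid
    return 1.0 / (1.0 + math.exp(-margin))

def to_signed(margin):
    # [-1, 1] version: 2*sigmoid(margin) - 1, i.e. tanh(margin / 2)
    return 2.0 * to_prob(margin) - 1.0

print(to_prob(50.0))     # ~1.0: vw is (near-)certain the label is positive
print(to_signed(-50.0))  # ~-1.0: near-certain the label is negative
```

This is why a clipped raw prediction of 50.000000 is not necessarily wrong output; it just encodes extreme confidence on the margin scale.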
A nonstationarity between train and test set could account for the
different behavior.
-John