Comments (5)
I looked into this. The structure of the code is a triple-nested for
loop, with the inner loop producing a feature. In this case, the loop
produces u x k x a = 1 x 2 x 1 = 2 additional features. Paul, can you confirm?
-John
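John's arithmetic can be sketched as a toy loop (a hypothetical illustration of the structure he describes, not the actual lrq.cc code):

```python
# Hypothetical sketch of the triple-nested LRQ feature loop (not actual lrq.cc code).
def count_lrq_features(n_left, k, n_right):
    """Count latent features produced for one (left, right) namespace pair."""
    count = 0
    for _ in range(n_left):           # features in the left namespace (u)
        for _ in range(k):            # latent dimensions
            for _ in range(n_right):  # features in the right namespace (a)
                count += 1
    return count

# "|u 1 |a 2" with --lrq ua2: one u feature, k = 2, one a feature
print(count_lrq_features(1, 2, 1))  # 1 x 2 x 1 = 2 additional features
```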
On 06/05/2014 02:37 PM, Vaclav Petricek wrote:
The mini example below demonstrates two issues:
- audit outputs just k (=2) latent features although there are in
fact 2 x k (=4) latent features (left and right). Also the features
are all named the same.
- The standard progress output then reports just 3 features (it probably
does not include the latent features at all, while -q and matrix
factorizations include the expanded feature set).

[vpetricek@hadoop0000 benchmark]$ echo "|u 1 |a 2" | vw --lrq ua2 --audit
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
creating low rank quadratic features for pairs: ua2
using no cache
Reading datafile =
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000
u^1:60292:1:0@0 a^2:108240:1:0@0 lrq^u^a:108241:0:0@0 lrq^u^a:108242:0:0@0 Constant:202096:1:0@0
0.000000 0.000000 1 1.0 unknown 0.0000 3
finished run
number of examples per pass = 1
passes used = 1
weighted example sum = 1
weighted label sum = 0
average loss = 0
best constant = nan
total feature number = 3
—
Reply to this email directly or view it on GitHub
#319.
from vowpal_wabbit.
I confessed to being confused by my own code :)
When testing or when there is no label, there are only k features sent into the core. When training there are 2*k features sent into the core, basically cheesing the gradient via d(xy)=x dy + y dx. So unfortunately for audit, different outputs will be obtained in these two cases. Perhaps audit should be modified to only output things on the first iteration, which would make the output the same under these various conditions.
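The product-rule trick Paul mentions can be checked numerically: for a single LRQ-style term f = (l · x)(r · y), the gradient with respect to l is (r · y) x and with respect to r is (l · x) y, so training touches both factor sets per example. A minimal sketch with made-up numbers (not VW internals):

```python
# Minimal LRQ-style term f = (l . x) * (r . y); check the product rule numerically.
# Hypothetical sketch of the gradient structure, not the VW implementation.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = [1.0, 2.0], [0.5, -1.0]   # example features (made up)
l, r = [0.3, -0.2], [0.7, 0.1]   # left/right latent weights (made up)

def f(l, r):
    return dot(l, x) * dot(r, y)

# Product rule: d(xy) = x dy + y dx, i.e. grad wrt l is (r . y) * x
grad_l = [dot(r, y) * xi for xi in x]

# Finite-difference check of the first component
eps = 1e-6
lp = [l[0] + eps, l[1]]
lm = [l[0] - eps, l[1]]
numeric = (f(lp, r) - f(lm, r)) / (2 * eps)
assert abs(numeric - grad_l[0]) < 1e-6
```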
I made other screw-ups with the audit piece that Alexey Rodriguez fixed in pull request 324 (#324), which was just merged; perhaps you should check that out, and maybe you'll like the results better. He tested the audit output for sanity, but he might have used -t or unlabeled data when validating it, in which case this problem would not have been detected.
Regarding (1), as Paul says, at test time only two features are indeed shown by audit; however, if you add a label you will see all of them:
$ echo "1 |u 1 |a 2" | ./vw --lrq ua2 --audit
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
creating low rank quadratic features for pairs: ua2
using no cache
Reading datafile =
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000
u^1:60292:1:0@0 a^2:108240:1:0@0 lrq^a^2^1:108241:0.475038:0@0 lrq^a^2^2:108242:0.0818419:0@0 Constant:202096:1:0@0
0.673033
lrq^u^1^2:60294:1.64519:0.0818419@0 lrq^u^1^1:60293:0.283379:0.475038@0 u^1:60292:1:0.13459@4 a^2:108240:1:0.13459@4 Constant:202096:1:0.13459@4
1.000000 1.000000 1 1.0 1.0000 0.0000 3
finished run
number of examples per pass = 1
passes used = 1
weighted example sum = 1
weighted label sum = 1
average loss = 1
best constant = 1
total feature number = 3
Note the additional numerical suffix used to index the latent factors (already merged in master).
The difference between the testing and training audits is indeed confusing. The training audit is also hard to understand at first (there are two audit lines per example), but it can be interpreted once you know how LRQ works.
So, in short, the audit output makes sense once you understand how LRQ works, but it might still confuse users. Do you have an opinion on what the desired audit output for lrq should be?
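As a rough mental model (an illustrative sketch with made-up weights, not the actual VW code), --lrq ua2 adds a term of the form sum over n of (w_left[n] * x_u) * (w_right[n] * x_a) to the prediction, which is why the training audit shows both a left (lrq^u^...) and a right (lrq^a^...) set of latent features:

```python
# Illustrative sketch of the LRQ contribution for --lrq ua2 with k = 2
# (weights are made up; this is not the actual VW implementation).
def lrq_contribution(x_u, x_a, w_left, w_right):
    """Sum over latent dimensions of (left factor * x_u) * (right factor * x_a)."""
    return sum(wl * x_u * wr * x_a for wl, wr in zip(w_left, w_right))

x_u, x_a = 1.0, 2.0   # feature values from "|u 1 |a 2"
w_left = [0.1, -0.3]  # hypothetical left (u-side) latent weights
w_right = [0.2, 0.5]  # hypothetical right (a-side) latent weights
print(round(lrq_contribution(x_u, x_a, w_left, w_right), 6))  # -0.26
```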
Regarding (2), lrq.cc indeed does not update the feature count, so that should be fixed (I'll try to look into it). I suppose it should add 2 x k latent features?
I noticed that I introduced a bug: if --lrq, --audit and cache files are enabled at the same time, the audit code will crash. I'll try to fix it this week.
Here's the thing ... there really are only k features. When there is a label present and training is enabled, each feature is sent into the core twice in order to approximate the gradient, but that's an implementation detail.
Ergo, audit-related output should only show one copy of the features. The testing and no-label behavior is correct, but the training behavior is not. The fix would be something like changing line 136 of lrq.cc from
if (all.audit || all.hash_inv)
to
if (iter == 0 && (all.audit || all.hash_inv))
Tweaked code as suggested.