segfault on "-q" about vowpal_wabbit HOT 9 CLOSED

sam-s commented on May 24, 2024
segfault on "-q"

Comments (9)

sam-s commented on May 24, 2024

When built with make CFLAGS='-g -O0', the backtrace is:

Program received signal SIGSEGV, Segmentation fault.
0x000000000042c748 in GD::audit_quad (all=..., left_feature=<value optimized out>, left_audit=0x0, right_features=..., audit_right=..., results=std::vector of length 4, capacity 4 = {...}, ns_pre=
    "", offset=<value optimized out>) at gd.cc:235
235       audit_features(all, right_features, audit_right, results, prepend, ns_pre, halfhash + offset, left_audit->x);
Missing separate debuginfos, use: debuginfo-install boost-program-options-1.41.0-18.el6.x86_64 glibc-2.12-1.132.el6.x86_64 libgcc-4.4.7-4.el6.x86_64 libstdc++-4.4.7-4.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) where
#0  0x000000000042c748 in GD::audit_quad (all=..., left_feature=<value optimized out>, left_audit=0x0, right_features=..., audit_right=..., results=std::vector of length 4, capacity 4 = {...},
    ns_pre="", offset=<value optimized out>) at gd.cc:235
#1  0x000000000042e060 in GD::print_features (all=..., ec=...) at gd.cc:295
#2  0x000000000043f6cc in learn<true, true, true, 2, 0> (d=0x704330, base=..., ec=...) at gd.cc:620
#3  LEARNER::tlearn<GD::gd, &(GD::learn)> (d=0x704330, base=..., ec=...) at ./learner.h:67
#4  0x0000000000447c82 in learn (all=0x6ca4c0) at ./learner.h:109
#5  LEARNER::generic_driver (all=0x6ca4c0) at learner.cc:20
#6  0x0000000000409c68 in main (argc=<value optimized out>, argv=<value optimized out>) at main.cc:46
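
For reference, a backtrace like this can be captured roughly as follows. This is a sketch only: the vowpalwabbit/vw binary path, model.ih, and data.txt are placeholders for the actual paths and file names used in the report.

# build without optimization and with debug symbols (as above)
make clean && make CFLAGS='-g -O0'
# run vw under gdb with the arguments that trigger the crash
gdb --args vowpalwabbit/vw --passes 100 --cache_file cache -q dk --invert_hash model.ih --loss_function hinge data.txt
(gdb) run
(gdb) where     # prints the backtrace once SIGSEGV is raised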

arielf commented on May 24, 2024

Hi Sam,

Thanks for the report. Unfortunately, I can't reproduce it with the latest source from GitHub.

Can you provide a full reproducible example (the complete command line and, ideally, a small sample of the data set)? To help trim the data size, you can run with --progress 1 to find the specific example on which the crash occurs.

Thanks again
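
A sketch of the narrowing-down step suggested above; data.txt stands in for the actual data file, and the remaining flags are taken from the report:

# print a progress line for every example processed
vw --progress 1 --passes 100 --cache_file cache -q dk data.txt

With --progress 1, vw prints a status line per example, so the last example counter printed before the SIGSEGV indicates roughly which input lines need to be kept for a small repro.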

sam-s commented on May 24, 2024

vw --passes 100 --invert_hash 6289-hinge-100-qdk.tx^C--loss_function hinge --cache_file cache -q dk

I have a 5 MB data file; GitHub rejected the e-mail with it attached as too large.

arielf commented on May 24, 2024

OK, thanks. I managed to reproduce this with a small data set. There's no need for 100 passes (2 suffice), and no need for --loss_function hinge either. Reproducing it requires a combination of three conditions:

  • using a cache
  • using -q
  • using --invert_hash

It SEGVs after the 1st pass completes, when it tries to write the cache.

My command line is:

vw -k -c --passes 2 --invert_hash zz.ih -q dk Regressions/q-segfault.dat

Regressions/q-segfault.dat is just 2 lines:

1 |domain x.com |keyword a b c
-1 |domain y.com |keyword d e f

JohnLangford commented on May 24, 2024

Is there a reason to allow cache and invert_hash? It seems like a bad idea. I tweaked the code to not allow this.

-John

sam-s commented on May 24, 2024

    Is there a reason to allow cache and invert_hash?

I need cache to have more than 1 pass and invert_hash to produce a human-readable model.

JohnLangford commented on May 24, 2024

invert_hash has a severe performance impact. You should only use it sparingly. Why not instead save a learned regressor and then use invert_hash in a single pass over the data to get a readable model?

-John

sam-s commented on May 24, 2024

The single pass you're suggesting will change the regressor, so the resulting readable model will not be identical to the learned one.
Also, this one extra pass may be quite expensive: what if I trained the model on Hadoop?

JohnLangford commented on May 24, 2024

Use '-t' to turn off training.

This one extra pass will be radically less expensive than keeping invert_hash on for all the passes.

-John
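
Putting the two suggestions together, a sketch of that workflow; model.vw, readable.txt, and data.txt are placeholder names, and -f / -i / -t are the standard vw flags for saving a regressor, loading it, and disabling learning:

# multi-pass training with a cache, saving the regressor; no --invert_hash here
vw -k -c --passes 100 -q dk -f model.vw data.txt
# one extra test-only pass over the data to emit the human-readable model
# (the -q interaction setting is typically restored from the saved model)
vw -t -i model.vw --invert_hash readable.txt data.txt

Since -t keeps the weights fixed, the readable model matches the regressor learned in the first step, and the cost of --invert_hash is only paid during this single pass.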
