
Assertion failed error (fastFM issue, open)

ibayer commented on May 24, 2024
Assertion failed error

from fastfm.

Comments (15)

sfilatye commented on May 24, 2024

Actually, this happens during fit: fm.fit(X_train, y_train)


sfilatye commented on May 24, 2024

My X_train is a 91457x415 sparse matrix of type numpy.float64 with 8773896 stored elements in Compressed Sparse Row format.
y_train is array([1, 1, 1, ..., 1, 1, 1])

y_train.shape is (91457,)


ibayer commented on May 24, 2024

Make sure that none of the columns contains only zero values.
edit: fix typo
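
A minimal sketch of how such a check could look for a scipy sparse matrix. The helper name all_zero_columns is hypothetical and not part of fastFM; it only reports columns that have no nonzero entries.

```python
import numpy as np
from scipy import sparse


def all_zero_columns(X):
    """Return the indices of columns that contain no nonzero values.

    Hypothetical helper for illustration; not part of fastFM.
    """
    # Work on a CSC copy so the caller's matrix is left untouched.
    X = sparse.csc_matrix(X, copy=True)
    # Drop explicitly stored zeros so they are not counted as entries.
    X.eliminate_zeros()
    # In CSC format, consecutive indptr differences are per-column counts.
    nnz_per_column = np.diff(X.indptr)
    return np.flatnonzero(nnz_per_column == 0)
```

Calling all_zero_columns(X_train) before fit returns an empty array when every column has at least one nonzero value.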


sfilatye commented on May 24, 2024

What is "non"?


sfilatye commented on May 24, 2024

I don't see any column consisting just of zeros. If you'd like to replicate the problem, the data set is from
https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/data

The code that gives me the trouble is this:

import numpy as np
import pandas as pd
from fastFM import als
from scipy import sparse

train = pd.read_csv("./input/train.csv")
target = train['target'].values
# Drop the ID, the target, and a hand-picked list of feature columns.
train = train.drop(['ID','target','v8','v22','v23','v25','v31','v36','v37','v46','v51','v53','v54','v63','v73','v75','v79','v81','v82','v89','v92','v95','v105','v107','v108','v109','v110','v116','v117','v118','v119','v123','v124','v128'], axis=1)

# Drop the remaining categorical (object-typed) columns.
categoricalVariables = []
for var in train.columns:
    typ = str(train[var].dtype)
    if typ == 'object':
        categoricalVariables.append(var)
train = train.drop(categoricalVariables, axis=1)

# Replace missing values with -1.
train = train.fillna(-1)

start = 0
fin = 100000
# Keep only the columns at positions 4 to 6 and convert to a sparse matrix.
train = sparse.csc_matrix(np.array(train.iloc[start:fin, 4:7]))

# Map target values below 0.5 to -1 so the labels are -1 and 1.
target[target < 0.5] = -1
target = target[start:fin]

fm = als.FMClassification()
fm.fit(train, target)


sfilatye commented on May 24, 2024

With 4:7 in train=sparse.csc_matrix(np.array(train.iloc[start:fin,4:7])), the error occurs. If I change it to 5:7, the error disappears, which would point to column 4. However, 4:6 works fine.


ibayer commented on May 24, 2024

Do you always get the same error? The assert tells us that a non-finite parameter value is calculated for the global offset (zero-order) parameter.
https://github.com/ibayer/fastFM-core/blob/2ab4edbd403c4a5a7781cf861e1d8c3b2a87b3c5/src/ffm_als_mcmc.c#L172
This could be due to an all-zeros column or something else that's wrong with the input data.
Does it work with the mcmc sampler?
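
One way to rule out some "something else" cases is to scan the inputs for non-finite values and unexpected labels before calling fit. The sketch below is only an illustration: check_finite_inputs is a hypothetical helper, not a fastFM function, and it assumes the -1/1 label encoding used in the reproduction script in this thread.

```python
import numpy as np
from scipy import sparse


def check_finite_inputs(X, y):
    """Return a list of problems found in X and y (empty if none).

    Hypothetical helper for illustration; not part of fastFM.
    """
    problems = []
    X = sparse.csc_matrix(X)
    y = np.asarray(y, dtype=float)
    # Only stored entries can be NaN or infinite in a sparse matrix.
    if not np.isfinite(X.data).all():
        problems.append("X contains NaN or infinite values")
    if not np.isfinite(y).all():
        problems.append("y contains NaN or infinite values")
    # Assumption: labels are encoded as -1 and 1, as in the script above.
    if not set(np.unique(y)) <= {-1.0, 1.0}:
        problems.append("y has labels other than -1 and 1")
    if X.shape[0] != y.shape[0]:
        problems.append("X and y have different numbers of rows")
    return problems
```

An empty list does not prove the data is fine for the solver; it only excludes these particular input problems.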


sfilatye commented on May 24, 2024

It does work with mcmc but doesn't work with als. The error is always the same.


ibayer commented on May 24, 2024

Is the dataset very unbalanced (many more -1 than 1)? als.FMClassification uses probit regression, which is more sensitive than logistic regression to high class imbalance. Can you construct a minimal example (very few rows and columns)? You can export it with sklearn.datasets.dump_svmlight_file and post it here.
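
A rough sketch of both steps, counting the labels and exporting a small slice in svmlight format. The helper names class_counts and export_subset are hypothetical; dump_svmlight_file is the scikit-learn function mentioned above.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import dump_svmlight_file


def class_counts(y):
    """Return a dict mapping each label to how often it occurs."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


def export_subset(X, y, n_rows, n_cols, f):
    """Write the first n_rows rows and n_cols columns to f in svmlight format.

    f may be a file path or a binary file-like object.
    """
    X_small = sparse.csr_matrix(X)[:n_rows, :n_cols]
    y_small = np.asarray(y)[:n_rows]
    dump_svmlight_file(X_small, y_small, f)
    return X_small, y_small
```

For example, export_subset(train, target, 50, 20, "Data.txt") would write a 50x20 slice; which rows and columns still reproduce the assertion has to be found by trial.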


sfilatye commented on May 24, 2024

Also, just a comment. When I look at the reference page in the documentation, there is no description of the predict function anywhere (except for fit_predict in the mcmc case). Should it be added for completeness?


ibayer commented on May 24, 2024

No, mcmc should not be used with predict. See #40.


sfilatye commented on May 24, 2024

What about als and sgd? The only function that is described in the reference is fit. There is no version of predict in the reference at all.


ibayer commented on May 24, 2024

You are right, that's indeed a bug in the Sphinx files. Can you open a separate issue for that? The predict function is implemented in https://github.com/ibayer/fastFM/blob/master/fastFM/base.py. The derived classes inherit predict, but this doesn't show up in the reference.
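
To illustrate the inheritance point, here is a small sketch with hypothetical stand-in classes (BaseModel and DerivedModel are not fastFM classes). It shows that a method defined only on a base class is still callable on the derived class, even though it is not listed among the derived class's own attributes. If the reference is generated with Sphinx autodoc, the :inherited-members: option is one setting that may make such methods appear.

```python
class BaseModel:
    """Hypothetical stand-in for a base class that defines predict."""

    def predict(self, X):
        # Placeholder prediction: one zero per input row.
        return [0 for _ in X]


class DerivedModel(BaseModel):
    """Hypothetical stand-in for a solver class that only defines fit."""

    def fit(self, X, y):
        return self


def defined_on(cls, name):
    """Return the name of the first class in cls's MRO that defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass.__name__
    return None
```

Here defined_on(DerivedModel, "predict") reports the base class, which is why a tool that documents only a class's own members would omit it.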


sfilatye commented on May 24, 2024

Data.txt

Here is a file that reproduces the issue. The data set dimensions are (50, 20). With only the last 30 rows the error does not occur, but with all 50 rows it does. The target variable is unbalanced, with about 70% ones.


zizai110 commented on May 24, 2024

I also ran into this problem, and there is no all-zero column in my data.

