david-cortes commented on May 24, 2024

That error is not coming from this package, but from pandas. Please post the full error log.

JackMack21 commented on May 24, 2024

Thank you for your quick response. This is the full error log:

```
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py:793: UserWarning: 'words_df' has words that were not present in the training data. These will be ignored.
warnings.warn(msg)
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py:805: UserWarning: 'words_df' contains items that were already present in the training set. These will be ignored.
warnings.warn(msg)

ValueError Traceback (most recent call last)
in ()
----> 1 recommender.add_items(word_counts_test)

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in add_items(self, words_df, maxiter, stop_thr, ncores, random_seed)
1710 words_df=words_df, maxiter=maxiter, ncores=ncores,
1711 random_seed=random_seed, stop_thr=stop_thr,
-> 1712 return_ix=True, return_temp=False
1713 )
1714

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _predict_item_factors(self, words_df, maxiter, ncores, random_seed, stop_thr, return_ix, return_temp)
1505 ncores, maxiter, stop_thr, random_seed = self._process_pars_factors(ncores, maxiter, stop_thr, random_seed, err_subj="item")
1506
-> 1507 words_df, new_item_mapping = self._process_extra_df(words_df, ttl='words_df')
1508 words_df['ItemId'] -= self.nitems
1509 new_max_id = words_df.ItemId.max() + 1

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _process_extra_df(self, df, ttl, df2)
842 else:
843 new_mapping = np.r_[curr_mapping1, new_ids1]
--> 844 df[col1] = pd.Categorical(df[col1].values, new_mapping).codes
845
846 else:

/home/research/jackmck/.local/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
306
307 dtype = CategoricalDtype._from_values_or_dtype(
--> 308 values, categories, ordered, dtype
309 )
310 # At this point, dtype is always a CategoricalDtype, but

/home/research/jackmck/.local/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in _from_values_or_dtype(cls, values, categories, ordered, dtype)
274 # Note: This could potentially have categories=None and
275 # ordered=None.
--> 276 dtype = CategoricalDtype(categories, ordered)
277
278 return dtype

/home/research/jackmck/.local/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in __init__(self, categories, ordered)
161
162 def __init__(self, categories=None, ordered: Ordered = False):
--> 163 self._finalize(categories, ordered, fastpath=False)
164
165 @classmethod

/home/research/jackmck/.local/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in _finalize(self, categories, ordered, fastpath)
315
316 if categories is not None:
--> 317 categories = self.validate_categories(categories, fastpath=fastpath)
318
319 self._categories = categories

/home/research/jackmck/.local/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in validate_categories(categories, fastpath)
491
492 if not categories.is_unique:
--> 493 raise ValueError("Categorical categories must be unique")
494
495 if isinstance(categories, ABCCategoricalIndex):

ValueError: Categorical categories must be unique
```
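The traceback shows `_process_extra_df` building the category list with `np.r_[curr_mapping1, new_ids1]` and handing it to `pd.Categorical`, which rejects repeated categories. A minimal sketch of how that pattern fails when one of the "new" IDs already exists in the current mapping, and one way to avoid it by keeping only IDs that are genuinely new. The arrays below are made-up stand-ins, not ctpfrec internals, and this is not necessarily how the package itself was fixed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the arrays in the traceback: item IDs seen
# during training, and IDs found in the new words_df.
curr_mapping = np.array([101, 102, 103])
new_ids = np.array([103, 104])  # 103 is already in the training mapping

# Concatenating without de-duplicating repeats 103 in the categories,
# which pandas rejects with the same ValueError as in the log.
try:
    pd.Categorical([104, 103], np.r_[curr_mapping, new_ids])
except ValueError as err:
    print(err)  # Categorical categories must be unique

# One way to avoid it: append only the IDs not already in the mapping.
truly_new = np.setdiff1d(new_ids, curr_mapping)
mapping = np.r_[curr_mapping, truly_new]
codes = pd.Categorical([104, 103], mapping).codes
print(codes)  # [3 2]
```

`np.setdiff1d` returns the sorted unique values of `new_ids` that are absent from `curr_mapping`, so the concatenated category list stays unique and existing codes keep their positions.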

david-cortes commented on May 24, 2024

I think I've fixed it now. Please try the latest version from git and check whether it still throws an error. You can install it from git with, e.g., `pip install git+https://www.github.com/david-cortes/ctpfrec.git`

JackMack21 commented on May 24, 2024

Hi David,

Unfortunately, I now get a new error message: `OverflowError: Python int too large to convert to C unsigned long`

The full log is:

```
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py:793: UserWarning: 'words_df' has words that were not present in the training data. These will be ignored.
warnings.warn(msg)
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py:805: UserWarning: 'words_df' contains items that were already present in the training set. These will be ignored.
warnings.warn(msg)

OverflowError Traceback (most recent call last)
in ()
----> 1 recommender.add_items(word_counts_test)

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in add_items(self, words_df, maxiter, stop_thr, ncores, random_seed)
1710 words_df=words_df, maxiter=maxiter, ncores=ncores,
1711 random_seed=random_seed, stop_thr=stop_thr,
-> 1712 return_ix=True, return_temp=False
1713 )
1714

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _predict_item_factors(self, words_df, maxiter, ncores, random_seed, stop_thr, return_ix, return_temp)
1517 cython_loops.cast_real_t(self.a), cython_loops.cast_real_t(self.b),
1518 cython_loops.cast_real_t(self.c), cython_loops.cast_real_t(self.d),
-> 1519 self.Theta_rte, self.Beta_shp, self.Beta_rte
1520 )
1521

ctpfrec/cy.pxi in ctpfrec.cy_float.calc_item_factors()

OverflowError: Python int too large to convert to C unsigned long
```

JackMack21 commented on May 24, 2024

Another point: the warning `UserWarning: 'words_df' contains items that were already present in the training set.` is incorrect in my example.
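One quick way to back up this kind of claim in a bug report is to intersect the item IDs directly. A minimal sketch with a hypothetical helper `items_already_in_training` and toy data frames; it assumes new items are identified by the `ItemId` column (as in the traceback) and that you can collect the training item IDs from whatever data was passed when fitting.

```python
import pandas as pd


def items_already_in_training(train_item_ids, new_words_df: pd.DataFrame) -> list:
    """Hypothetical helper: item IDs in new_words_df that also occur in training."""
    overlap = set(train_item_ids) & set(new_words_df["ItemId"])
    # Convert numpy integers to plain ints for a readable, comparable result
    return sorted(int(i) for i in overlap)


# Toy data: items 1-3 were used in training, items 4-5 are new.
train_words = pd.DataFrame({"ItemId": [1, 2, 3], "WordId": [10, 11, 12], "Count": [1, 2, 1]})
test_words = pd.DataFrame({"ItemId": [4, 4, 5], "WordId": [10, 12, 11], "Count": [3, 1, 2]})

print(items_already_in_training(train_words["ItemId"], test_words))  # []
```

An empty list means no item in the new data frame overlaps with the training items, so an "already present" warning would not be expected for that input.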

I have been studying the Collaborative Topic Poisson Factorization model for a while now, and it is fantastic that you have implemented it in Python. However, I have run into a few problems over the last week while trying to benchmark it against other recommender systems. Would you have some spare time to chat about this at some point?

david-cortes commented on May 24, 2024

Thanks again for reporting the problem. I think I've fixed both issues now. If the problem still persists, it would be nice if you could provide an example with random data that triggers the issue. Please put three backticks (```) before and after the code or error logs for formatting.
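A random-data reproduction of the kind requested here could be built along these lines. This is a minimal sketch using numpy and pandas only; it assumes long-format data frames with columns `UserId`, `ItemId`, `WordId` and `Count` (the traceback confirms `ItemId` in `words_df`; the other names are assumptions), and the function `make_random_data` is hypothetical. The ctpfrec calls are left as comments because their exact signatures are assumed, not taken from this thread (apart from `add_items`).

```python
import numpy as np
import pandas as pd


def make_random_data(n_users=50, n_items=40, n_new_items=10, n_words=30,
                     n_obs=500, n_new_obs=100, seed=123):
    """Hypothetical helper: random training data plus word counts for new items."""
    rng = np.random.default_rng(seed)

    # User-item interactions for training items 0 .. n_items-1
    counts_df = pd.DataFrame({
        "UserId": rng.integers(0, n_users, n_obs),
        "ItemId": rng.integers(0, n_items, n_obs),
        "Count": rng.integers(1, 10, n_obs),
    }).drop_duplicates(["UserId", "ItemId"])

    # Word counts for the training items
    words_train = pd.DataFrame({
        "ItemId": rng.integers(0, n_items, n_obs),
        "WordId": rng.integers(0, n_words, n_obs),
        "Count": rng.integers(1, 5, n_obs),
    }).drop_duplicates(["ItemId", "WordId"])

    # Word counts for brand-new items (IDs >= n_items, so never seen in
    # training), using only words from the training vocabulary
    vocab = words_train["WordId"].unique()
    words_new = pd.DataFrame({
        "ItemId": rng.integers(n_items, n_items + n_new_items, n_new_obs),
        "WordId": rng.choice(vocab, n_new_obs),
        "Count": rng.integers(1, 5, n_new_obs),
    }).drop_duplicates(["ItemId", "WordId"])

    return counts_df, words_train, words_new


counts_df, words_train, words_new = make_random_data()
# Assumed ctpfrec usage (check against the package docs before relying on it):
# from ctpfrec import CTPF
# recommender = CTPF()
# recommender.fit(counts_df=counts_df, words_df=words_train)
# recommender.add_items(words_new)
```

Fixing the seed makes the data identical on every run, so the maintainer can trigger exactly the same error.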
