
stocktwits.recommender's Introduction

rec-sys

Recommendation system using deep learning

stocktwits.recommender's People

Contributors

pugantsov

stocktwits.recommender's Issues

Implicit Model Results: TypeError: Object of type 'ndarray' is not JSON serializable

Traceback (most recent call last):
  File "model_spotlight.py", line 264, in <module>
    sim.run(default=True)
  File "model_spotlight.py", line 257, in run
    results.save(DEFAULT_PARAMS, evaluation)
  File "model_spotlight.py", line 37, in save
    'hyperparameters': self._hash(hyperparameters),
  File "model_spotlight.py", line 32, in <lambda>
    self._hash = lambda x : hashlib.md5(json.dumps(x, sort_keys=True).encode('utf-8')).hexdigest()
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'ndarray' is not JSON serializable
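A likely fix (a sketch, assuming the hyperparameter dict contains NumPy arrays; `hash_params` is a hypothetical stand-in for the `_hash` lambda): give `json.dumps` a `default` hook that converts ndarray values to plain Python before hashing.

```python
import hashlib
import json

import numpy as np

def _to_serializable(obj):
    # Convert NumPy containers/scalars to plain Python before json.dumps.
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def hash_params(params):
    # Stable hash of a hyperparameter dict, tolerating ndarray values.
    payload = json.dumps(params, sort_keys=True, default=_to_serializable)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Because the ndarray is serialised as a list, the hash stays identical whether the caller passed an array or an equivalent list.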

S_POOL: ValueError: zero-size array to reduction operation maximum which has no identity

Traceback (most recent call last):
  File "model_spotlight.py", line 536, in <module>
    sim.run(defaults=True)
  File "model_spotlight.py", line 528, in run
    evaluation = self.evaluation(model, (train, test))
  File "model_spotlight.py", line 439, in evaluation
    train_prec, train_rec = sequence_precision_recall_score(model, train)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/evaluation.py", line 128, in sequence_precision_recall_score
    predictions = -model.predict(sequences[i])
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/sequence/implicit.py", line 318, in predict
    self._check_input(sequences)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/sequence/implicit.py", line 187, in _check_input
    item_id_max = item_ids.max()
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/numpy/core/_methods.py", line 28, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation maximum which has no identity
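`item_ids.max()` raises exactly this error on an empty array, so some test sequence presumably ends up with zero items. One defensive fix (a sketch; `drop_empty_sequences` is a hypothetical helper, not part of Spotlight):

```python
import numpy as np

def drop_empty_sequences(sequences):
    # Spotlight's _check_input calls item_ids.max() per sequence, which
    # raises on a zero-size array. Filter empties before evaluation.
    return [s for s in sequences if np.asarray(s).size > 0]
```

It is still worth checking upstream why the train/test split produces empty sequences at all.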

iterate_location_data not updating parsed locations to DataFrame

2019-02-28 00:28:16,844 [MainThread  ] [INFO ]  Removed users with less than 160 tweets. Size of DataFrame: 304206 -> 168873
2019-02-28 00:28:17,060 [MainThread  ] [INFO ]  Beginning NER parsing...
2019-02-28 00:39:00,502 [MainThread  ] [INFO ]  Parsing complete, recompiling DataFrame...
2019-02-28 00:39:03,815 [MainThread  ] [INFO ]  Removed users with malformed location information. Size of DataFrame: 167151 -> 0
2019-02-28 00:39:04,862 [MainThread  ] [INFO ]  Written CSV at 2019-02-28 00:39:04 with 0 entries

iterate_location_data is not writing properly to the DataFrame and wipes all entries. Find out how to debug while inside a multiprocessing Pool.
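A plausible cause: Pool workers run in separate processes, so any in-place writes they make to the DataFrame are lost when the worker exits. A sketch of collecting worker return values and assigning them back in the parent process (`parse_location` is a hypothetical stand-in for the NER step, not the repo's actual function):

```python
from multiprocessing import Pool

import pandas as pd

def parse_location(row):
    # Hypothetical worker: returns (index, parsed value) for one row
    # instead of mutating a DataFrame the parent never sees.
    idx, location = row
    return idx, location.strip().title() if isinstance(location, str) else None

def iterate_location_data(df):
    # Collect results from the workers, then write them back here.
    with Pool() as pool:
        results = pool.map(parse_location, df["user_location"].items())
    parsed = pd.Series(dict(results))
    df = df.copy()
    df["user_location"] = parsed
    return df.dropna(subset=["user_location"])
```

Returning values from `pool.map` also makes the step debuggable: the worker can be called directly on a single row outside the Pool.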

SpotlightModel: Training Error (Tensor Size)

Traceback (most recent call last):
  File "model_spotlight.py", line 54, in <module>
    sim.run()
  File "model_spotlight.py", line 50, in run
    implicit_model.predict(user_ids, item_ids=None)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/factorization/implicit.py", line 307, in predict
    self._use_cuda)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/spotlight/factorization/_components.py", line 20, in _predict_process_ids
    user_ids = user_ids.expand(item_ids.size())
RuntimeError: The expanded size of the tensor (11439) must match the existing size (31548) at non-singleton dimension 0
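Spotlight's `predict()` with `item_ids=None` scores a single user against every item, so passing the full array of user ids triggers the expand mismatch above (11439 users vs. 31548 items). A sketch of a per-user workaround (`predict_all_users` is a hypothetical helper):

```python
import numpy as np

def predict_all_users(model, user_ids):
    # Score each user separately against all items; stacking the
    # per-user score vectors gives a (num_users, num_items) matrix.
    return np.vstack([model.predict(int(uid)) for uid in user_ids])
```

For large user sets this loop is slow but correct; batching would need the user and item id tensors to have matching shapes.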

Normalisation for interactions

Find out whether it is worth normalising the interaction counts, and consider some form of bot detection to remove outliers that manipulate the recommendations.
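A minimal sketch of what that could look like (log-scaling plus a z-score outlier flag; `normalise_interactions` and the `bot_z` threshold are assumptions, not project code):

```python
import numpy as np

def normalise_interactions(counts, bot_z=3.0):
    # Log-scale raw interaction counts to damp heavy users, then flag
    # likely bots as outliers beyond bot_z standard deviations.
    counts = np.asarray(counts, dtype=float)
    logged = np.log1p(counts)
    z = (logged - logged.mean()) / logged.std()
    weights = logged / logged.max()
    return weights, z > bot_z
```

Whether a z-score cut is adequate bot detection is exactly the open question in this issue; it only catches volume outliers, not coordinated accounts.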

Update deprecated set_value (pandas) in iterate_location_data

parse.py:160: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  d.set_value(i, 'user_loc_check', True)

Try the df.at[df.index[2], 'ColName'] = 3 form instead of d.set_value(i, 'user_location', '|'.join(location)).
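Applied to the warning above, a sketch of the `.at` replacement (the DataFrame here is a stand-in; column names follow the deprecation warning):

```python
import pandas as pd

df = pd.DataFrame({"user_location": ["", ""], "user_loc_check": [False, False]})

# Replacement for the deprecated d.set_value(i, col, val):
i = df.index[0]
df.at[i, "user_loc_check"] = True
df.at[i, "user_location"] = "|".join(["London", "UK"])
```

`.at` is the scalar label-based setter, so it is a drop-in for `set_value` when both the row label and column name are known.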

Add timestamp to item features

Use the arrow package to parse timestamps and add the result as an item feature. It might be worth doing this in AttributeCleaner so as to separate parsing from the model.
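A stdlib sketch of the kind of feature such parsing could yield (`datetime.fromisoformat` stands in for `arrow.get`; `timestamp_feature` is a hypothetical name):

```python
from datetime import datetime, timezone

def timestamp_feature(raw):
    # Parse an ISO-format timestamp string and derive simple item
    # features: epoch seconds, hour of day, and weekday.
    dt = datetime.fromisoformat(raw).replace(tzinfo=timezone.utc)
    return {"epoch": int(dt.timestamp()), "hour": dt.hour, "weekday": dt.weekday()}
```

With arrow the parsing line would be `arrow.get(raw)`, which also tolerates more input formats than `fromisoformat`.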

Frame SAR interactions for Cashtag data

SAR is intended to be used on interactions with the following schema: &lt;User ID&gt;, &lt;Item ID&gt;, &lt;Time&gt;, [&lt;Event Type&gt;], [&lt;Event Weight&gt;].

Find out what exactly constitutes an event type: is it a datatype or a classification? (If the latter, use sectors.)
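A hypothetical framing of cashtag interactions in that schema (column names and sample rows are assumptions for illustration):

```python
import pandas as pd

# user_id, item_id, timestamp match SAR's required fields; event_type
# and event_weight are the two optional bracketed fields.
interactions = pd.DataFrame({
    "user_id": [101, 101, 202],
    "item_id": ["$AAPL", "$TSLA", "$AAPL"],
    "timestamp": ["2019-02-28 15:05:37"] * 3,
    "event_type": ["mention", "mention", "like"],
    "event_weight": [1.0, 1.0, 0.5],
})
```

If event type turns out to be a classification rather than a datatype, the sector of the cashtag could slot into that column instead.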

Fix bad parsing for user features in HybridBaselineModel

2019-02-28 15:05:37,168 [MainThread  ] [INFO ]  The dataset has 176 users and 92363 items with 18473 interactions in the test and 73890 interactions in the training set.
2019-02-28 15:05:37,169 [MainThread  ] [INFO ]  Begin fitting collaborative filtering model...
2019-02-28 15:05:39,443 [MainThread  ] [INFO ]  Collaborative Filtering training set AUC: 0.95837283
2019-02-28 15:05:40,755 [MainThread  ] [INFO ]  Collaborative Filtering test set AUC: 0.28463602
2019-02-28 15:05:40,755 [MainThread  ] [INFO ]  There are 92 distinct user locations, 9 distinct sectors, 215 distinct industries and 3929 distinct cashtags.
2019-02-28 15:05:40,825 [MainThread  ] [INFO ]  Begin fitting hybrid model...
2019-02-28 15:05:44,184 [MainThread  ] [INFO ]  Hybrid training set AUC: 0.889686
2019-02-28 15:05:45,276 [MainThread  ] [INFO ]  Hybrid test set AUC: 0.8127067
2019-02-28 15:05:48,247 [MainThread  ] [INFO ]  Hybrid training set Precision@10: 0.26931816
2019-02-28 15:05:49,271 [MainThread  ] [INFO ]  Hybrid test set Precision@10: 0.0
2019-02-28 15:05:52,234 [MainThread  ] [INFO ]  Hybrid training set Recall@10: 0.01004312342323803
2019-02-28 15:05:53,267 [MainThread  ] [INFO ]  Hybrid test set Recall@10: 0.0
model_baseline_hybrid.py:371: RuntimeWarning: invalid value encountered in double_scalars
  f1_train, f1_test = 2*(train_recall * train_precision) / (train_recall + train_precision), 2*(test_recall * test_precision) / (test_recall + test_precision)
2019-02-28 15:05:53,268 [MainThread  ] [INFO ]  Hybrid training set F1 Score: 0.01936414014913654
2019-02-28 15:05:53,268 [MainThread  ] [INFO ]  Hybrid test set F1 Score: nan
2019-02-28 15:05:56,225 [MainThread  ] [INFO ]  Hybrid training set MRR: 0.3619097
2019-02-28 15:05:57,254 [MainThread  ] [INFO ]  Hybrid test set MRR: 0.0029159905

Precision/recall of 0.0 seems to break the F1 computation; also check why 0.0 is being produced in the first place. The complete lack of signal on the test set is unusual, even allowing for very low results.
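The nan itself comes from the harmonic-mean formula dividing by zero when both precision and recall are 0.0 (the RuntimeWarning above). A guarded version (sketch):

```python
def f1_score(precision, recall):
    # The naive 2*p*r / (p + r) divides by zero when p + r == 0,
    # producing nan; define F1 as 0.0 in that degenerate case.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

This only silences the symptom; the underlying question of why the test-set precision and recall are exactly zero still needs answering.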
