zenogantner / mymedialite


recommender system library for the CLR (.NET)

Home Page: http://mymedialite.net

Shell 3.01% Makefile 0.62% C# 89.96% Perl 6.41%

collaborative-filtering evaluation item-prediction matrix-factorization rating-prediction recommender-systems


mymedialite's Issues

Example web application

Implement example web application that uses the web service interface.

command-line programs: load from model file without specifying recommender type

The recommender type can also be derived from the information in the model file.

implement Collaborative Topic Models by Wang+Blei

http://www.cs.princeton.edu/~chongw/

create stand-alone binary package of the GUI demo

filters for item prediction

add pre- and post-filter APIs to MyMediaLite

pre-filters (generate candidate lists): categories, already seen, ...

post-filters: diversification, thresholds
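As a rough illustration of the two filter stages, here is a minimal sketch. The class `ItemFilters` and its methods are hypothetical names chosen for this example, not part of the MyMediaLite API; the pre-filter shown is "already seen" and the post-filter is a score threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; these names do not exist in MyMediaLite.
public static class ItemFilters
{
    // Pre-filter: build the candidate list by dropping items the user has already seen.
    public static IList<int> RemoveSeen(IEnumerable<int> candidates, ISet<int> seenItems)
    {
        return candidates.Where(item => !seenItems.Contains(item)).ToList();
    }

    // Post-filter: keep only scored items that reach the threshold, best score first.
    public static IList<KeyValuePair<int, double>> ApplyThreshold(
        IEnumerable<KeyValuePair<int, double>> scoredItems, double threshold)
    {
        return scoredItems
            .Where(pair => pair.Value >= threshold)
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

A recommender would sit between the two calls: the pre-filter output is what gets scored, and the post-filter trims the scored list.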

use uint instead of int to refer to entities and list entries

user and item IDs could be uints (it is assumed anyway that they are >= 0)

Same for index data types in many places in the library.

These changes would make it harder to port MyMediaLite to Java, so we had better be careful.

Not high priority.

integrate GraphLab

GraphLab has a nice library of rating prediction algorithms based on matrix/tensor factorization:
http://graphlab.org/pmf.html

It would be nice to have an interface to GraphLab to be able to use this library and to use other recommenders written "in" GraphLab that make use of its particular features wrt. parallelization.

chronological splits

Support chronological splits, both relative to user history and to absolute times.

--chronological-split=DATETIME

--chronological-split=RATIO
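The two modes could look roughly like the sketch below. `TimedRating` and `ChronologicalSplit` are hypothetical types made up for this example; the ratio mode is interpreted here as "the most recent share of each user's history goes to the test set", which is one possible reading of "relative to user history".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rating record for illustration; not a MyMediaLite type.
public sealed class TimedRating
{
    public int UserId;
    public int ItemId;
    public double Value;
    public DateTime Time;
}

public static class ChronologicalSplit
{
    // Absolute split: ratings strictly before the cut-off are training data, the rest is test data.
    public static Tuple<List<TimedRating>, List<TimedRating>> ByDate(
        IEnumerable<TimedRating> ratings, DateTime cutoff)
    {
        var train = new List<TimedRating>();
        var test = new List<TimedRating>();
        foreach (var rating in ratings)
            (rating.Time < cutoff ? train : test).Add(rating);
        return Tuple.Create(train, test);
    }

    // Relative split: for every user, the latest testRatio share of ratings goes to the test set.
    public static Tuple<List<TimedRating>, List<TimedRating>> ByUserRatio(
        IEnumerable<TimedRating> ratings, double testRatio)
    {
        var train = new List<TimedRating>();
        var test = new List<TimedRating>();
        foreach (var userRatings in ratings.GroupBy(r => r.UserId))
        {
            var sorted = userRatings.OrderBy(r => r.Time).ToList();
            int numTest = (int) Math.Floor(sorted.Count * testRatio);
            int numTrain = sorted.Count - numTest;
            train.AddRange(sorted.Take(numTrain));
            test.AddRange(sorted.Skip(numTrain));
        }
        return Tuple.Create(train, test);
    }
}
```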

fully support KDD Cup 2011 data format in command-line programs

Support track 1+2 in rating prediction program, and track 2 in item prediction program.

support ensembles in the command-line programs

Support the combination of several recommenders by the command-line programs.
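One simple way to combine recommenders is a weighted average of their predictions. The sketch below assumes each recommender can be represented as a `Func<int, int, double>` mapping (user ID, item ID) to a score; `WeightedEnsemble` is a hypothetical name and the issue does not prescribe this particular combination scheme.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class WeightedEnsemble
{
    // Returns the weighted average of the individual predictions for one user-item pair.
    public static double Predict(
        IList<Func<int, int, double>> recommenders, IList<double> weights, int userId, int itemId)
    {
        if (recommenders.Count == 0 || recommenders.Count != weights.Count)
            throw new ArgumentException("Need one weight per recommender and at least one recommender.");

        double weightedSum = 0;
        double weightTotal = 0;
        for (int i = 0; i < recommenders.Count; i++)
        {
            weightedSum += weights[i] * recommenders[i](userId, itemId);
            weightTotal += weights[i];
        }
        return weightedSum / weightTotal;
    }
}
```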

every recommender should have at least one literature reference in the API documentation

... so that people know where to read about the implemented method.

recommender: factorized personalized Markov chains (FPMC)

Paper: http://www.ismll.uni-hildesheim.de/pub/pdfs/RendleFreudenthaler2010-FPMC.pdf

Can be used for next-basket recommendations (=recommendations based on the last purchase)

For this, create ISequentialItemRecommender.

output eval graphs for item prediction

Output graphs (image files or CSV files) for things like precision@N and recall@N for different N.
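For the CSV variant, a minimal sketch for a single user's ranked list might look as follows. `RankingCurves.ToCsv` is a hypothetical helper, and the column layout is an assumption made for this example; precision@N is computed as hits divided by N, recall@N as hits divided by the number of relevant items.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class RankingCurves
{
    // Writes one CSV row (n, precision@n, recall@n) per cut-off for a single ranked list.
    public static string ToCsv(IList<int> rankedItems, ISet<int> relevantItems, IEnumerable<int> cutoffs)
    {
        var csv = new StringBuilder("n,precision,recall\n");
        foreach (int n in cutoffs)
        {
            int hits = rankedItems.Take(n).Count(item => relevantItems.Contains(item));
            double precision = (double) hits / n;
            double recall = relevantItems.Count == 0 ? 0.0 : (double) hits / relevantItems.Count;
            // Invariant culture keeps '.' as decimal separator regardless of the locale.
            csv.Append(string.Format(CultureInfo.InvariantCulture,
                "{0},{1:F4},{2:F4}\n", n, precision, recall));
        }
        return csv.ToString();
    }
}
```

The resulting text can be fed to any plotting tool; averaging over users would happen before this step.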

modularize .DLL files

MyMediaLite.dll : core library without external dependencies
MyMediaLite.SVM.dll : recommenders that need LIBSVM
MyMediaLite.Math.NET.dll : recommenders that need Math.NET

MyMediaLiteExperimental.dll : experimental code
MyMediaLiteExperimental.SVM.dll
MyMediaLiteExperimental.Math.NET.dll

cross-validation for relation-aware recommenders

two modes: split relations, do not split relations

top-n evaluation for rating prediction

support top-n evaluation (and other item prediction measures) in the rating prediction command-line program

implement/port CofiRank by Weimer et al.

http://www.cofirank.org/

save user/item ID mappings together with recommender models

Chris wrote:

I've done some work with Mahout, and one feature I appreciate is that it stores the user and item mappings with the model data when you save it.
It makes it easier to resurrect a recommender and reduces the likelihood I'll get all the IDs mixed up!

hyperparameter search for all recommenders

Hyperparameter search by line/grid search and Nelder-Mead should be supported for all recommenders.
For recommenders that use a learn rate (=step size), there should also be routines for learning good step sizes.

This will push MyMediaLite more towards being usable as a black-box tool.

active learning interface

Create an interface for active learning recommenders, i.e. recommenders that request certain items to be rated by a user in order to improve the predictive model.

support more relation types

Currently we have binary relations over users or items.

In the future, we additionally may want to have

ternary relations (tags)
n-ary relations
weighted relations
multiple relations over the same set (equivalent to labelled relations)
relations with time information
etc.

sample training data

Only read in a certain percentage of the training data:

--sample-ratings=RATIO
--sample-users=RATIO
--sample-items=RATIO
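A minimal sketch of ratio-based sampling, assuming independent per-entry sampling (each entry is kept with probability RATIO, so the resulting size is only approximately RATIO times the input size). `TrainingSampler` is a hypothetical name; the same routine could be applied to ratings, user IDs, or item IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class TrainingSampler
{
    // Keeps each entry independently with probability ratio (0 keeps nothing, 1 keeps everything).
    public static List<T> SampleRatio<T>(IEnumerable<T> entries, double ratio, Random random)
    {
        if (ratio < 0 || ratio > 1)
            throw new ArgumentOutOfRangeException("ratio", "Ratio must be between 0 and 1.");
        return entries.Where(entry => random.NextDouble() < ratio).ToList();
    }
}
```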

stand-alone evaluation executables

for rating prediction and item prediction

The idea is that users of other software packages can use those to create the predictions, and then evaluate the predictions using MyMediaLite's evaluation routines.

Suggested by Lucas Drumond.

kNN recommenders: use UserItemBaseline via composition, not inheritance

http://www.ismll.de/mymedialite/documentation/doxygen/interface_my_media_lite_1_1_i_iterative_model.html

The current solution is not the most elegant.
KNN recommenders are (usually) not iterative models, so we should rather use the UserItemBaseline via composition, not inheritance.

update from Math.NET Iridium to Math.NET Numerics

update the math package
consider using Math.NET Numerics for matrix and vector computations, if it is faster than our home-grown code (likely)

evaluation: new-user/new-item cross-validation

Support CV for cold-start evaluation protocols

load recommender from model file w/o specifying the type

Currently, we instantiate a recommender and then load a model via its LoadModel() method.

It would be nice to have a tiny helper tool that looks into the model file, instantiates the recommender by itself, and then does the above.
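Such a helper could be sketched as below. This rests on a loud assumption: that the first line of the model file holds the recommender's type name, which the issue does not confirm. `ModelLoader`, the factory dictionary, and the `loadModel` callback are all hypothetical; a real tool might use reflection instead of an explicit registry.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class ModelLoader
{
    // Assumption: the first line of the model file names the recommender type.
    // factories maps type names to constructors; loadModel reads the rest of the file.
    public static T Load<T>(
        string path, IDictionary<string, Func<T>> factories, Action<T, TextReader> loadModel)
    {
        using (var reader = new StreamReader(path))
        {
            string typeName = reader.ReadLine();
            if (typeName == null || !factories.TryGetValue(typeName.Trim(), out Func<T> create))
                throw new IOException("Unknown or missing recommender type in " + path);

            T recommender = create();
            loadModel(recommender, reader);
            return recommender;
        }
    }
}
```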

documentation: F#

Create an example that explains how to use MyMediaLite from F#, and how to implement a new recommender in F#.

implement Bayesian Probabilistic Matrix Factorization

http://www.mit.edu/~rsalakhu/papers/bpmf.pdf

support KDD Cup 2011 track 2 evaluation protocol (item prediction)

For each positive item, sample a negative item according to its overall frequency/popularity.
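A minimal sketch of this sampling step, under the assumption that "frequency" means the number of feedback events per item: drawing uniformly from the list of all observed events yields each item with probability proportional to its popularity. `PopularityNegativeSampler` is a hypothetical name, and rejecting the user's own positives is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class PopularityNegativeSampler
{
    // allFeedbackItems holds the item ID of every observed feedback event,
    // so popular items occur more often and are drawn more often.
    public static IList<int> Sample(IList<int> allFeedbackItems, ISet<int> userPositives, Random random)
    {
        if (!allFeedbackItems.Any(item => !userPositives.Contains(item)))
            throw new ArgumentException("No item outside the user's positives is available.");

        var negatives = new List<int>(userPositives.Count);
        for (int i = 0; i < userPositives.Count; i++)
        {
            int candidate;
            do
            {
                candidate = allFeedbackItems[random.Next(allFeedbackItems.Count)];
            }
            while (userPositives.Contains(candidate)); // reject the user's positive items
            negatives.Add(candidate);
        }
        return negatives;
    }
}
```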

documentation: DB howto

Give an example how to get training data from a database.

automatic determination of a suitable learn rate

For recommenders that are trained with gradient-based algorithms we need suitable learn rates. These usually differ from data set to data set. MyMediaLite should contain a routine that automatically finds a suitable learn rate for a given data set.

create Debian package

Context-aware recommendation

add namespace ContextAwareRecommendation with the interfaces IContextAwareItemRecommender (also covers tag recommendation, time-aware recommendations, and search queries) and IContextAwareRatingRecommender

integrate Sequin library for sequence mining

http://sequin.codeplex.com/

directly support MovieLens u.item and u.user files

... for reading in attributes

stand-alone rating prediction executable

A rating prediction program that does not need training data, but just relies on the model file to make predictions.

This will not work for memory-based recommenders. The model file format will also have to change so that it incorporates the user ID and item ID mappings.

group recommendation interface

Create an interface for recommenders that aggregate score predictions for several users.

modularize: move apps in different repositories

The item and rating command-line programs should remain in the core repository, but the attribute-to-factor mapping code and the GUI demo could go into another repository.

get MinRating and MaxRating from data

Currently, the user (of the command-line tool or the library) has to set the minimum and maximum ratings manually (if they are not the default 1 and 5). It would be more convenient to get them from the data and still allow setting them manually if necessary.
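A minimal sketch of that behaviour: the bounds come from the data unless the caller overrides them. `RatingScale.FromData` and its parameter names are hypothetical and only illustrate the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class RatingScale
{
    // Returns (min, max); values are taken from the data unless an override is given.
    public static Tuple<double, double> FromData(
        IEnumerable<double> ratings, double? minOverride = null, double? maxOverride = null)
    {
        var values = ratings.ToList();
        if (values.Count == 0)
            throw new ArgumentException("Cannot infer a rating scale from empty data.");

        double min = minOverride ?? values.Min();
        double max = maxOverride ?? values.Max();
        return Tuple.Create(min, max);
    }
}
```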

support non-binary attributes out of the box

Currently, attributes are supposed to be binary: https://github.com/zenogantner/MyMediaLite/blob/master/src/MyMediaLite/IO/AttributeData.cs

It would be nice if the recommender API supported at least binary and real-valued attributes, and the IO methods supported binary, real-valued, nominal, and text attributes, and would map them accordingly to binary and real-valued attributes.

suppress i18n for command line parameters

Currently, parsing floats/doubles in the Mono.Options command line parameters follows the current locale.
This is not desirable, because we want the command line options to be the same everywhere so that people can copy+paste commands from the documentation etc.
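One way to get locale-independent behaviour is to parse option values explicitly with the invariant culture. The wrapper class `InvariantParsing` below is a hypothetical name; `double.Parse` with `CultureInfo.InvariantCulture` is standard .NET. How the call would be wired into the option callbacks is left open here.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class InvariantParsing
{
    // Parses "0.01" identically under every locale, including those that use ',' as decimal separator.
    public static double ParseDouble(string text)
    {
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```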

make bold-driver heuristics configurable

Currently, the bold-driver learning rate adaptation schemes (for BiasedMatrixFactorization and BPRMF) use fixed values to increment/decrement the step size. This should be configurable (and set to sensible defaults)
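A configurable version could be sketched as follows. The class `BoldDriver`, its property names, and the default factors 1.05 and 0.5 are assumptions made for this example, not values confirmed by the issue. The heuristic itself is the usual one: grow the step size slightly while the loss keeps falling, and cut it sharply when the loss rises.

```csharp
using System;

// Hypothetical helper for illustration; not part of MyMediaLite.
public sealed class BoldDriver
{
    public double IncreaseFactor { get; set; }
    public double DecreaseFactor { get; set; }

    // The default factors are assumptions for this sketch.
    public BoldDriver(double increaseFactor = 1.05, double decreaseFactor = 0.5)
    {
        IncreaseFactor = increaseFactor;
        DecreaseFactor = decreaseFactor;
    }

    // Returns the learn rate for the next iteration, given the loss before and after the last one.
    public double Adapt(double learnRate, double previousLoss, double currentLoss)
    {
        return currentLoss < previousLoss
            ? learnRate * IncreaseFactor
            : learnRate * DecreaseFactor;
    }
}
```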

rating_prediction and rating_based_ranking programs should also support --test-users and --candidate-items arguments

For consistency with the item prediction program, and because it would be a useful feature.
That way, we could also generate rating predictions for arbitrary items.

Parallelize item prediction evaluation

... by parallelizing the candidate score computations.
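A minimal sketch using `Parallel.For` from the Task Parallel Library. The recommender is represented by a `Func<int, int, double>` (user ID, item ID to score), which is an assumption of this example; the delegate must be safe to call from several threads at once. `ParallelScoring` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of MyMediaLite.
public static class ParallelScoring
{
    // Scores all candidate items for one user in parallel.
    // Each iteration writes to its own array slot, so no locking is needed.
    public static double[] ScoreCandidates(
        Func<int, int, double> predict, int userId, IList<int> candidateItems)
    {
        var scores = new double[candidateItems.Count];
        Parallel.For(0, candidateItems.Count, i =>
        {
            scores[i] = predict(userId, candidateItems[i]);
        });
        return scores;
    }
}
```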

command line arguments to select underlying data types

--static: slower loading (2 passes), less memory consumption, faster access, no new data can be added
--non-static: faster loading (1 pass), new data can be added

create NuGet package

implement user fold-in

Chris suggested this for item prediction.

An interface for this could be:

IList<WeightedItem> Predict(IList<int> watched_items, IList<int> candidate_items)
IList<int> Predict(IList<int> watched_items, IList<int> candidate_items, int n)

This would train features for a user specified by the list watched_items, and then predict scores for the list candidate_items.

One additional thing to consider for the interface would be to extend the interface to allow user attributes (not supported by BPR-MF, but possibly by other recommenders):

IList<double> Predict(IList<int> watched_items, var user_features , IList<int> candidate_items)

Also implement a similar thing for rating prediction, like

 IList<WeightedItem> PredictItems(IList<WeightedItem> rated_items);
 IList<int> PredictItems(IList<WeightedItem> rated_items, int n);

common class for command-line programs

The current command-line programs for item and rating prediction share many concepts.
It may be worthwhile to consider implementing those shared concepts in one class and deriving from it.