

News Clustering via GibbsLDA++

NewsClustering is a final project developed by a group of three students (Ilaria Ceppa, Marco Grandi and Marco Ponza) for the Information Retrieval course.

The goal of the project was to develop a clustering tool that uses GibbsLDA++ to group Italian news articles into clusters, and then to run experiments with it and analyze the results.

The final report is available in this repository (in Italian only).

Setting up

The project can be compiled by typing:

make clean
make all

and the help message can be displayed with:

./clusteringLDA --help

Cluster Generation

To run the application on a news dataset, type:

./clusteringLDA [-v] [-a alpha] [-b beta] [-n clusters] [-t terms] [-m size] [-i iter] [-s step] [-o file] [-c clust] [-d string] dataset_file

where:

  • -v prints the parameter values before running the application;
  • -a alpha sets the alpha parameter of GibbsLDA++;
  • -b beta sets the beta parameter of GibbsLDA++;
  • -n clusters sets the number of clusters to generate;
  • -t terms sets the number of terms written to the output file;
  • -m size sets the minimum cluster size (clusters smaller than this are removed);
  • -i iter sets the number of GibbsLDA++ iterations;
  • -s step sets the number of iterations after which a temporary model is generated;
  • -o file sets the output file;
  • -c clust sets the name of the model generated by GibbsLDA++;
  • -d string disables the preprocessing steps whose codes appear in string:
      ◦ . disables the punctuation filter;
      ◦ s disables the stopword filter;
      ◦ w disables shingling;
      ◦ i disables the idf filter;
      ◦ m disables cluster-size thresholding;
      ◦ p disables the document filter.
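The options above can be sketched as a small Python helper for scripting experiments. This is a hypothetical sketch, not part of this repository: `build_command` assembles an argument list for `./clusteringLDA` from keyword options, and `enabled_steps` shows which preprocessing steps a given `-d` string leaves active. The keyword names, the step names (paraphrased from the list above), the dataset file `news.txt` and the option values are all placeholders chosen for illustration.

```python
import shlex

# One-character -d codes and the preprocessing step each one disables
# (names paraphrased from this README; the mapping itself is illustrative).
DISABLE_CODES = {
    ".": "punctuation filter",
    "s": "stopword filter",
    "w": "shingling",
    "i": "idf filter",
    "m": "cluster-size thresholding",
    "p": "document filter",
}

# Hypothetical keyword names mapped to the documented command-line flags.
FLAGS = {
    "alpha": "-a", "beta": "-b", "clusters": "-n", "terms": "-t",
    "min_size": "-m", "iterations": "-i", "step": "-s",
    "output": "-o", "model": "-c",
}


def enabled_steps(disable: str = "") -> list[str]:
    """Return the preprocessing steps that stay enabled for a -d string."""
    unknown = set(disable) - set(DISABLE_CODES)
    if unknown:
        raise ValueError(f"unknown -d code(s): {''.join(sorted(unknown))}")
    return [name for code, name in DISABLE_CODES.items() if code not in disable]


def build_command(dataset: str, disable: str = "", verbose: bool = False,
                  **options) -> list[str]:
    """Hypothetical helper: build an argv list for ./clusteringLDA."""
    unknown = set(options) - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(sorted(unknown))}")
    enabled_steps(disable)  # validate the -d codes before building argv
    argv = ["./clusteringLDA"]
    if verbose:
        argv.append("-v")
    # Emit flags in a fixed order so the command line is deterministic.
    for name, flag in FLAGS.items():
        if options.get(name) is not None:
            argv += [flag, str(options[name])]
    if disable:
        argv += ["-d", disable]
    argv.append(dataset)  # the dataset file is the final positional argument
    return argv


if __name__ == "__main__":
    cmd = build_command("news.txt", clusters=20, iterations=1000, disable="sw")
    print(shlex.join(cmd))
    print(enabled_steps("sw"))
```

Once the project is built, the list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list rather than a single string avoids shell-quoting problems with file names.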
