Coder Social home page Coder Social logo

paulfedorow / fim Goto Github PK

View Code? Open in Web Editor NEW
9.0 2.0 1.0 2.9 MB

fim is a collection of some popular frequent itemset mining algorithms implemented in Go.

License: MIT License

Go 98.41% Makefile 1.59%
association-analysis frequent-itemset-mining frequent-pattern-mining apriori eclat fp-growth apriori-algorithm eclat-algorithm fp-growth-algorithm

fim's Introduction

fim

Build Status Go Report Card License

fim is a collection of some popular frequent itemset mining algorithms implemented in Go.

fim contains the implementations of the following algorithms:

  • apriori: Based on Fast Algorithms for Mining Associations Rules. As an optimization, the approach to determine the support of the candidates with a hash-tree and a transaction bitmap was replaced with an approach based on a trie. As a further optimization, the generation of candidates of length 2 is omitted in favor of counting each occurring item pair in a 2D map.
  • eclat: As described in Scalable Algorithms for Association Mining. The paper describes and analyses several approaches. The eclat algorithm is implemented based on the approach named Eclat. For the determination of frequent itemsets of length 2, a 2D map is used instead of a 2D array as described in the paper.
  • fpgrowth: Based on Mining Frequent Patterns without Candidate Generation. Implemented mostly as described in the paper. The only difference is that there is no special-casing implemented for FP-trees that are paths.

Build

Execute the following commands to build the fim tool:

git clone https://github.com/paulfedorow/fim.git
cd fim
make

Usage

To see which arguments fim supports execute the following command:

build/fim -h

The following example finds all frequent itemsets with a minimal support of 1% in the dataset contained in datasets/retail.dat:

build/fim -a fpgrowth -i datasets/retail.dat -s 0.01 

Dataset format

fims dataset file format is as follows:

File = {Transaction}
Transaction = Item {" " Item} "\n"
Item = {"0" ... "9"}

Each line in the file is a transaction. A line is expected to be a series of integers separated by a single space. Each integer is an item of the corresponding transaction.

Performance

To determine which algorithm is the most efficient, the runtime of each algorithm was measured with different datasets and decreasing minimal support. The datasets that were used are retail.dat and chess.dat from FIMI Dataset Repository. The datasets are respectively sparse and dense. The results are shown below. The algorithm fpgrowth is best in terms of runtime.

Runtime chart for datasets/retail.dat

Runtime chart for datasets/chess.dat

fim's People

Contributors

paulfedorow avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

davidlam-winc

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.