Coder Social home page Coder Social logo

fp-growth's People

Contributors

evandempsey avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fp-growth's Issues

the output of rule arrays can not find in patterns ,and so many of them turn to 1

patterns = fpg.find_frequent_patterns(transactions, 399)
print "patterns:",patterns
rule = fpg.generate_association_rules(patterns,0.23)
print "rule:",rule
patterns: {(11623791,): 718, (1200977055, 1201023876, 1201116375, 1201268126, 1201562424): 805, (1201207701,): 2217, (1201451040,): 445, (1201345366,): 1844, (1201294590,): 3846, (1200745609,): 1229, (1201023876, 1201116375, 1201562424): 830, (1201532784,): 1535, (1201486603,): 2022, (11738534,): 3125, (1201350860,): 476, (1200977055, 1201268126): 962, (1201557016,): 1151, (1201542715,): 1785, (1201023876, 1201116375, 1201268126): 816, (1200419767,): 656, (1201457994,): 1677, (1201268126,): 1320, (1201113796,): 503, (1200891159,): 1929, (1201322512,): 1538, (10104273,): 861, (1201613420,): 4840, (1200354823,): 1541, (1200432941,): 497, (1201305683,): 406, (1200457081,): 501, (10814383,): 2000, (1201499030,): 1498, (1201052503,): 2485, (1201157090,): 1463, (1200163273,): 555, (1201631999,): 3614, (11249445,): 1426, (1200961536,): 2807, (1201383817,): 617, (1201467922,): 3863, (1200427431,): 658, (1200790003,): 3589, (1200982196,): 2010, (1201562424,): 3043, (1201273973,): 1382, (1201503232,): 2539, (1201394998,): 583, (1200159324,): 630, (1201570908,): 1860, (1201403614,): 3257, (1201273966,): 1145, (1201023876, 1201116375): 830, (1200977055, 1201023876, 1201116375, 1201562424): 812, (1201477137,): 2218, (1200333505,): 1286, (1200437714,): 1706, (1200743214,): 3354, (1201513338,): 2445, (1201474472,): 2813, (1201322043,): 762, (1200885766, 1201352808): 478, (10647548,): 568, (1201094551,): 686, (1201121314,): 518, (1201462793,): 2257, (1201578718,): 18168, (1201202122,): 873, (1201140837,): 499, (1201345317,): 2145, (1201462348,): 1218, (1200297191,): 4166, (1201503232, 1201613420): 1306, (1201494873,): 917, (1201494927,): 614, (1200786101,): 515, (1200974125,): 930, (1200697819,): 1062, (1201392758,): 735, (1201164589, 1201273973): 518, (1200957738,): 2042, (1201271746,): 2369, (1200385629,): 2088, (1201138670,): 2535, (1201392771,): 612, (1201054494,): 2307, (1201526185,): 1783, (11204831,): 4291, (1201395562,): 992, (10643879,): 1706, (1200335185,): 449, (1200691591,): 1967, (1201429522,): 760, (1200242349,): 2003, (1201504568,): 955, (1200283758,): 479, (1201023876, 1201116375, 1201268126, 1201562424): 816, (1201149664,): 5958, (1201349905,): 1444, (1200172387,): 1984, (1201208643,): 2195, (1200977055, 1201023876, 1201116375, 1201268126): 805, (1201352808, 1201457994): 451, (1201169598,): 2809, (1201518719,): 482, (1201522186,): 1887, (1200708773,): 1107, (11146699,): 3246, (1201470054,): 792, (1200977055, 1201023876, 1201562424): 935, (1201216728,): 1167, (1200371059,): 1981, (11708313,): 549, (1200352602,): 1639, (1201446427,): 737, (1201123510,): 536, (1200311105,): 666, (1200212508,): 837, (1201468581,): 5954, (1201578729,): 550, (1201213230,): 4866, (1200812613,): 3369, (1201027792,): 552, (1201265492,): 490, (1200301742,): 513, (1200455993,): 2665, (1201270886,): 3641, (11702895,): 1040, (1200735969,): 490, (1201116375, 1201268126, 1201562424): 823, (1201349181,): 519, (1201432931,): 778, (1200785918,): 511, (1200250761,): 402, (1201424420,): 2467, (1201332927,): 2825, (10386250,): 513, (1201343088,): 409, (1201421617,): 421, (1201023876, 1201562424): 1291, (1200777787,): 973, (1200106637,): 1013, (1201268126, 1201562424): 1314, (11006454,): 659, (1201234460,): 475, (1200799042,): 400, (1201237981,): 1253, (1201352808,): 9725, (1201079811,): 720, (1200961550,): 3528, (1201527994,): 2485, (1200977055, 1201562424): 1014, (1201393734,): 481, (1200404231,): 640, (1201088308,): 715, (1201116375, 1201562424): 844, (1201564142,): 815, (1201148486,): 1044, (11144143,): 1910, (11240917,): 1220, (1200277793,): 1722, (1200415164,): 1925, (11253335,): 8420, (1201557730,): 419, (1201116375, 1201268126): 823, (1200228384,): 1684, (1200423973,): 1601, (1201097392,): 448, (1201284505,): 1586, (1201033228,): 497, (1201461902,): 622, (1200223668,): 577, (1200977055, 1201268126, 1201562424): 959, (1201352808, 1201532784): 472, (1200450459,): 2716, (1200368622,): 631, (1201270977,): 480, (11676618,): 1816, (1201079287,): 8783, (11121832,): 571, (1201096003,): 1002, (1201406468,): 683, (1200977055, 1201116375, 1201562424): 815, (1201172110,): 399, (1201558444,): 2911, (1201023876, 1201268126, 1201562424): 1221, (1200220152,): 6894, (11161428,): 541, (1201287663,): 781, (1201466389,): 456, (1201357052,): 2881, (1201306914,): 1849, (1200084847,): 1208, (1201492862,): 1546, (1201487585,): 2581, (1201443312,): 427, (1201478539,): 978, (1201328361,): 1158, (1200301745,): 2233, (1201286776,): 2917, (1200285785,): 706, (1200977055, 1201116375, 1201268126, 1201562424): 808, (10447424,): 6067, (1200977055, 1201023876, 1201268126): 896, (1200982785,): 601, (1201390492,): 1768, (1201164589, 1201273966): 429, (1200910312,): 613, (1200290334,): 814, (1201030623,): 2313, (1200977055, 1201023876, 1201116375): 812, (1200869612,): 1898, (1201187602,): 1308, (1200885766,): 2558, (1201341673,): 1448, (1201504572,): 592, (1200377480,): 3378, (11501859,): 2479, (1201363125,): 2223, (1201577226,): 2591, (1201536832,): 1903, (1200904137,): 1108, (1201093617,): 1401, (1200977055, 1201023876, 1201268126, 1201562424): 895, (11698907,): 1880, (11223667,): 2141, (1201065998,): 527, (1200743220,): 626, (1200991695,): 646, (1200070389,): 654, (1201023876, 1201268126): 1222, (1200689921,): 3034, (1201492855,): 1386, (1201502466,): 2214, (11240936,): 650, (1201577961,): 513, (1200279428,): 456, (1200979992,): 2950, (1201338800,): 3385, (11567723,): 532, (10973190,): 494, (1200743212,): 799, (1201592045,): 5235, (1200977055, 1201116375, 1201268126): 808}
rule: {(1201023876, 1201116375, 1201268126): ((1200977055,), 0.9865196078431373), (1201116375, 1201268126, 1201562424): ((1200977055,), 0.9817739975698664), (1200977055, 1201023876, 1201268126, 1201562424): ((1201116375,), 0.8994413407821229), (1201532784,): ((1201352808,), 0.30749185667752443), (1201023876, 1201562424): ((1200977055, 1201268126), 0.6932610379550735), (1200977055, 1201268126): ((1201116375,), 0.83991683991684), (1201268126, 1201562424): ((1200977055, 1201023876), 0.6811263318112634), (1200977055, 1201023876, 1201268126): ((1201562424,), 0.9988839285714286), (1200977055, 1201116375, 1201268126, 1201562424): ((1201023876,), 0.9962871287128713), (1201268126,): ((1200977055, 1201116375), 0.6121212121212121), (1200977055, 1201023876, 1201116375): ((1201268126,), 0.9913793103448276), (1200977055, 1201562424): ((1201023876, 1201268126), 0.8826429980276134), (1201613420,): ((1201503232,), 0.26983471074380166), (1201562424,): ((1200977055, 1201023876, 1201268126), 0.29411764705882354), (1201116375, 1201562424): ((1200977055, 1201268126), 0.957345971563981), (1201273966,): ((1201164589,), 0.3746724890829694), (1201023876, 1201116375, 1201268126, 1201562424): ((1200977055,), 0.9865196078431373), (1200977055, 1201023876, 1201116375, 1201268126): ((1201562424,), 1.0), (1201273973,): ((1201164589,), 0.3748191027496382), (1201023876, 1201116375, 1201562424): ((1201268126,), 0.983132530120482), (1200977055, 1201023876, 1201562424): ((1201268126,), 0.9572192513368984), (1200977055, 1201268126, 1201562424): ((1201023876,), 0.9332638164754953), (1201116375, 1201268126): ((1200977055,), 0.9817739975698664), (1201503232,): ((1201613420,), 0.5143757384797164), (1201457994,): ((1201352808,), 0.2689326177698271), (1200977055, 1201116375, 1201562424): ((1201268126,), 0.9914110429447853), (1201023876, 1201116375): ((1200977055,), 0.9783132530120482), (1200977055, 1201023876, 1201116375, 1201562424): ((1201268126,), 0.9913793103448276), (1201023876, 1201268126, 1201562424): ((1200977055,), 0.733005733005733), (1200977055, 1201116375, 1201268126): ((1201562424,), 1.0), (1201023876, 1201268126): ((1200977055, 1201562424), 0.7324058919803601)}

Missing patterns

When using find_frequent_patterns(), certain itemsets are missing in the patterns although they have frequence above the threshold support provided, hence also causing certain rules to be missing.
For example, consider the following dataset:

Transaction ID Items_bought
T100 {M, O, N, K, E, Y}
T200 {D, O, N, K, E, Y}
T300 {M, A, K, E}
T400 {M, U, C, K, Y}
T500 {C, O, O, K, I, E}

I used the following code to find frequent patterns:

import pyfpgrowth as fp
items_bought = [['M','O','N','K','E','Y'], ['D','O','N','K','E','Y'], ['M','A','K','E'], 
                ['M','U','C','K','Y'], ['C','O','O','K','I','E']]
min_support_count = 3
min_conf = 0.8
freq_patterns = fp.find_frequent_patterns(items_bought, min_support_count)
print(freq_patterns)

I get the following output:

{('M',): 3, ('K', 'M'): 3, ('Y',): 3, ('K', 'Y'): 3, ('O',): 4, ('K', 'O'): 4, ('E', 'O'): 4, ('E', 'K'): 4, ('E', 'K', 'O'): 4, ('K',): 5}

This output is missing the itemset ('E',) with frequency of 4 (> threshold of 3).
Due to this, two rules are also missing when generating association rules.
Please look into the issue!!

Avoid recursion when inserting

This function can trivially be rewritten to avoid recursion:

def insert_tree(self, items, node, headers):
"""
Recursively grow FP tree.
"""
first = items[0]
child = node.get_child(first)
if child is not None:
child.count += 1
else:
# Add new child.
child = node.add_child(first)
# Link it to header structure.
if headers[first] is None:
headers[first] = child
else:
current = headers[first]
while current.link is not None:
current = current.link
current.link = child
# Call function recursively.
remaining_items = items[1:]
if len(remaining_items) > 0:
self.insert_tree(remaining_items, child, headers)

Rather than recurse, loop over the items:

def insert_tree(self, items, node, headers): 
    """Grow FP tree iteratively"""
    for item in items:
        child = node.get_child(item) 
        if child is not None: 
            child.count += 1 
        else: 
            # Add new child. 
            child = node.add_child(item) 

            # Link it to header structure. 
            if headers[item] is None: 
                headers[item] = child 
            else: 
                current = headers[item] 
                while current.link is not None: 
                    current = current.link 
                current.link = child 

        # continue inserting from the child node
        node = child

All that the recursive call does is to use child as the node value for the next item in the items sequence. Just set node to child at the end of a loop instead.

Minimum support scale

in doc, you write patterns = pyfpgrowth.find_frequent_patterns(transactions, 2) , 2 is minium support right ? in my knowledge, FP-growth has minimum support and minimum confidence in % , my question is, what scale of minimum support ? thx

The count of root is missing

Frequency for only one element is not counted in function zip_patterns. It can be fixed by adding frequency of root in the return value of that function. Btw really nice code, well organized and very clean.

Items missing in the output

`import pyfpgrowth

item_list = [[11, 13],
[4, 12, 13],
[4, 13, 14, 17],
[7, 11, 13, 14, 17],
[2, 4, 13, 14],
[13, 14],
[2, 7, 11, 13, 14],
[4, 13, 14, 15],
[6, 11, 13],
[11, 13],
[11, 13, 14, 17],
[7, 11, 13, 14],
[2, 11, 12, 13, 14],
[7, 11, 13],
[11, 13],
[7, 8, 13, 14, 17],
[0, 7, 11, 13],
[2, 11, 13, 14, 15],
[7, 11, 12, 13, 14],
[11, 12, 13, 14]]

patterns = pyfpgrowth.find_frequent_patterns(item_list, 1)`

My code and dataset is defined as above. The problem is when I try patterns = pyfpgrowth.find_frequent_patterns(item_list[:15], 1), I can find the key (7,) in patterns, but when I use the whole item_list, key(7,) is missing. Is this a bug or I get something wrong with the algorithm?

Thanks.

MemoryError while running on a 50 item transactions

I have a list of 30 transactions with up to 50 items per transaction. I am having the following MemoryError problem. I am running on Python 3.6, my PC has 4GB of RAM.

Is there any solution to this? To make it work with this number of items per transactions?

Traceback (most recent call last): File "C:/Users/Edoniti/PycharmProjects/Project/fp-growth/fp-growth1.py", line 15, in <module> patterns = pyfpgrowth.find_frequent_patterns(transactions, 2) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 253, in find_frequent_patterns return tree.mine_patterns(support_threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 155, in mine_patterns return self.zip_patterns(self.mine_sub_trees(threshold)) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 235, in mine_sub_trees subtree_patterns = subtree.mine_patterns(threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 155, in mine_patterns return self.zip_patterns(self.mine_sub_trees(threshold)) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 235, in mine_sub_trees subtree_patterns = subtree.mine_patterns(threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 155, in mine_patterns return self.zip_patterns(self.mine_sub_trees(threshold)) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 235, in mine_sub_trees subtree_patterns = subtree.mine_patterns(threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 153, in mine_patterns return self.generate_pattern_list() File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 193, in generate_pattern_list min([self.frequent[x] for x in subset]) MemoryError

Different rules on regenerating

I get different rules, each time I run the code.

transaction = [['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14',\
 '15', '16', '7', '17', '18', '19', '20', '21', '7', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '29', '30'], ['32', '33'],\ 
['34', '34', '34', '34', '34', '34', '34', '34', '35', '35', '36', '36', '34', '34', '37', '37', '37', '37',\
 '37', '37', '34', '34', '29', '30', '36', '36', '36', '36', '36', '36']]

import pyfpgrowth as fg
patterns = fg.find_frequent_patterns(transaction, 3)
rules = fg.generate_association_rules(patterns, 0.1)

Can anyone see it ?

Rules Being Overwritten

Hi All,
In working on this issue for my own work I found a problem with the generate_association rules module specifically with the line below

if confidence >= confidence_threshold:
rules[antecedent] = (consequent, confidence)

This only allows effectively one rule per item(antecedent) and thus the higher order rules per item where they exist are essentially over riding the simple 1 -> 1 items.
I addressed this by changing the code so that instead of setting the rules dict value each time - if a rule already exists for an item(antecendent)- I then append the new rule as a list to the dict value for that item as follows

if confidence >= confidence_threshold:
rule1=(consequent, confidence)
rule1=list(rule1)
if antecedent in rules:

                    rules[antecedent].append(rule1)
                else:
                    rules[antecedent]=rule1

The rules of course then have to be unravelled in a slightly more complex manner but this worked well .
First rule is value[0] -> value[1] . Subsequent rules are stored as list in value[n] so the unravelling is as follows where prel is the antecedent list ,postl is the consequent list and confl is the confidence list

for key,value in rules.iteritems():

if len(value)> 2: #If item has more than 1 rule 
    for i in range(2,len(value)):# For rule 2 and subsequent rules 
        prel.append(key)
        postl.append(value[i][0])
        confl.append(value[i][1])

prel.append(key)
postl.append(value[0])
confl.append(value[1])

I'm sure there's a neater way of resolving this issue but it is an important constraint on the scope of the algorithm . The algorithm is super fast and excellent otherwise and a pity if not being used for this reason . Comments welcome - would be great if the code could be updated.

MemoryError while running on transactions

I am having the following MemoryError problem. I am running on Python 3.6. Is there any solution to this?


MemoryError Traceback (most recent call last)
in
----> 1 patterns = pyfpgrowth.find_frequent_patterns(tweets_all_day_fp, 3)
2
3 print("Patterns :", patterns)

~\PycharmProjects\Birlestirme_New_2\pyfpgrowth.py in find_frequent_patterns(transactions, support_threshold)
251 """
252 tree = FPTree(transactions, support_threshold, None, None)
--> 253 return tree.mine_patterns(support_threshold)
254
255

~\PycharmProjects\Birlestirme_New_2\pyfpgrowth.py in mine_patterns(self, threshold)
151 """
152 if self.tree_has_single_path(self.root):
--> 153 return self.generate_pattern_list()
154 else:
155 return self.zip_patterns(self.mine_sub_trees(threshold))

~\PycharmProjects\Birlestirme_New_2\pyfpgrowth.py in generate_pattern_list(self)
191 pattern = tuple(sorted(list(subset) + suffix_value))
192 patterns[pattern] =
--> 193 min([self.frequent[x] for x in subset])
194
195 return patterns

MemoryError:

the threshold in find_frequent_patterns is not working

find_frequent_patterns(transactions, threshold) is not affected by the threshold value. It returns the same dictionary (same itemsets) when I set it to 0.0 or 1.0.

It should return an empty dict for 0.0

Basically, find_frequent_patterns(my_transactions, 0.0) == find_frequent_patterns(my_transactions, 1.0) is True.

Is this how it is supposed to be used?

EDIT: Figured it out! Works just fine!

can not output when applying to latitude

OS: Ubuntu 16.04 64bit
fpgrowth version 1.0, installed by pip install pyfpgrowth in virtualenv python3.5
I want use pyfpgrowth to mining trajectory latitude data, such as 41.804068
41.802275
41.802275
41.802383
41.802383
41.802383
41.802756
41.802756
41.802756
41.802756
41.802756
41.802756
41.802756
41.80275
41.80275
41.80275
41.80275
41.80275
41.80275
input:8 pieces of gps location latitude data, with length 52, 44,39,72, 41, 40, 164, 189.
program is
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
rules = pyfpgrowth.generate_association_rules(patterns, 0.7)
It used out 32GB memory, run in single thread, lasting 16 hours and still can not output , then I kill the process. Could anyone tell me how to do? I would appreciate for your valuable suggestion.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.