mhahsler / arules
Mining Association Rules and Frequent Itemsets with R
Home Page: http://mhahsler.github.io/arules
License: GNU General Public License v3.0
I'm trying to do a market basket analysis using the arules package. However, when I run the apriori algorithm, R reports the following message.
dt <- split(deviceshowlist$prog_title, deviceshowlist$device_id)
dt2 <- as(dt,"transactions")
rules <- apriori(dt2, parameter = list(support = 0.01, confidence = 0.05, minlen=2))
Apriori
Parameter specification:
Error in print.default(parameter) : attempt to apply non-function
I looked at my transaction data structure and at the apriori function, and nothing seems wrong. I hope someone has run into a similar issue before and can help me with it.
Thanks
Can we have a feature that captures the output of inspect
into a data frame? Here is a particular use case: https://stackoverflow.com/questions/50554355/capture-the-output-of-arulesinspect-as-data-frame
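Until such a feature exists, a workaround is to coerce the rules directly to a data frame, which carries the same information inspect() prints (a sketch using the built-in Groceries data):

```r
library(arules)
data("Groceries")
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5, minlen = 2))

# Coerce instead of capturing printed output: the 'rules' column holds
# the "lhs => rhs" string, the other columns hold the quality measures.
df <- as(rules, "data.frame")
head(df)
```

labels(lhs(rules)) and labels(rhs(rules)) can be used instead if separate LHS/RHS columns are wanted.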
Dear Michael,
May I know why arules generates rules with only one item on the right-hand side? I have seen that you have answered this before, but would you mind explaining a bit more? If only rules with one item on the right-hand side are generated, does that mean many rules (those with more than one item on the right-hand side), and the associations/patterns between items they capture, are neglected?
Is a rule with more than one item on the RHS less valuable (less useful) than rules with only one item on the RHS, so that it can be neglected?
Many thanks for your help.
Hi, My arulesModel model contains 260K rules and has a size of 17 MB. Upon applying the is.redundant method, memory limit is reached on a machine with 16 GB RAM:
is.redundant(arulesModel)
Error: cannot allocate vector of size 250.0 Gb
In addition: Warning messages:
1: In .local(x, y, proper, sparse, ...) : Reached total allocation of 16274Mb: see help(memory.size)
2: In .local(x, y, proper, sparse, ...) : Reached total allocation of 16274Mb: see help(memory.size)
3: In .local(x, y, proper, sparse, ...) : Reached total allocation of 16274Mb: see help(memory.size)
4: In .local(x, y, proper, sparse, ...) : Reached total allocation of 16274Mb: see help(memory.size)
memory.limit()
[1] 16274 # => 16 GB
How can I remove redundant rules without hitting the memory limit? Thanks!
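One possible workaround (a sketch, not an official fix; the thresholds below are arbitrary) is to prune the rule set before the redundancy check, since is.redundant() builds a subset matrix whose size grows quadratically with the number of rules:

```r
# Shrink the rule set first so the internal subset matrix stays small.
rulesPruned <- subset(arulesModel, subset = lift > 1.2 & confidence > 0.6)
rulesPruned <- head(sort(rulesPruned, by = "lift"), n = 50000)

# Note: redundancy is now judged only within the pruned set.
rulesFinal <- rulesPruned[!is.redundant(rulesPruned)]
```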
I noticed that the object size of the rules increases after performing any action on them, like subsetting the RHS or sorting. For example:
object.size(arulesModel)
#16908112 bytes # ~ 17 MB
arulesModelSubset <- subset(arulesModel, subset = rhs %in% cnames)
arulesModelSorted <- sort(arulesModel, by = "lift")
object.size(arulesModelSorted)
#50946536 bytes # 51 MB compared to 17 MB above.
Is this expected or is there possibly a memory leak?
Thanks!
Hi,
I'm trying to run the following script:
data <- list(
c("a","b","c"),
c("a","b"),
c("a","b","d"),
c("b","e"),
c("b","c","e"),
c("a","d","e"),
c("a","c"),
c("a","b","d"),
c("c","e"),
c("a","b","d","e")
)
as(data, "transactions")
via Rscript on the command line, and I get an error:
Error: could not find function "as"
Execution halted
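Rscript historically does not attach the methods package, which provides as() and the S4 machinery arules relies on; loading it explicitly at the top of the script should help (a sketch):

```r
library(methods)  # Rscript may not attach this by default
library(arules)

data <- list(
  c("a","b","c"),
  c("a","b")
)
trans <- as(data, "transactions")
inspect(trans)
```

Alternatively, run the script with Rscript --default-packages=methods,utils,stats.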
In some of my apriori runs, I get 0 rules. The trouble seems to be related to the number of observations, at least in the way I'm using it.
Here is my R script to reproduce the problem. smallRules is calculated properly. However, largeRules stays empty whenever largeObsCount is above 250. I'm actually not sure where the sweet spot is (200 is OK). I was narrowing it down, but unfortunately random.org won't let me run any more tests today. I had reported this on StackOverflow in a comment, but at the time I didn't realize that there were 0 rules.
if(! "arules" %in% installed.packages()) install.packages("arules", depend = TRUE)
library(arules)
if(! "random" %in% installed.packages()) install.packages("random", depend = TRUE)
library(random)
smallItemCount <- 24
smallSampleNames <- as.vector(randomStrings(n=smallItemCount, len=10, unique=TRUE))
shortSamplePaths <- rep("src/", smallItemCount)
smallTmpData <- data.frame(paths=shortSamplePaths,names = smallSampleNames)
smallSampleItems <- interaction(smallTmpData[head(names(smallTmpData))], sep= "")
smallObsCount = 500
smallSampleData <- data.frame(
X = sample(smallSampleItems, smallObsCount, replace = TRUE),
Y = sample(smallSampleItems, smallObsCount, replace = TRUE)
)
smallRules <- apriori(smallSampleData, parameter = list(supp = 0.005, conf = 0.1, minlen = 2))
largeItemCount = 578
largeSampleNames <- as.vector(randomStrings(n=largeItemCount, len=10, unique=TRUE))
#longSamplePaths <- rep("modules/junit4/src/test/java/org/powermock/modules/junit4/", largeItemCount)
longSamplePaths <- rep("junit4/", largeItemCount)
bigTmpData <- data.frame(paths=longSamplePaths,names = largeSampleNames)
bigSampleItems <- interaction(bigTmpData[head(names(bigTmpData))], sep= "")
largeObsCount = 250
bigSampleData <- data.frame(
X = sample(bigSampleItems, largeObsCount, replace = TRUE),
Y = sample(bigSampleItems, largeObsCount, replace = TRUE)
)
bigRules <- apriori(bigSampleData, parameter = list(supp = 0.005, conf = 0.1, minlen = 2))
I recently came across a rule with a huge negative Odds Ratio:
> rules[69804] %>% interestMeasure(transactions, "oddsRatio")
[1] -5.954607e+20
I guess it's hard to replicate without supplying my whole dataset, but I tried to track down the reason:
In the function .getCounts, the variable f01 becomes negative.
It is created by subtracting fx1 - f11, i.e.
.rhsSupport(x, transactions, reuse) * N - interestMeasure(x, "support", transactions, reuse) * N
or, as I (hopefully correctly?) interpret it, supp(Y) - supp(X=>Y), which should never become negative.
Here are the fully printed raw numbers, where you can see that f11 is in fact represented as larger than fx1:
> sprintf("%.66f",.rhsSupport(x, transactions, reuse))
[1] "0.000092340537126436346296656787480117145605618134140968322753906250"
> sprintf("%.66f",interestMeasure(x, "support", transactions, reuse))
[1] "0.000092340537126436359853520752238864588434807956218719482421875000"
I guess the error happens somewhere at a lower level, like arules::support() or even at the C level, but I didn't track it down any further. It is probably an R floating-point rounding error somewhere.
I just wanted to draw attention to this; maybe a simple check could be implemented here, replacing negative numbers with zero and issuing a warning.
Clarification edit: Of course, only the result of this subtraction should be rounded to zero, so the division yields Inf, which is within the definition range of an odds ratio.
In previous versions, discretize worked well on vectors containing NAs. This is not a big issue, but in general this function works differently from the old one: on the same data it does not allow me to use the same number of bins and tells me to decrease it. So I went back to using the old discretize and the cut2 function.
Arules outputs many diagnostic messages such as Parameter specification and Algorithmic control.
These messages cannot be suppressed with the suppressMessages()
function. The gist of the problem might be described in this stackoverflow entry.
As a side note, the warning "You chose a very low absolute support count of 0. You might run out of memory! Increase minimum support." can be suppressed with suppressWarnings().
Working suppressMessages() would be nice when apriori is invoked programmatically, from other packages.
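Until that works, two workarounds (sketches, assuming trans is a transactions object) are the verbose control option and capture.output():

```r
# Option 1: ask apriori itself to be quiet.
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5),
                 control = list(verbose = FALSE))

# Option 2: swallow whatever is still printed to standard output.
invisible(capture.output(
  rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5))
))
```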
I was trying to remove redundant rules from my association rules (3xxx rules in total) using the is.redundant function:
inspect(rules[!is.redundant(rules)])
It turns out just 54 rules are not redundant. I was quite shocked, so I tested another example, which is here: http://www.rdatamining.com/examples/association-rules
The redundant rules, as shown there, should be rules [2], [4], [7], and [8], but when I used the is.redundant function it reported that only rules [4] and [8] are NOT redundant, which is apparently totally wrong.
Is there anything wrong with the function or have I misused it?
I performed association rule mining:
trans <- read.transactions("C:/Users/synthex/Downloads/Groceries.csv", format = 'basket', sep = ',')
rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8))
rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8,maxlen=3))
plot(rules,method="graph",engine='interactive',shading=NA)
Then I got this plot:
https://i.stack.imgur.com/3jr7x.jpg
What do the red circles mean, and is it possible to label them?
mydat
structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L,
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns",
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles",
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles",
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda",
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles",
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices",
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice",
"meat,citrus fruit,berries,root vegetables,whole milk,soda",
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages",
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum",
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins",
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles",
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA,
-19L))
Hi, when I train the model as follows:
traintrans <- as(traindata.data.frame, "transactions")
rules <- apriori(traintrans, parameter = list(supp=0.001, minlen = 2, maxlen=5, conf = 0.5, target = "rules")
, appearance = list(rhs = cnames))
where cnames <- colnames(traindata.data.frame)[7:9]; that is, I would like to train rules only for recommendations from the list cnames. Why do some extra recommendations also trickle into the resulting output? Isn't the appearance parameter supposed to control which set of values one gets recommendations for?
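For what it's worth, in my experience the RHS restriction only behaves as expected when the appearance default for all unlisted items is set explicitly, e.g. default = "lhs" (a sketch based on the call above):

```r
# Items in cnames may appear on the RHS; all other items are
# restricted to the LHS via default = "lhs".
rules <- apriori(traintrans,
                 parameter  = list(supp = 0.001, minlen = 2, maxlen = 5,
                                   conf = 0.5, target = "rules"),
                 appearance = list(rhs = cnames, default = "lhs"))
```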
Thanks!
Hi:
Was wondering how I could look for a particular word in RHS of appearance class in the apriori code.
rulesD<-apriori(data = txn, parameter=list(supp=0.0005,conf = 0.01),
appearance = list(default="lhs",rhs = grep("DATE|DATES", rhs, value = TRUE)),
control = list(verbose=F))
Here I need a list of products on RHS that contains the word "DATE" or "DATES".
rhs = grep("DATE|DATES", rhs, value = TRUE) in the code results in an error.
Following the below example:
rules<-apriori(data=dt, parameter=list(supp=0.001,conf = 0.8), appearance = list(default="lhs",rhs="Banana"), control = list(verbose=F))
Only products whose label exactly matches "Banana" are returned. However, "ORGANIC BANANA" is not returned in the RHS.
Sorry if this is a simple question! :) Searched the cran doc, did not find a solution.
Thank you for your help.
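appearance expects a character vector of item labels, so the grep() call cannot reference an undefined rhs variable; one way (a sketch) is to grep over itemLabels() of the transactions first and pass the resulting vector:

```r
# All item labels containing "DATE" (this also matches "DATES").
dateItems <- grep("DATE", itemLabels(txn), value = TRUE)

rulesD <- apriori(txn,
                  parameter  = list(supp = 0.0005, conf = 0.01),
                  appearance = list(default = "lhs", rhs = dateItems),
                  control    = list(verbose = FALSE))
```

The same trick works for the banana example: rhs = grep("BANANA", itemLabels(dt), value = TRUE) would include "ORGANIC BANANA".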
My name is Feng. I am using your arules package 1.5.2 to find related products in my retail data. Now, I am confused about two measures in the function interestMeasure: kappa and leastContradiction.
In the package manual, there is a piece of code explaining how to use interestMeasure. I changed the code a little bit:
data("Income")
rules <- apriori(Income)
quality(rules)$kappa <- interestMeasure(rules, measure = 'kappa', transactions = Income)
quality(rules)$leastContradiction <- interestMeasure(rules, measure = 'leastContradiction', transactions = Income)
try <- as(rules, 'data.frame')
Then, we can see the ranges of these two measures are:
summary(try$leastContradiction)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.08794 0.13920 0.17000 0.18930 0.22170 0.90460
summary(try$kappa)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-43160000 -20510000 -19140000 -17660000 -12220000 -8042000
You can see that the range of kappa is very different from what the manual describes: [-1, 1].
When I use these two measures on my own data, I have:
summary(myData1$kappa)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-5767000000000 -5765000000000 -5756000000000 -5745000000000 -5728000000000 -5610000000000
summary(myData1$leastContradiction)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-218.9000 -5.4530 -2.0120 -4.9540 -1.1050 0.8824
Could you please explain to me how to use these two measures?
Thanks a lot
Feng
I use arulesSequences and I want to access/get values from the sequences at every level, for example:
x <- read_baskets(file.choose(), info = c("sequenceID","eventID","SIZE"))
as(x,"data.frame")
s0 <- cspade(x, parameter = list(support = 0, maxsize = 5, maxlen = 5))
seq = as(s0, "data.frame")
The variable seq returned this:
1 <{257}> 0.02777778
2 <{259},{305}> 0.02777778
3 <{259},{305}> 0.02777778
4 <{259},{305}> 0.02777778
5 <{259},{305}> 0.02777778
...
27 <{292},{305}> 0.02777778
28 <{293},{305}> 0.02777778
29 <{292},{293},{305}> 0.02777778
30 <{290},{293},{305}> 0.02777778
31 <{259},{293},{305}> 0.02777778
32 <{290},{292},{293},{305}> 0.02777778
33 <{259},{292},{293},{305}> 0.02777778
I want to get the first and last value of each sequence, something like this:
1 : first = last = 257
2 : first = 259, last = 305
29 : first = 292, last = 305
33 : first = 259, last = 305
Is that possible?
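One way (a sketch over the data.frame representation, assuming single-item elements as in the output above) is to strip the brackets and braces from the sequence strings and take the first and last element:

```r
# seq$sequence holds strings like "<{259},{293},{305}>"
s <- as.character(seq$sequence)

# Remove "<", ">", "{", "}" and split on "," to get the elements.
elems <- strsplit(gsub("[<>{}]", "", s), ",")

first <- sapply(elems, head, 1)
last  <- sapply(elems, tail, 1)
```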
I am not sure if there is an error in the appearance parameter (or whether I do not understand its meaning).
I execute:
load("./titanic.raw.rdata")
library(arules)
rules <- apriori(titanic.raw, control = list(verbose=F),
parameter = list(supp=0.002, conf=0.01),
appearance = list( rhs=c("Survived=Yes"),
lhs=c("Class=1st", "Class=2nd", "Class=3rd",
"Age=Child", "Age=Adult")))
inspect(rules)
rules2 <- apriori(titanic.raw, control = list(verbose=F),
parameter = list(supp=0.002, conf=0.01),
appearance = list( rhs=c("Survived=Yes"),lhs=c("Class=1st")))
inspect(rules2)
My concern is this: the first call produces 12 rules, with some of the attributes in:
---> lhs=c("Class=1st", "Class=2nd", "Class=3rd",
"Age=Child", "Age=Adult")
But in the second call to apriori I pass a less restrictive set of conditions:
--->,lhs=c("Class=1st")
My question is: why does the second call generate fewer rules?
For instance, in the first run the rule
[8] {Class=1st,Age=Child} => {Survived=Yes} 0.002726034 1.0000000
is generated.
But in the second run this rule is not generated, even though it has Class=1st on the left.
Could you help me?
Thank you very much in advance
ANGEL MORA BONILLA
University of Málaga, Spain
When applying is.closed to an itemsets object, I sometimes get an "invalid count" error. This seems to occur only when the itemset has a support count of 1, but it does not occur with every itemset with a support count of 1. I'm trying to find out why this happens.
Example from rwdvc returns all transactions!
require(arules)
data("Adult")
## Mine association rules.
rules <- apriori(Adult,
parameter = list(supp = 0.5, conf = 0.9, target = "rules", minlen = 2))
summary(rules)
sub_rules <- rules[1]
inspect(sub_rules)
sub_trans <- subset(Adult, items %in% lhs(sub_rules))
Hi, I get the following error from is.subset, example shown below:
is.subset(as(list(c("salt","water"),c("pepper")), "transactions"), as(list(c("salt","water")), "transactions"))
Error in .local(x, ...) :
All item labels in x must be contained in 'itemLabels' or 'match'.
The above error does not allow me to make predictions for NEW transactions when I use the is.subset method as:
rulesMatchLHS <- is.subset(rules@lhs, newtrans)
where "rules" is the model from "apriori" with target = "rules".
I get the above error when the new data ("newtrans") has columns missing from the training data.
Also, if I try using "subset" instead, I get the following error when transactions in newtrans contain newer items/values (columns/values) not seen in the training data:
Error in
subset(rules, subset = rules@lhs %in% LIST(newtrans[1])[[1]], :
Error in rules@lhs %in% LIST(newtrans[1])[[1]] : table contains an unknown item label
Could you please help fix the errors above? Or please point me to how I could make predictions using new transactions that may contain new items, or be missing items, relative to the training data.
Thanks!
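One possible workaround (a sketch; function signatures may differ between arules versions) is to drop unseen items from the new data and re-encode it against the training item labels, so both sides share the same item universe:

```r
trainLabels <- itemLabels(rules)

# Back to a list of item-label vectors, dropping items the model
# has never seen.
newList <- LIST(newtrans, decode = TRUE)
newList <- lapply(newList, intersect, trainLabels)

# Re-encode against the training labels, then match the LHS as before.
newItems <- encode(newList, itemLabels = trainLabels)
rulesMatchLHS <- is.subset(lhs(rules), newItems)
```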
I must perform association rule mining in R, and I found an example here:
http://www.salemmarafi.com/code/market-basket-analysis-with-r/
In this example they work with data(Groceries), but they provide the original dataset Groceries.csv:
structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L,
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns",
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles",
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles",
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda",
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles",
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices",
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice",
"meat,citrus fruit,berries,root vegetables,whole milk,soda",
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages",
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum",
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins",
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles",
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA,
-19L))
I load this data:
g = read.csv("g.csv", sep = ";")
so I must convert it to transactions as arules requires:
trans = as(g, "transactions")
Let's examine data(Groceries):
> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
>
and my converted data from original csv
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 3 variables:
.. ..$ labels : chr [1:7011] "tr=abrasive cleaner" "tr=abrasive cleaner,napkins" "tr=artif. sweetener" "tr=artif. sweetener,coffee" ...
.. ..$ variables: Factor w/ 1 level "tr": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ levels : Factor w/ 7011 levels "abrasive cleaner",..: 1 2 3 4 5 6 7 8 9 10 ...
..@ itemsetInfo:'data.frame': 9835 obs. of 1 variable:
.. ..$ transactionID: chr [1:9835] "1" "2" "3" "4" ...
>
We see that data(Groceries) holds transactions in sparse format with
9835 transactions (rows) and
169 items (columns),
while my trans data has
9835 transactions (rows) and
7011 items (columns).
That is, I got 7011 columns from Groceries.csv, whereas the embedded example has 169 columns.
Why is that? How do I convert this file correctly?
I must understand this, because I can't work with my file as it is.
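The 7011 columns most likely come from reading the CSV into a data.frame: each distinct basket line becomes a single factor level, i.e. one "item", instead of being split into individual items. Reading the file directly as basket data should give the 169-item representation (a sketch, assuming one comma-separated basket per line in g.csv):

```r
library(arules)

# Each line is one transaction; items are separated by commas.
trans <- read.transactions("g.csv", format = "basket", sep = ",")
summary(trans)
```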
I'm trying to create a flexdashboard with results from the arules apriori function, to display association rules for specific items selected from the pull-down menu in the markdown dashboard. When I create the function outside the markdown environment, I can successfully pass the selected product item to the apriori subset without problems. However, when I replace the function variable with the reactive function name, I get the following error message: "Error in ==: comparison (1) is possible only for atomic and list types"
library(flexdashboard)
library(htmlwidgets)
library(htmltools)
library(knitr)
library(arules)
library(arulesViz)
library(datasets)
library(reshape2)
library(scales)
data(Groceries)
rules <- apriori (data=Groceries
,parameter=list (supp=0.001,conf = 0.15,minlen = 2,maxlen=5)
,control = list (verbose=F) )
selectInput("Product","Product", c("whole milk","sugar"), selected = "whole milk")
### Associated Product
Product <- reactive({input$Product})
renderDataTable({
#using this direct call works
#Product<-"whole milk"
#subrules<- subset(rules, subset=(lhs %in% as.character(Product) & !(rhs %in% as.character(Product))))
#This reactive function produces error: "Error in ==: comparison (1) is possible only for atomic and list types"
subrules <- rules[lhs(rules) %pin% as.character(Product()) & !rhs(rules) %pin% as.character(Product())]
rules_conf <- sort (subrules, by=c("confidence"), decreasing=TRUE) # 'high-confidence' rules.
redundant <- which (colSums (is.subset (rules_conf, rules_conf)) > 1) # get redundant rules in vector
rules_conf <- rules_conf[-redundant] # remove redundant rules
rules_conf2<-as(rules_conf,"data.frame")
# split lhs and rhs into two columns
rules_conf2<-transform(rules_conf2, rules = colsplit(rules, pattern = "=>", names = c("lhs","rhs")))
# convert to character
rules_conf2$rules$lhs <- as.character(rules_conf2$rules$lhs)
rules_conf2$rules$rhs <- as.character(rules_conf2$rules$rhs)
rules_conf3<-data.frame(LHS=rules_conf2$rules$lhs
,RHS=rules_conf2$rules$rhs
,Support=percent(rules_conf2$support)
,Confidence=percent(rules_conf2$confidence)
,Lift=round(rules_conf2$lift,2))
DT::datatable(rules_conf3,rownames=F
,options = list(pageLength = 10
,columnDefs = list(list(className = 'dt-center', targets = 2:4
,autoWidth = TRUE, searchable = FALSE))))
})
Hello,
The documentation (and function name) of is.redundant suggests that it should return TRUE for rules that are redundant, i.e. rules that have a negative improvement. Instead, it returns TRUE for rules with a positive improvement.
Thanks!
Hi, for the apriori algorithm, I use the following parameters:
support, minlen, maxlen, confidence, target (= "rules"). I am currently using this set both to tune my model and to limit its size (that is, the number of rules generated).
It would be immensely helpful to have a separate parameter to control the size of the model, for example something like "maxrules", so that one can fine-tune the model (for better performance) using the existing parameters while also capping the number of rules. Right now, if I fine-tune my model using the existing set of parameters, the number of rules can become too large (sometimes a few million), which leads to long model-building times and slow predictions. Limiting the size of the apriori object while tuning the model becomes quite an issue when automating thousands of models.
Is it possible to add such a parameter in the near future?
Thanks!
Supriya
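Until such a parameter exists, one workaround (a sketch; the cap and quality measure are arbitrary choices) is to mine with the current parameters and trim the model afterwards:

```r
maxrules <- 10000  # hypothetical cap

rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.5,
                                         minlen = 2, maxlen = 5))

# Keep only the top rules by a quality measure, e.g. lift.
rules <- head(sort(rules, by = "lift"), n = maxrules)
```

This does not shorten mining time, but it bounds the object that is stored and used for predictions.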
Hi,
Are there any plans for including the FP Growth algorithm?
We are currently using the arules package to extract frequent item pairs for usage in a PCA. We have approximately 50000 "sets" (which means there are (50000^2)/2 potential pairs). We wanted to convert the sparse matrix into a full matrix using the following code:
apri.test <- apriori(transactions,
parameter = list(target = "frequent itemsets", supp = support,
minlen = 2, maxlen = 2), control = list(verbose = TRUE))
pairs.matrix <- as.matrix(apri.test@items@data)
However, this gave us a memory error saying that 580 GB are needed to allocate the matrix. Our rough estimate (50000^2 * 40 / 1000000000 = 80 GB) was greatly below this value. Is there a more efficient way integrated in the package to extract this matrix, or are we attempting to extract the wrong matrix entirely?
All the best from the WU.
With the new version of arules (1.5-0) I am sometimes getting the error "reached CPU time limit".
In my evaluations on 27 UCI datasets, I get it on waveform-5000 (MDLP-discretized) with the following setting:
confidence = 0.5, support = 0, minlen = 2, maxlen = 4, maxtime = 102 (i.e. the new feature turned off). I always get it during 10-fold cross-validation, but each time on a different fold.
The same setup worked fine with the previous version of arules. This error is distinct from the new warning related to maxtime.
Using arules 1.5-3 on R 3.3.3, running apriori() causes a fatal crash when the 'appearance' list specifies lhs and rhs items while default is "none".
See my repository for reproducible example.
Code file baskets.R loads the data in assocs.RData, which consists of a data frame of market baskets and two character vectors specifying items to exclude from LHS and RHS when mining associations.
I provided three ready options for specifying the 'appearance' argument: constrain LHS, constrain RHS, constrain both. R crashes when both sides are constrained.
The measure referred to as "ralambrodrainy" seems to contain a spelling mistake in the name of one of the authors of the paper:
Diatta, J., Ralambondrainy, H., & Totohasina, A. (2007). Towards a Unifying Probabilistic Implicative Normalized Quality Measure for Association Rules. Studies in Computational Intelligence, 237–250. doi:10.1007/978-3-540-44918-8_10
While investigating the correlation between rankings of different measures, I found that this measure also has negative or no correlation with most other measures; confidence in particular is highly negatively correlated.
This makes me think that there might be something wrong with the implementation.
Hi,
I think there is a simple error in the documentation of "transactions-class" when it says:
coerce signature(from = "matrix", to = "transactions"); produces a transactions data set from a binary incidence matrix. The row names are used as item labels and the column names are stores as transaction IDs.
From what I can see in my experiments, rows represent transactions and columns represent items.
Regards,
Víctor
Imagine transactions with sizes of up to 15 items. With a low support, arules does not check itemsets at level 10 correctly: every itemset of length 10 or more is always reported as not frequent.
I compared the results with the output of the latest Borgelt apriori implementation. Every retrieved frequent itemset is identical up to level 9, after which they are simply missing.
Hi,
I've realized that using the function interestMeasure with a set of rules of size 1 does not return a proper data frame.
data("Income")
rules <- apriori(Income)
r1 <- rules[1]
r2 <- rules[1:2]
> interestMeasure(x = r2, measure = c("support", "confidence")) #OK
support confidence
1 0.9128854 0.9128854
2 0.1127109 0.9292566
> interestMeasure(x = r1, measure = c("support", "confidence")) #Wrong
sapply(measure, FUN = function(m) interestMeasure(x, m, transactions, reuse, ...))
support 0.9128854
confidence 0.9128854
Is this a bug in the function, or am I using it badly?
Thanks!
Víctor
I was using interestMeasure and decided to sample some symmetric measures along with asymmetric ones, and was surprised to see the mutual information (M) numbers come back negative, given the documentation in the arules PDF:
"mutualInformation", uncertainty, M (Tan et al., 2002)
Measures the information gain for Y provided by X.
Range: [0, 1] (0 for independence)
I assume it could be just a sign bug (i.e., removing the minus sign should fix it)... FYI.
Sample below:
measures7 <- interestMeasure(rules_Arrest,
  c("phi", "mutualInformation", "cosine", "jaccard", "lift", "hyperLift"), # "leverage", "support", "gini", "hyperConfidence"
  transactions = DPPD_trans1_Arrest)
inspect(head(rules_Arrest))
lhs rhs support confidence lift
1 {ZipCode=75240} => {Division=North Central} 0.02027592 1.0000000 9.235521
2 {Sector=640} => {Division=North Central} 0.02100753 1.0000000 9.235521
3 {Sector=110} => {Division=Central} 0.02132107 1.0000000 8.181274
4 {ZipCode=75212} => {Sector=420} 0.02132107 0.9855072 40.997101
5 {ZipCode=75212} => {Division=SouthWest} 0.02142559 0.9903382 5.879960
6 {ZipCode=75208} => {Division=SouthWest} 0.02158236 0.9951807 5.908712
head(measures7, n = 20)
phi mutualInformation cosine jaccard lift hyperLift
1 0.41284206 -0.307263353 0.4327340 0.18725869 9.235521 6.807018
2 0.42038123 -0.310908803 0.4404718 0.19401544 9.235521 6.931034
3 0.39553519 -0.285008230 0.4176524 0.17443352 8.181274 6.181818
4 0.93344676 -0.847975096 0.9349343 0.87553648 40.997101 22.666667
5 0.32658365 -0.214306327 0.3549388 0.12705299 5.879960 4.659091
6 0.32891311 -0.216503142 0.3571049 0.12806202 5.908712 4.693182
7 0.47310567 -0.353491059 0.4906198 0.24783684 11.486745 8.354167
8 0.09550435 -0.012162221 0.1752533 0.03071371 1.409442 1.315457
9 0.09428369 -0.011812036 0.1750450 0.03078283 1.402731 1.310345
10 0.43094164 -0.316088614 0.4512959 0.20366795 9.235521 6.918033
11 0.94651261 -0.869063673 0.9477856 0.89938398 39.246165 23.052632
12 0.94651261 -0.884808556 0.9477856 0.89938398 39.246165 23.052632
13 0.86713506 -0.752373996 0.8704477 0.75767918 32.655290 19.304348
14 0.44229206 -0.321755514 0.4629100 0.21428571 9.235521 6.937500
15 0.07797480 -0.007918385 0.1698410 0.03065275 1.323733 1.237389
16 0.09873251 -0.012415261 0.1810415 0.03277602 1.409442 1.320475
17 0.41578962 -0.295491763 0.4383589 0.19215855 8.244722 6.371429
18 0.68171238 -0.548458088 0.6910043 0.47748691 20.037696 13.411765
19 0.60163715 -0.471097334 0.6141427 0.37717122 15.827957 11.121951
20 0.34716814 -0.225250203 0.3761424 0.14148309 5.937325 4.750000
Maybe better error/warning output? See the relevant post at SO:
https://stackoverflow.com/q/53185553/680068
I am guessing the OP is passing a data.frame to arules::write, and base::write is dispatched because the input is not "transactions"; maybe check the input?
I attempted to install the dev version from GitHub (to see if the %pin% issue in issue #16 was fixed in 1.4-2-1), but I was unable to build it. The build is on an up-to-date Mac Pro with R version 3.3.1 and RStudio 0.99.902.
> devtools::install_github("mhahsler/arules")
Downloading GitHub repo mhahsler/arules@master
from URL https://api.github.com/repos/mhahsler/arules/zipball/master
Error: Could not find build tools necessary to build arules
> devtools::install_git("git://github.com/mhahsler/arules")
Downloading git repo git://github.com/mhahsler/arules
Error: Could not find build tools necessary to build arules
> devtools::install_git("git://github.com/mhahsler/arules", args="--recursive")
Downloading git repo git://github.com/mhahsler/arules
Error: Could not find build tools necessary to build arules
I get a different number of rules when I turn memopt on/off. When memopt is TRUE, I usually get fewer rules. I am using minlen and maxlen as well. Am I missing something?
Does the new maxTime feature have anything to do with it?
After getting a strange error when using random.transactions:
Error in validObject(.Object) :
invalid class “ngCMatrix” object: Not a valid 'Mnumeric' class object
I realised that the error is due to the fact that Rscript does not load methods by default.
Minimal Working Example:
R -e 'library("arules"); random.transactions(5, 5)'
works like a charm, while Rscript throws an error:
Rscript -e 'library("arules"); random.transactions(5, 5)'
...
Error in validObject(.Object) :
invalid class “ngCMatrix” object: Not a valid 'Mnumeric' class object
A temporary fix is to run Rscript as follows:
Rscript --default-packages=methods -e 'library("arules"); random.transactions(5, 5)'
Another solution is to add library("methods") in the code before importing arules.
Is there a better solution? Can the arules package be made robust against this issue?
Hi, the following example is available for coercing a data.frame into a transactions object when the TransactionId and Items are provided:
## example 4: creating transactions from a data.frame with
a_df3 <- data.frame(
  TID = c(1, 1, 2, 2, 2, 3),
  item = c("a", "b", "a", "b", "c", "b")
)
a_df3
trans4 <- as(split(a_df3[, "item"], a_df3[, "TID"]), "transactions")
trans4
inspect(trans4)
However, when I try to apply the same to a data.frame that has additional columns, I get error:
Error in asMethod(object) : can coerce list with atomic components only
My example is:
a_df3 <- data.frame(
  TID = c(1, 1, 2, 2, 2, 3),
  item = c("a", "b", "a", "b", "c", "b"),
  column2 = c("Mon", "Wed", "Mon", "Tue", "Fri", "Mon")
)
trans4 <- as(split(a_df3[, c("item", "column2")], a_df3[, "TID"]), "transactions")
Error in asMethod(object) : can coerce list with atomic components only
Is there a method available to do the above with multiple columns (factor/logical)? In my application, if I first use "dcast" to create a WIDE table with unique TID values per row and then use only the data.frame without the TID column to convert it into a "transactions" object following your example 3 (a_df), then my data.frame size becomes too large (up to 5-7 GB). So I was hoping to create the "transactions" object required for "apriori" directly from my multiple-columns LONG table as shown above.
Thanks for any feedback!
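For what it's worth, one possible workaround, a base-R sketch rather than an official arules API (the "item="/"day=" label prefixes below are made up for illustration): flatten the extra columns into composite item labels first, so each transaction becomes an atomic character vector, which is exactly what the "transactions" coercion accepts.

```r
# Sketch: encode each extra column as its own "item" by prefixing the
# column name, then split by transaction ID. Only the final coercion
# needs arules; everything before it is base R.
a_df3 <- data.frame(
  TID = c(1, 1, 2, 2, 2, 3),
  item = c("a", "b", "a", "b", "c", "b"),
  column2 = c("Mon", "Wed", "Mon", "Tue", "Fri", "Mon")
)

# Flat labels such as "item=a" and "day=Mon" (prefixes are arbitrary)
labels <- c(paste0("item=", a_df3$item), paste0("day=", a_df3$column2))
tids <- rep(a_df3$TID, 2)

# Each list element is now an atomic character vector
trans_list <- lapply(split(labels, tids), unique)

# library(arules)
# trans4 <- as(trans_list, "transactions")  # coercion should now succeed
```

This keeps the long table long, so it avoids the 5-7 GB wide dcast step; whether composite labels are appropriate depends on how column2 should enter the rules.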
There is a typo in one of the level 2 item labels in the Groceries data. I figured I should report it, but I almost hope you'll consider not fixing it - it gives me a little chuckle every time I see "meet and sausage".
I am sorry, I am new to the arules package, and this is not actually an issue but rather a question. I already posted this question on Stack Overflow but wanted to ask it here hoping to get a quick answer.
I have a data set with customer ID, event_date, and event_type looking like this:
cid event_date event_type
451 2017-01-05 VSLS
451 2017-01-08 VCRD
451 2017-02-04 COMM
451 2017-02-05 COMM
...
564 2017-01-05 VSVC
564 2017-01-06 COMM
564 2017-02-05 VCRD
...
and I want to analyze frequent patterns of events. The question is how to build a transactions object that could potentially include the customer id and time stamp in its @itemsetInfo?
Thanks
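Not an authoritative answer, but one sketch of the usual approach (the arules-specific lines are commented out and assume the standard as() coercion and the transactionInfo() replacement function): group the event types per customer with base R, then attach the customer ids as transaction info.

```r
# Sketch: one basket of (unique) event types per customer id.
events <- data.frame(
  cid = c(451, 451, 451, 451, 564, 564, 564),
  event_date = c("2017-01-05", "2017-01-08", "2017-02-04", "2017-02-05",
                 "2017-01-05", "2017-01-06", "2017-02-05"),
  event_type = c("VSLS", "VCRD", "COMM", "COMM", "VSVC", "COMM", "VCRD")
)

# Base R: list of event-type vectors, named by customer id
baskets <- lapply(split(events$event_type, events$cid), unique)

# library(arules)
# trans <- as(baskets, "transactions")
# transactionInfo(trans)$cid <- names(baskets)  # carry the customer ids along
```

Time stamps do not fit naturally into one row of transaction info per customer; treating each (cid, event_date) pair as its own transaction, e.g. splitting on interaction(events$cid, events$event_date), would give that variant.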
Is there a code change between v1.2.x and 1.4.x that requires R >= 3.2.0? Lots of users run older releases of R for some time between updating, so if there isn't a vital new feature or bugfix required in the newer R releases a lower version dependency is nice--not necessarily 2.14, but R >= 3.0 represents a hard break.
After running:
rules <- df %>%
apriori(appearance = list(lhs = c(x), default="rhs"),
parameter=list(support=0.0, confidence=0.25))
When running the following line:
rules <- subset(rules, subset = (lhs %pin% as.character(x)) & lift > 1)
I receive the error:
"unable to find an inherited method for function '%pin%' for signature '"standardGeneric", "character"'"
I believe the 'standardGeneric' refers to the lhs portion and is what causes the error. x is just a column name.
This is with version 1.4-2. I don't believe I received the error on version 1.4-1.
Hi, is there a "predict" method for apriori similar to predict.rpart, etc? Currently I have my own code built following the suggestion at:
http://stats.stackexchange.com/questions/21340/finding-suitable-rules-for-new-data-using-arules
basket <- Groceries[2]
rulesMatchLHS <- is.subset(rules@lhs, basket)
suitableRules <- rulesMatchLHS & !(is.subset(rules@rhs, basket))
inspect(rules[suitableRules])
recommendations <- strsplit(LIST(rules[suitableRules]@rhs)[[1]], split = " ")
recommendations <- lapply(recommendations, function(x) paste(x, collapse = " "))
recommendations <- as.character(recommendations)
recommendations <- recommendations[!sapply(recommendations, function(x) basket %in% x)]
print(recommendations)
but that takes enormously long (several hours) to process test data of 20,000 rows with an apriori model of about 300,000 rules. I wanted to check whether a method already exists that processes a table of test data much faster (ideally, a few seconds), or whether there is a plan to develop such a method in the near future?
Thanks!
Supriya
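As a side note, the main speedup over basket-by-basket loops is to match every rule LHS against every test basket in one matrix operation (is.subset should accept a whole transactions object as its second argument and return a rules-by-transactions matrix). A toy base-R sketch of the underlying idea, with made-up items and dense 0/1 matrices standing in for arules' sparse itemMatrix objects:

```r
# Sketch: subset matching via matrix algebra. lhs[i, j] = 1 if rule i's
# LHS contains item j; baskets[k, j] = 1 if basket k contains item j.
# Rule i's LHS is a subset of basket k iff no LHS item is missing from
# the basket, i.e. (lhs %*% (1 - t(baskets)))[i, k] == 0.
items <- c("milk", "bread", "butter")

lhs <- rbind(
  r1 = c(1, 1, 0), # {milk, bread}
  r2 = c(0, 0, 1)  # {butter}
)
baskets <- rbind(
  b1 = c(1, 1, 1), # {milk, bread, butter}
  b2 = c(1, 0, 1)  # {milk, butter}
)
colnames(lhs) <- colnames(baskets) <- items

# TRUE where a rule's LHS is fully contained in a basket
match_mat <- (lhs %*% (1 - t(baskets))) == 0
```

One matrix product replaces 300,000 x 20,000 individual subset tests, which is why the sparse-matrix version inside arules scales far better than an explicit loop.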
I can see there is a long list of metrics/scores for calculating additional interest measures; however, by default (i.e., without setting the "measure" argument of the interestMeasure function), only 3 or 4 are calculated.
Basically, I am interested in calculating "certainty" which is on the list, but the function returns an error
"Error in interestMeasure(r, measure = c("certainty"), trans = subscription.trans) :
Value 'certainty' is an invalid measure for itemsets."
This happens regardless of whether I use "apriori" or "eclat".
Thanks,
Hi everyone,
I'm trying to use the discretize function. Given the following vector:
nums <- c(rep(1,7),
rep(2,3),
rep(3,4),
rep(4,5),
rep(5,9),
rep(6,10),
rep(7,8),
rep(8,1),
rep(9,9),
rep(10,4))
> nums
[1] 1 1 1 1 1 1 1 2 2 2 3 3 3 3 4 4 4 4 4 5 5 5 5 5 5 5 5
[28] 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 8 9 9 9 9 9 9 9
[55] 9 9 10 10 10 10
When I use the code table(discretize(nums, "frequency", categories=6))
I get the output below
[ 1, 3) [ 3, 6) 6 7 [ 8,10) 10
10 18 10 8 10 4
Nevertheless, I was expecting 6 categories of frequency 10. Am I misunderstanding something?
Thank you in advance,
Noelia
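Not an official explanation of discretize's internals, but the unequal bins can be reproduced with base R alone under the usual assumption that frequency-based discretization cuts at sample quantiles: ties in the data pull breakpoints onto repeated values, so six bins of exactly 10 observations are impossible for this vector.

```r
# Sketch: equal-frequency breakpoints are (sample) quantiles. With many
# tied values the quantiles land on repeated numbers, so the resulting
# bins cannot all contain n/k observations.
nums <- rep(1:10, times = c(7, 3, 4, 5, 9, 10, 8, 1, 9, 4)) # same 60 values

brk <- quantile(nums, probs = seq(0, 1, length.out = 7)) # 6 bins
tab <- table(cut(nums, breaks = brk, include.lowest = TRUE))
# tab reproduces the unequal 10 18 10 8 10 4 counts from the question
```

The breakpoints come out as 1, 2.83, 5, 6, 7, 9, 10; the runs of tied 5s, 6s, 7s, and 9s are what drag the cut points onto single values and unbalance the bins.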
I am trying to do association rule clustering in R and I ran into a problem when trying to use the function "dissimilarity". I tried something like this:
dis<-dissimilarity(rules, method = "gupta", args = "trans")
I want to use the "gupta" method to calculate the dissimilarity/distances of the rules, the R help manual said to use "gupta": "The transactions used to mine the associations has to be passed on via args as element "transactions"."
"trans" is my transactions (sparse format) and "rules" is my association rules. An error message said "Error in args$$trans : $ operator is invalid for atomic vectors"
why is it doesn't work?
I have also tried:
dis<-dissimilarity(rules, method = "gupta", args = list("trans"))
Error: Error in .local(x, y, method, args, ...) : Transactions needed in args for this method!
Is there anything wrong with my syntax?
HELP PLEASE !!!!!!!!!!
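For what it's worth, the help text quoted above asks for the transactions as args element "transactions", i.e. a named list entry; a bare string reproduces the '$ operator' error, and an unnamed list triggers the 'Transactions needed' error. A sketch (the dissimilarity call itself is commented out and assumes that named-element form):

```r
# `$` on an atomic vector is exactly the first reported error:
args_bad <- "trans" # a character string, not a list
msg <- tryCatch(args_bad$transactions,
                error = function(e) conditionMessage(e))
# msg: "$ operator is invalid for atomic vectors"

# The list element must be *named* "transactions" (assumed from the quoted docs):
# library(arules)
# dis <- dissimilarity(rules, method = "gupta",
#                      args = list(transactions = trans))
```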
Executing head(x) or tail(x) on a rules object containing no rules results in the error message "Error in slot(x, s)[i] : subscript out of bounds".
The error does not occur with arules::sort().
Using arules 1.5-0 on R 3.3.3
Add code to produce rules with more than 1 item in the RHS in the function ruleInduction().