Difference between fpgrowth and fpmax not documented #1030

emilianomm · 2023-04-24T15:19:33Z

Describe the documentation issue

Hi. I'm using the library to find association rules in a dataset. To do that, I'm passing the output of the three algorithms to the association_rules() function. The documentation says they are equivalent in terms of parameters and output, but I get the following error only with the output from fpmax():

KeyError: 'frozenset({120})You are likely getting this error because the DataFrame is missing  antecedent and/or consequent  information. You can try using the  `support_only=True` option'

A minimal code example of my implementation looks like this:

from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth
from mlxtend.frequent_patterns import fpmax

### Assume baskets_matrix is an ad hoc pandas DataFrame.

### This works OK
freq_items_1 = fpgrowth(baskets_matrix, min_support=0.1)
freq_items_2 = fpmax(baskets_matrix, min_support=0.1)

### This also works OK
AR_1 = association_rules(freq_items_1, metric="confidence", min_threshold=0.5)

### This raises the error
AR_2 = association_rules(freq_items_2, metric="confidence", min_threshold=0.5)

Since all other factors are the same, I have to assume that there is a difference in the output of fpgrowth and fpmax which is not clearly documented.

I also noticed that the documentation refers to the association_rules() function as generate_rules(), which leads to further confusion.

Suggest a potential improvement or addition

Could you clarify whether the outputs of the different algorithms are indeed different, or whether there is another issue here?

I also think it would be useful for anyone using the library to have these remarks added to the documentation.

Thanks in advance!

Jordenjj · 2023-05-02T09:25:21Z

As per the documentation "FP-Max is a variant of FP-Growth, which focuses on obtaining maximal itemsets. An itemset X is said to maximal if X is frequent and there exists no frequent super-pattern containing X. In other words, a frequent pattern X cannot be sub-pattern of larger frequent pattern to qualify for the definition maximal itemset."
That being said, I am getting the error too when using FP-Max.
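
To make the quoted definition concrete, here is a minimal pure-Python sketch. It is not mlxtend's implementation: the toy transactions and the helper names `frequent_itemsets` and `maximal_itemsets` are made up for illustration. It enumerates all frequent itemsets by brute force and then keeps only those with no frequent proper superset.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
]


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Support = fraction of transactions containing the candidate.
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                result[candidate] = support
    return result


def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other for other in frequent)
    }


frequent = frequent_itemsets(transactions, min_support=0.5)
maximal = maximal_itemsets(frequent)

print(len(frequent), "frequent itemsets")
print(len(maximal), "maximal itemset(s):", [sorted(s) for s in maximal])
```

In this toy data every non-empty subset of {a, b, c} is frequent, but only {a, b, c} itself is maximal, so a maximal-only result carries no support values for the smaller itemsets.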

josejub · 2023-05-12T09:28:33Z

Same here: mining frequent itemsets with fp-growth works fine, but with fp-max I get the same error. An example of my code:

from mlxtend.frequent_patterns import association_rules, fpgrowth, fpmax

# Assume negated is a one-hot encoded DataFrame.

# Raises the error
max_itemsets = fpmax(negated, min_support=0.3, use_colnames=True, max_len=5)
max_itemsets
rules = association_rules(max_itemsets, metric="confidence", min_threshold=0.85)  # Error appears here

# Works well
all_itemsets = fpgrowth(negated, min_support=0.3, use_colnames=True, max_len=5)
all_itemsets
rules = association_rules(all_itemsets, metric="confidence", min_threshold=0.85)
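
A possible explanation, sketched in plain Python rather than with mlxtend: the confidence of a rule A → C is support(A ∪ C) / support(A), so rule generation needs the support of every antecedent. The sketch below assumes the rule generator looks those supports up in the table it was given; the support tables and the `confidence_rules` helper are hypothetical and only illustrate why a maximal-only table could trigger a KeyError.

```python
from itertools import combinations

# Hypothetical support table containing every frequent itemset.
all_frequent = {
    frozenset({"a"}): 1.0,
    frozenset({"b"}): 0.75,
    frozenset({"c"}): 0.75,
    frozenset({"a", "b"}): 0.75,
    frozenset({"a", "c"}): 0.75,
    frozenset({"b", "c"}): 0.5,
    frozenset({"a", "b", "c"}): 0.5,
}

# The same data reduced to maximal itemsets only.
only_maximal = {frozenset({"a", "b", "c"}): 0.5}


def confidence_rules(supports, min_confidence):
    """Generate (antecedent, consequent, confidence) triples from a support table."""
    rules = []
    for itemset, support in supports.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Raises KeyError when the antecedent's support is not in the table.
                confidence = support / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


rules = confidence_rules(all_frequent, min_confidence=0.7)
print(len(rules), "rules from the full table")

try:
    confidence_rules(only_maximal, min_confidence=0.7)
except KeyError as err:
    print("KeyError for missing antecedent:", err)
```

If this reading is right, two workarounds follow: mine with fpgrowth when rules are needed, or try the `support_only=True` option that the error message itself suggests, which presumably skips the metrics that depend on antecedent and consequent supports.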

emilianomm added the Documentation label Apr 24, 2023
