Thursday, June 9, 2016

Market Basket Analysis (Association Rule Learning) with Power BI (DAX) and R

Introduction

In this post I will show how to run an R script from Power BI which will execute an Association rule learning script to perform market basket analysis.

In this example we will not look at products sold, but products sharing shelf space.

The dataset

Our basic dataset looks like this.

 Our products:

The distribution / presence of products on the shelf of a customer:

The Power BI building blocks

The data model

As for the DAX part we will start with this post of Marco Russo and Alberto Ferrari.

So the data model in Power BI looks like this:



The R visualization

We will look at the DAX part later on. First we add an R component with a script that will return the AR rules it found.


The table contains the basic output that is to be expected from AR. We will try to build these measures in DAX later on.

The R script

As for the R script it looks like this:


   
save(dataset, file="C:/TFS/dataset.rda")

library(arules, lib.loc="C:/TFS/Rlib/a/" , logical.return = FALSE,
        warn.conflicts = F, quietly = T,verbose = F)
library(plotrix, lib.loc="C:/TFS/Rlib/p/" , logical.return = FALSE,
        warn.conflicts = F, quietly = T,verbose = F)

dataset = cbind(dataset, 1)
colnames(dataset) = c("ProductID", "CustomerID", "Waarde")
reports = xtabs(Waarde~CustomerID+ProductID, data=dataset)
reports[is.na(reports)] <- 0
rules <- apriori(as.matrix(as.data.frame.matrix(reports)),parameter = list(supp = 0.03, conf = 0.5, target = "rules"))
t = inspect(head(sort(rules, by ="support"),15))

par(mar = c(0,0,0,0))
plot(c(0, 0), c(0, 0))
if (is.null(t)) {
  t = data.frame("no rules found")
  text(x = 0.5, y = 0.5, paste("No Rules found"), 
       cex = 1.6, col = "black")
} else {
  addtable2plot(-1, -1, t, bty = "n", display.rownames = F, hlines = F,
                vlines = F)
}

Unfortunately Power BI initializes a new R sessions each time the R visualization is run / cross filtered.  Therefore I tried to use a much base R as possible. As for the libraries that need to be loaded. I put these in a separate folder on my local drive and specified the folder name in the library command.

Building it in DAX

Support

The output of the arules R script can be built in DAX whenever it concerns single item combinations, so X -> Y. So not A, B -> Y.  The 'support' measure is basically the '[Orders with Both Products %]' described by Russo and Ferrari. Just to show how its implemented on our dataset.
   
Customers with Both Products % = 
IF (
    NOT ( [SameProductSelection] );
    DIVIDE ( [Customers with Both Products]; [Unique Customers All] )
)
The building blocks of this formula:
Same product selection, since this is useless.
   
SameProductSelection = 
IF (
    HASONEVALUE ( Products[ID] )
        && HASONEVALUE ( 'Filter Products'[ID] );
    IF (
        VALUES ( Products[ID] )
            = VALUES ( 'Filter Products'[ID] );
        TRUE
    )
)
Customers with both products:
   
Customers with Both Products = 
CALCULATE (
    DISTINCTCOUNT ( Distribution[Customer ID] );
    CALCULATETABLE (
        SUMMARIZE ( Distribution; Distribution[Customer ID] );
        ALL ( Products );
        USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
    )
)
Number of customers in total:
Unique Customers All = 
CALCULATE (
    DISTINCTCOUNT ( Distribution[Customer ID] );   
        ALL ( Products )
    )

Confidence

   
Confidence = [Customers with Both Products] / [Unique Customers LHS]
Unique Customers LHS:
   
Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])

Lift


Lift = [Confidence] / [Proportion Product RHS]

Proportion product RHS:
Proportion Product RHS = Distribution[Unique Customers RHS] / [Unique Customers All]

Unique customer RHS:
 
Unique Customers RHS = 
CALCULATE (
    DISTINCTCOUNT ( Distribution[Customer ID] );
    CALCULATETABLE (
        SUMMARIZE ( Distribution; Distribution[Customer ID] );
        ALL ( Products );
        USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
    ); ALL(Products)
)

You can download the Power BI file here.

In this video you see the Power BI file in use:

No comments: