Friday, May 27, 2016

Correlation in DAX

In this blogpost I show how to calculate the correlation in DAX. This post will be refined in the future, also to show the comparison with R. The code is shown so you see how to run it in DAX studio. We will investigate the correlation between visits and sales. We define the standard deviation for visits:
    
DEFINE
    MEASURE Visits[sdtotal_visits] =
        CALCULATE (
            STDEVX.P ( VALUES ( 'Visit Date'[Month] ), [Completed Visits] ),
            ALLSELECTED ( 'Visit Date' )
        )

We define the mean for visits:
    
    MEASURE Visits[meantotal_visits] =
        CALCULATE (
            AVERAGEX ( VALUES ( 'Visit Date'[Month] ), [Completed Visits] ),
            ALLSELECTED ( 'Visit Date' )
        )
Same for value
    
    MEASURE Visits[sdtotal_ov] =
        CALCULATE (
            STDEVX.P ( VALUES ( 'Visit Date'[Month] ), [Order Value] ),
            ALLSELECTED ( 'Visit Date' )
        )
    MEASURE Visits[meantotal_ov] =
        CALCULATE (
            AVERAGEX ( VALUES ( 'Visit Date'[Month] ), [Order Value] ),
            ALLSELECTED ( 'Visit Date' )
        )
We multiply the two standard deviations:
    
MEASURE Visits[sdsd] =
        CALCULATE ( [sdtotal_visits] * [sdtotal_ov] )
We calculate the deviation from the mean for each measure:
    
    MEASURE Visits[afwijking_visits] =
        CALCULATE ( ( [Completed Visits] - [meantotal_visits] ) )
    MEASURE Visits[afwijking_value] =
        CALCULATE ( ( [Order Value] - [meantotal_ov] ) )
We calculate the average of the product of the two deviations:
    
MEASURE Visits[avgproduct] =
        CALCULATE (
            AVERAGEX (
                VALUES ( 'Visit Date'[Month] ),
                [afwijking_visits] * [afwijking_value]
            ),
            ALLSELECTED ( 'Visit Date' )
        )
We devide this value of the product of the two SD's
    
    MEASURE Visits[correlation2] =
        CALCULATE ( [avgproduct] / [sdsd] )
Now the query to view the results:
    
EVALUATE
CALCULATETABLE (
    ADDCOLUMNS (
        SUMMARIZE ( 'Visit Date', 'Visit Date'[Month] ),
        "Visits", [Completed Visits],
        "order value", [Order Value],
        "correlation2", [correlation2]
    ),
    'Visit Date'[Month Key] = "2015M12"
        || 'Visit Date'[Month Key] = "2016M01"
        || 'Visit Date'[Month Key] = "2016M02"
)

1 comment:

Lisa Jones said...

We urge Users to as often as possible check this page for almost any progressions to stay educated about how we are securing the individual data we gather.
Business Valuation