Thursday, June 23, 2016
BIML Series, Part 1, BIML Development approach
This summer I will post quite a bit about BIML. In this post I will share how one can develop BIML packages without purchasing additional software.
I installed Microsoft Visual C# 2008 express edition, because it hardly occupies any disk space and is a free download.
Within Visual C# is run this code:
using System; using System.Data; using System.Data.SqlClient; using System.Collections.Generic; using System.ComponentModel; using System.Text; using System.Diagnostics; using System.Windows.Forms; public class ExtractTable { public string PackageName { get; set; } public string SequenceName { get; set; } public string DataFlowName { get; set; } public string SchemaName { get; set; } public string SourceSystemName { get; set; } public string SelectQuery { get; set; } public string SQLFlavour { get; set; } public string ExtractTableName { get; set; } public string SourceTableName { get; set; } public string ArchiveProcedure { get; set; } private static string metadataConnectionString = "Server=localhost;Database=meta;User Id=lalal; Password=dada;"; private static DataTable getData(string query) { DataTable dataTable = new DataTable(); using (SqlDataAdapter dataAdapter = new SqlDataAdapter(query, metadataConnectionString)) { dataAdapter.Fill(dataTable); } return dataTable; } private static DataTable getData(string query, string parameter, string parameterValue) { DataTable dataTable = new DataTable(); using (SqlConnection conn = new SqlConnection(metadataConnectionString)) { using (SqlCommand cmd = new SqlCommand(query, conn)) { cmd.Parameters.AddWithValue(parameter, parameterValue); SqlDataAdapter adapter = new SqlDataAdapter(cmd); adapter.Fill(dataTable); } } return dataTable; } private static ExtractTable fillExtractTable ( string PackageName, string SequenceName, string DataFlowName, string SchemaName, string SourceSystemName, string SelectQuery, string SQLFlavour, string ExtractTableName, string SourceTableName, string ArchiveProcedure ) { ExtractTable extractTable = new ExtractTable(); extractTable.PackageName = PackageName; extractTable.SequenceName = SequenceName; extractTable.DataFlowName = DataFlowName; extractTable.SchemaName = SchemaName; extractTable.SourceSystemName = SourceSystemName; extractTable.SelectQuery = SelectQuery; extractTable.SQLFlavour = SQLFlavour; extractTable.ExtractTableName = ExtractTableName; extractTable.SourceTableName = SourceTableName; extractTable.ArchiveProcedure = ArchiveProcedure; return extractTable; } [STAThread] static void Main() { // Package objects var extractTables = new ListThe most important thing to note about this code is the fact a predefined piece of biml code is loaded from a text file and the variables are replaced by the values coming from the meta data repository in SQL. The resulting non dynamic biml code can then be pasted in VS2008 or later with BIDS helper to generate the package. Any error messages will be a lot easier to debug. Will be continued..(); // Variables used to retrieve and write data string query; DataTable tblExtractTables; // Add ExtractTables query = "SELECT top 1 " + " 'Extract_Entrino_' + bet.SourceTableName AS PackageName," + " bet.SequenceName," + " bet.DataFlowName," + " bet.SchemaName," + " bet.SourceSystemName," + " bet.SelectQuery," + " bet.SQLFlavour," + " bet.ExtractTableName," + " bet.SourceTableName," + " bet.ArchiveProcedure" + " FROM META.BimlExtractArchivePackages AS bet "; tblExtractTables = getData(query); foreach (DataRow row in tblExtractTables.Rows) { extractTables.Add(fillExtractTable(row[0].ToString(), row[1].ToString(), row[2].ToString(), row[3].ToString(), row[4].ToString(), row[5].ToString(), row[6].ToString(), row[7].ToString(), row[8].ToString(), row[9].ToString())); } foreach (ExtractTable extractTable in extractTables) { string text = System.IO.File.ReadAllText(@"C:\TFS\Temp\biml.txt"); text = text.Replace("<#=extractTable.PackageName#>", extractTable.PackageName); text = text.Replace("<#= extractTable.SelectQuery#>", extractTable.SelectQuery); Clipboard.SetText(text); } } }
Saturday, June 11, 2016
First experience with Neo4J
During the DWH and BI summit I got inspired to look into Graph Databases. A former colleague recommended looking into Neo4J. It turned out to be quite easy to load data in Neo4J en more easily explore related data. In our case we wanted to analyse our Meta Data Driven DWH Framework. Which had grown quite a bit and become more and more difficult to maintain. We gave the 'Force directed Graph' visualization in Power BI a try, but this was to limited for our analysis goals.
The video below shows the results.
More in Neo4J in the future!
The video below shows the results.
More in Neo4J in the future!
Remote controlled car with Raspberry Pi (piborg) and HC-SR04 ultrasonic sensor (to detect walls and stop car)
I this post I will show how to build a remote controlled car with your Raspberry Pi that will stop whenever it encounters a wall at a predefined distance. The result will look like this:
The car looks like this in detail:
Smart buggy 16 euro.
Two additional wheels, that I glued to the plastic board.
PIBORG - PICOBORG V2 - MOTOR CONTROLLER, 13 euro.
Triborg, 9 euro.
The ultra sonic distance sensor, HCSR04, 2,50 euro.
Wiring was done as described on the internet. The car is remote controlled via telnet using Putty.
The car looks like this in detail:
Components
Smart buggy 16 euro.
Two additional wheels, that I glued to the plastic board.
PIBORG - PICOBORG V2 - MOTOR CONTROLLER, 13 euro.
Triborg, 9 euro.
The ultra sonic distance sensor, HCSR04, 2,50 euro.
Wiring was done as described on the internet. The car is remote controlled via telnet using Putty.
Code
The code below is executed. This was tricky because the key listener needs to be non blocking. Once I found one that is non blocking. Its just a matter of using keys, I, J, K, L for steering the car. The distance sensor checks for distances less than 30 CM. The motor is turned off when this happens.import sys import select import tty import termios import contextlib import time import RPi.GPIO as GPIO GPIO.setmode(GPIO.BCM) #used by the distance meter TRIG = 23 ECHO = 24 GPIO.setup(TRIG,GPIO.OUT) GPIO.setup(ECHO,GPIO.IN) #used by the piborg # Set which GPIO pins the drive outputs are connected to DRIVE_1 = 4 DRIVE_2 = 18 DRIVE_3 = 8 DRIVE_4 = 7 # Set all of the drive pins as output pins GPIO.setup(DRIVE_1, GPIO.OUT) GPIO.setup(DRIVE_2, GPIO.OUT) GPIO.setup(DRIVE_3, GPIO.OUT) GPIO.setup(DRIVE_4, GPIO.OUT) # Map current on/off state to command state dInvert = {} dInvert[True] = GPIO.LOW dInvert[False] = GPIO.HIGH # Map the on/off state to nicer names for display dName = {} dName[True] = 'ON ' dName[False] = 'OFF' # Function to set all drives off def MotorOff(): GPIO.output(DRIVE_1, GPIO.LOW) GPIO.output(DRIVE_2, GPIO.LOW) GPIO.output(DRIVE_3, GPIO.LOW) GPIO.output(DRIVE_4, GPIO.LOW) def MotorOn(): GPIO.output(DRIVE_1, GPIO.HIGH) GPIO.output(DRIVE_2, GPIO.HIGH) GPIO.output(DRIVE_3, GPIO.HIGH) GPIO.output(DRIVE_4, GPIO.HIGH) def afstand(): time.sleep(0.2) GPIO.output(TRIG, False) GPIO.output(TRIG, True) time.sleep(0.00001) GPIO.output(TRIG, False) while GPIO.input(ECHO)==0: pulse_start = time.time() while GPIO.input(ECHO)==1: pulse_end = time.time() pulse_duration = pulse_end - pulse_start distance = pulse_duration * 17150 distance = round(distance, 2) return distance #distance def isData(): return select.select([sys.stdin], [], [], 0) == ([sys.stdin], [], []) old_settings = termios.tcgetattr(sys.stdin) try: tty.setcbreak(sys.stdin.fileno()) while 1: if (afstand() < 50): MotorOff() if isData(): c = sys.stdin.read(1) print str((c)) if c == '\x1b': # x1b is ESC break elif c == 'i': #Vooruit MotorOn() elif c == 'k': #Stop MotorOff() elif c == 'j': #Links print 'links' if ((GPIO.input(DRIVE_1)) & (GPIO.input(DRIVE_4))): print 'aan het rijden' GPIO.output(DRIVE_1, GPIO.LOW) GPIO.output(DRIVE_2, GPIO.LOW) time.sleep(0.1) GPIO.output(DRIVE_1, GPIO.HIGH) GPIO.output(DRIVE_2, GPIO.HIGH) else: GPIO.output(DRIVE_3, dInvert[GPIO.input(DRIVE_3)]) GPIO.output(DRIVE_4, dInvert[GPIO.input(DRIVE_4)]) elif c == 'l': #Rechts print 'rechts' if ((GPIO.input(DRIVE_1)) & (GPIO.input(DRIVE_4))): print 'aan het rijden' GPIO.output(DRIVE_4, GPIO.LOW) GPIO.output(DRIVE_3, GPIO.LOW) time.sleep(0.1) GPIO.output(DRIVE_4, GPIO.HIGH) GPIO.output(DRIVE_3, GPIO.HIGH) else: GPIO.output(DRIVE_1, dInvert[GPIO.input(DRIVE_1)]) GPIO.output(DRIVE_2, dInvert[GPIO.input(DRIVE_2)]) finally: print("stop") termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_settings)
Thursday, June 9, 2016
Market Basket Analysis (Association Rule Learning) with Power BI (DAX) and R
Introduction
In this post I will show how to run an R script from Power BI which will execute an Association rule learning script to perform market basket analysis.In this example we will not look at products sold, but products sharing shelf space.
The dataset
Our basic dataset looks like this.Our products:
The distribution / presence of products on the shelf of a customer:
The Power BI building blocks
The data model
As for the DAX part we will start with this post of Marco Russo and Alberto Ferrari.So the data model in Power BI looks like this:
The R visualization
We will look at the DAX part later on. First we add an R component with a script that will return the AR rules it found.The table contains the basic output that is to be expected from AR. We will try to build these measures in DAX later on.
The R script
As for the R script it looks like this:save(dataset, file="C:/TFS/dataset.rda") library(arules, lib.loc="C:/TFS/Rlib/a/" , logical.return = FALSE, warn.conflicts = F, quietly = T,verbose = F) library(plotrix, lib.loc="C:/TFS/Rlib/p/" , logical.return = FALSE, warn.conflicts = F, quietly = T,verbose = F) dataset = cbind(dataset, 1) colnames(dataset) = c("ProductID", "CustomerID", "Waarde") reports = xtabs(Waarde~CustomerID+ProductID, data=dataset) reports[is.na(reports)] <- 0 rules <- apriori(as.matrix(as.data.frame.matrix(reports)),parameter = list(supp = 0.03, conf = 0.5, target = "rules")) t = inspect(head(sort(rules, by ="support"),15)) par(mar = c(0,0,0,0)) plot(c(0, 0), c(0, 0)) if (is.null(t)) { t = data.frame("no rules found") text(x = 0.5, y = 0.5, paste("No Rules found"), cex = 1.6, col = "black") } else { addtable2plot(-1, -1, t, bty = "n", display.rownames = F, hlines = F, vlines = F) }Unfortunately Power BI initializes a new R sessions each time the R visualization is run / cross filtered. Therefore I tried to use a much base R as possible. As for the libraries that need to be loaded. I put these in a separate folder on my local drive and specified the folder name in the library command.
Building it in DAX
Support
The output of the arules R script can be built in DAX whenever it concerns single item combinations, so X -> Y. So not A, B -> Y. The 'support' measure is basically the '[Orders with Both Products %]' described by Russo and Ferrari. Just to show how its implemented on our dataset.Customers with Both Products % = IF ( NOT ( [SameProductSelection] ); DIVIDE ( [Customers with Both Products]; [Unique Customers All] ) )The building blocks of this formula:
Same product selection, since this is useless.
SameProductSelection = IF ( HASONEVALUE ( Products[ID] ) && HASONEVALUE ( 'Filter Products'[ID] ); IF ( VALUES ( Products[ID] ) = VALUES ( 'Filter Products'[ID] ); TRUE ) ) |
|
Customers with both products:
Customers with Both Products = CALCULATE ( DISTINCTCOUNT ( Distribution[Customer ID] ); CALCULATETABLE ( SUMMARIZE ( Distribution; Distribution[Customer ID] ); ALL ( Products ); USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] ) ) ) |
Number of customers in total:
Unique Customers All = CALCULATE ( DISTINCTCOUNT ( Distribution[Customer ID] ); ALL ( Products ) ) |
Confidence
Confidence = [Customers with Both Products] / [Unique Customers LHS]Unique Customers LHS:
Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])
Lift
Lift = [Confidence] / [Proportion Product RHS]
Proportion product RHS:
Proportion Product RHS = Distribution[Unique Customers RHS] / [Unique Customers All]
Unique customer RHS:
Unique Customers RHS = CALCULATE ( DISTINCTCOUNT ( Distribution[Customer ID] ); CALCULATETABLE ( SUMMARIZE ( Distribution; Distribution[Customer ID] ); ALL ( Products ); USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] ) ); ALL(Products) )
You can download the Power BI file here.
In this video you see the Power BI file in use:
Subscribe to:
Posts (Atom)