ExcelR is a leading Data Science training institute in Pune. The Data Science course in Pune is delivered by highly experienced, certified trainers who are considered among the best in the industry, which is why ExcelR is considered one of the best Data Science training institutes in Pune.
Association Rules
Market Basket Analysis
Relationship Mining
Affinity Analysis
© 2013 ExcelR Solutions. All Rights
Reserved
Market Basket Analysis
• Large number of transaction records through data collected using bar-code scanners
• Each record = All items purchased on a single purchase transaction
Association Rules
• What item goes with what?
• Are certain groups of items consistently purchased together?
• What business strategies will you devise with this knowledge?
Association Rules
• Product shelf placement – a specific product beside another
• Selling of prominent shelf space – Slotting Fees
• Stocking – Supply Chain Management
• Price Bundling – Combo offers. How?
Source: http://www.economist.com/news/business/21654601-supplier-rebates-are-heart-some-supermarket-chains-woes-buying-up-shelves
https://en.wikipedia.org/wiki/Association_rule_learning
Association Rules – Cell phone faceplates
A store that sells accessories for cellular phones runs a promotion on faceplates.
OFFER! Buy multiple faceplates from a choice of 6 different colors & get a discount.
How would you help the store managers devise a strategy to become more profitable?
Association Rules – Cell phone faceplates
List Format
Transaction #   Faceplate colors purchased
1               Red White Green
2               White Orange
3               White Blue
4               Red White Orange
5               Red Blue
6               White Blue
7               White Orange
8               Red White Blue Green
9               Red White Blue
10              Yellow

Binary Matrix Format
Transaction #   Red   White   Blue   Orange   Green   Yellow
1               1     1       0      0        1       0
2               0     1       0      1        0       0
3               0     1       1      0        0       0
4               1     1       0      1        0       0
5               1     0       1      0        0       0
6               0     1       1      0        0       0
7               0     1       0      1        0       0
8               1     1       1      0        1       0
9               1     1       1      0        0       0
10              0     0       0      0        0       1
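
The two formats above carry the same information. As a minimal sketch (the function name `to_binary_matrix` is an illustrative choice, not something from the slides), the list format can be turned into the binary matrix format like this:

```python
# Transactions in list format, copied from the faceplate table above.
transactions = [
    ["Red", "White", "Green"],
    ["White", "Orange"],
    ["White", "Blue"],
    ["Red", "White", "Orange"],
    ["Red", "Blue"],
    ["White", "Blue"],
    ["White", "Orange"],
    ["Red", "White", "Blue", "Green"],
    ["Red", "White", "Blue"],
    ["Yellow"],
]
colors = ["Red", "White", "Blue", "Orange", "Green", "Yellow"]


def to_binary_matrix(baskets, items):
    """One row per transaction, one 0/1 column per item."""
    return [[1 if item in basket else 0 for item in items] for basket in baskets]


binary_matrix = to_binary_matrix(transactions, colors)
for number, row in enumerate(binary_matrix, start=1):
    print(number, row)
```

Each printed row should line up with the corresponding row of the binary matrix table.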
Association Rules are probabilistic “if-then” statements
2 Main Ideas:
1. Examine all possible “if-then” rule formats
2. Select the rules that indicate true dependence
Association Rules – Cell phone faceplates
Rules for {Red, White, Green}
1. If {Red, White} then {Green}
2. If {Red, Green} then {White}
3. If {White, Green} then {Red}
4. If {Red} then {White, Green}
5. If {White} then {Red, Green}
6. If {Green} then {Red, White}
Problem
• Many rules are possible
• How to select the TRUE/GOOD rules from all generated rules?
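
To see why many rules are possible, here is a small sketch that enumerates every way of splitting an item set into a non-empty antecedent and a non-empty consequent. The helper name `candidate_rules` is hypothetical. An item set with n items yields 2^n - 2 such rules, so 3 items give the 6 rules listed above.

```python
from itertools import combinations


def candidate_rules(item_set):
    """Return every (antecedent, consequent) split of item_set, both parts non-empty."""
    items = sorted(item_set)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


rules = candidate_rules({"Red", "White", "Green"})
for antecedent, consequent in rules:
    print("If", set(antecedent), "then", set(consequent))
print(len(rules), "rules")
```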
Association Rules – Terminology
“IF” part = Antecedent = A
“THEN” part = Consequent = C
• If {Red, White} then {Green}
• If Red & White phone faceplates are purchased, then Green faceplate is purchased
Antecedent: Red & White
Consequent: Green
Association Rules – Performance Measures
1. Support
2. Confidence
3. Lift
Association Rules – Support
• Consider only combinations that occur with higher frequency in the database
• Support is the criterion based on frequency
Support = Percentage / Number of transactions in which IF/Antecedent & THEN/Consequent appear in the data
Mathematically:
Support = (# transactions in which A & C appear together) / (Total no. of transactions)
Support - Calculation
Transaction # Faceplate colors purchased
1 Red White Green
2 White Orange
3 White Blue
4 Red White Orange
5 Red Blue
6 White Blue
7 White Orange
8 Red White Blue Green
9 Red White Blue
10 Yellow
• What is the support for “if White then Blue”?
1. 4
2. 40%
3. 2
4. 90%
• What is the support for “if Blue then White”?
1. 4
2. 40%
3. 2
4. 90%
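
One way to check an answer is to count directly. The sketch below (helper names are illustrative) computes support both as a count and as a fraction of all transactions. Because support only looks at A & C together, it is the same for “if White then Blue” and “if Blue then White”.

```python
# Faceplate transactions from the table above, one set per transaction.
transactions = [
    {"Red", "White", "Green"},
    {"White", "Orange"},
    {"White", "Blue"},
    {"Red", "White", "Orange"},
    {"Red", "Blue"},
    {"White", "Blue"},
    {"White", "Orange"},
    {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"},
    {"Yellow"},
]


def support_count(antecedent, consequent):
    """Number of transactions that contain every item of A and of C."""
    both = antecedent | consequent
    return sum(1 for basket in transactions if both <= basket)


def support_fraction(antecedent, consequent):
    """Support count divided by the total number of transactions."""
    return support_count(antecedent, consequent) / len(transactions)


print(support_count({"White"}, {"Blue"}), support_fraction({"White"}, {"Blue"}))
print(support_count({"Blue"}, {"White"}), support_fraction({"Blue"}, {"White"}))
```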
Support - Problem
• Generating all possible rules is exponential in the number of distinct items
• Solution: Frequent item sets using Apriori Algorithm
Apriori Algorithm
For k products:
1. Set minimum support criteria
2. Generate list of one-item sets that meet the support criterion
3. Use list of one-item sets to generate list of two-item sets that meet support criterion
4. Use list of two-item sets to generate list of three-item sets that meet support criterion
5. Continue up through k-item sets
Support – Criterion = 2
Transaction #   Faceplate colors purchased
1               Red White Green
2               White Orange
3               White Blue
4               Red White Orange
5               Red Blue
6               White Blue
7               White Orange
8               Red White Blue Green
9               Red White Blue
10              Yellow

Item set               Support (Count)
{Red}                  5
{White}                8
{Blue}                 5
{Orange}               3
{Green}                2
{Red, White}           4
{Red, Blue}            3
{Red, Green}           2
{White, Blue}          4
{White, Orange}        3
{White, Green}         2
{Red, White, Blue}     2
{Red, White, Green}    2

Create rules from frequent item sets only
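
The level-by-level search can be sketched in a few lines of Python. This is a minimal illustration of the Apriori idea under a support count of 2, not a tuned implementation, and the function name `apriori` and its signature are assumptions made for this example. On the faceplate data it should return the same 13 frequent item sets and counts as the table above.

```python
from itertools import combinations

transactions = [
    {"Red", "White", "Green"},
    {"White", "Orange"},
    {"White", "Blue"},
    {"Red", "White", "Orange"},
    {"Red", "Blue"},
    {"White", "Blue"},
    {"White", "Orange"},
    {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"},
    {"Yellow"},
]


def apriori(baskets, min_support):
    """Return {item set: support count} for every item set with count >= min_support."""

    def keep_frequent(candidates):
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for basket in baskets for item in basket}
    current = keep_frequent({frozenset([item]) for item in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-item subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(transactions, min_support=2)
for item_set, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(item_set), count)
```

The prune step is what keeps the search manageable: {Red, White, Orange} is never counted, because {Red, Orange} already failed the support criterion.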
Support Criterion Example
Rules for {Red, White, Green}
1. If {Red, White} then {Green}
2. If {Red, Green} then {White}
3. If {White, Green} then {Red}
4. If {Red} then {White, Green}
5. If {White} then {Red, Green}
6. If {Green} then {Red, White}
How good are these rules beyond the point that they have high support?
Association Rules – Confidence
• Percentage of If/Antecedent transactions that also have the Then/Consequent item set
Mathematically:
Confidence = P(Consequent | Antecedent) = P(C & A) / P(A)
Confidence = (# transactions in which A & C appear together) / (# transactions with A)
Confidence - Calculation
Transaction # Faceplate colors purchased
1 Red White Green
2 White Orange
3 White Blue
4 Red White Orange
5 Red Blue
6 White Blue
7 White Orange
8 Red White Blue Green
9 Red White Blue
10 Yellow
• What is the confidence for “if White then Blue”?
1. 4/5
2. 5/8
3. 5/4
4. 4/8
• What is the confidence for “if Blue then White”?
1. 4/5
2. 5/8
3. 5/4
4. 4/8
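
Unlike support, confidence depends on which side is the antecedent. The following sketch (function names are illustrative) applies the formula # transactions with A & C / # transactions with A to both directions of the rule.

```python
transactions = [
    {"Red", "White", "Green"},
    {"White", "Orange"},
    {"White", "Blue"},
    {"Red", "White", "Orange"},
    {"Red", "Blue"},
    {"White", "Blue"},
    {"White", "Orange"},
    {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"},
    {"Yellow"},
]


def count(item_set):
    """Number of transactions containing every item in item_set."""
    return sum(1 for basket in transactions if item_set <= basket)


def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(count({"White"}), count({"Blue"}), count({"White", "Blue"}))
print(confidence({"White"}, {"Blue"}))
print(confidence({"Blue"}, {"White"}))
```

White appears in 8 transactions and Blue in 5, with 4 containing both, so the two directions give different values.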
Confidence - Weakness
• If the antecedent and consequent both have high support, confidence will be high (biased) even when the two are unrelated
Association Rules – Lift Ratio
Lift Ratio = Confidence / Benchmark confidence
Benchmark assumes independence between antecedent & consequent:
P(C|A) = P(C & A) / P(A) = P(C) × P(A) / P(A) = P(C)
Benchmark confidence = (# transactions with consequent item sets) / (# transactions in database)
Interpreting Lift
• Lift > 1 indicates a rule that is useful in finding consequent item sets
• Such a rule does better at finding the consequent than selecting transactions at random
Lift - Calculation
Transaction # Faceplate colors purchased
1 Red White Green
2 White Orange
3 White Blue
4 Red White Orange
5 Red Blue
6 White Blue
7 White Orange
8 Red White Blue Green
9 Red White Blue
10 Yellow
• What is the Lift for “if White then Blue”?
1. 4/8
2. 5/10
3. 4/5
4. 1
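
The lift ratio divides the rule's confidence by the benchmark confidence, which is simply the share of all transactions that contain the consequent. A short sketch with illustrative function names:

```python
transactions = [
    {"Red", "White", "Green"},
    {"White", "Orange"},
    {"White", "Blue"},
    {"Red", "White", "Orange"},
    {"Red", "Blue"},
    {"White", "Blue"},
    {"White", "Orange"},
    {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"},
    {"Yellow"},
]


def count(item_set):
    return sum(1 for basket in transactions if item_set <= basket)


def lift(antecedent, consequent):
    """Confidence of the rule divided by the benchmark confidence P(C)."""
    confidence = count(antecedent | consequent) / count(antecedent)
    benchmark = count(consequent) / len(transactions)
    return confidence / benchmark


print(lift({"White"}, {"Blue"}))
print(lift({"Green"}, {"Red"}))
```

For “if White then Blue” the confidence (4/8) equals the benchmark (5/10), so knowing that White was bought says nothing extra about Blue. For “if Green then Red” the confidence is well above the benchmark.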
Rules selection process
Generate all rules that meet the specified Support & Confidence:
1. Find frequent item sets by applying the minimum support cutoff
2. From these item sets, generate rules, then filter them and select only those that meet the defined minimum Confidence
Rules
Input Data
# Transactions in Input Data: 10
# Columns in Input Data: 6
# Items in Input Data: 6
# Association Rules: 8
Minimum Support: 2
Minimum Confidence: 70.00%
List of Rules: If all Antecedent items are purchased, then with Confidence percentage Consequent items will also be purchased.

Row ID   Confidence %   Antecedent (A)   Consequent (C)   Support for A   Support for C   Support for A & C   Lift Ratio
8        100            green            red & white      2               4               2                   2.5
4        100            green            red              2               5               2                   2
6        100            white & green    red              2               5               2                   2
3        100            orange           white            3               8               3                   1.25
5        100            green            white            2               8               2                   1.25
7        100            red & green      white            2               8               2                   1.25
1        80             red              white            5               8               4                   1
2        80             blue             white            5               8               4                   1
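
The rule list above can be reproduced with a short script. The sketch below is a brute-force illustration that checks every item combination rather than using Apriori, which is fine for 6 items but would not scale. It applies the same cutoffs (minimum support 2, minimum confidence 70%) and should yield the same 8 rules.

```python
from itertools import combinations

transactions = [
    {"Red", "White", "Green"},
    {"White", "Orange"},
    {"White", "Blue"},
    {"Red", "White", "Orange"},
    {"Red", "Blue"},
    {"White", "Blue"},
    {"White", "Orange"},
    {"Red", "White", "Blue", "Green"},
    {"Red", "White", "Blue"},
    {"Yellow"},
]
n = len(transactions)
min_support, min_confidence = 2, 0.70


def count(item_set):
    return sum(1 for basket in transactions if item_set <= basket)


all_items = sorted(set().union(*transactions))
rules = []
for size in range(2, len(all_items) + 1):
    for combo in combinations(all_items, size):
        item_set = frozenset(combo)
        both = count(item_set)
        if both < min_support:
            continue  # not a frequent item set, so no rules are created from it
        for k in range(1, size):
            for antecedent in map(frozenset, combinations(combo, k)):
                consequent = item_set - antecedent
                confidence = both / count(antecedent)
                if confidence >= min_confidence:
                    lift = confidence / (count(consequent) / n)
                    rules.append((antecedent, consequent, confidence, lift))

rules.sort(key=lambda rule: -rule[3])  # highest lift first
for antecedent, consequent, confidence, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence * 100), round(lift, 2))
```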
Alarming!
• Random data can generate apparently interesting association rules
• The more rules you produce, the greater the danger
• Rules based on large numbers of records are less subject to this danger
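
A quick experiment can make this warning concrete. The sketch below builds a small, entirely random data set in which items are independent by construction, then lists any single-item rules that still pass a support and lift filter. Everything here is hypothetical: the item names, the 20 transactions, the 0.3 purchase probability, and the lift cutoff of 1.5 are arbitrary choices for illustration.

```python
import random
from itertools import permutations

random.seed(0)  # fixed seed so a run is repeatable
items = ["A", "B", "C", "D", "E", "F"]  # hypothetical items

# Each item lands in each basket independently, so there is no true dependence.
transactions = [{i for i in items if random.random() < 0.3} for _ in range(20)]
n = len(transactions)


def count(item_set):
    return sum(1 for basket in transactions if item_set <= basket)


spurious = []
for a, c in permutations(items, 2):
    both = count({a, c})
    if both >= 2:  # same minimum support count as in the faceplate example
        confidence = both / count({a})
        lift = confidence / (count({c}) / n)
        if lift > 1.5:
            spurious.append((a, c, lift))

for a, c, lift in spurious:
    print("If", a, "then", c, "lift", round(lift, 2))
print(len(spurious), "apparently interesting rules found in random data")
```

Any rule printed here is noise. Raising the number of transactions (for example to 20,000) should make such rules much rarer, which is the point of the last bullet.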
Profusion of rules
Applications
• What if Product & Stores are selected as a tuple for analysis?
• What if crimes in different geographies for each week are known?
[Figure: crime categories: Battery, Assault, Narcotics, Public Peace Violation, Robbery]
Recap with an example
• How can you use the information if you know about the
purchase history of customers in a specific geography?
• Supermarket database has 100,000 POS transactions
• 2000 transactions include both Strepsils & Orange Juice
• 800 of the above 2000 include Soup purchases
Recap with an example
• What is the support for rule “IF (Orange Juice & Strepsils) are purchased
THEN (Soup) is purchased on the same trip”?
1. 0.8 %
2. 2 %
3. 40 %
• What is the confidence for rule “IF (Orange Juice & Strepsils) are purchased
THEN (Soup) is purchased on the same trip”?
1. 0.8 %
2. 2 %
3. 40 %
Recap with an example
• What is the lift ratio for rule “IF (Orange Juice & Strepsils) are purchased
THEN (Soup) is purchased on the same trip”?
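
The recap numbers can be plugged straight into the definitions. A small sketch follows; note that the slides do not state how many of the 100,000 transactions include Soup, so the lift ratio cannot be computed from the given figures alone. The `soup_count` argument below is a hypothetical input, and the value 10,000 is made up purely to show the arithmetic.

```python
total_transactions = 100_000   # POS transactions in the supermarket database
antecedent_count = 2_000       # include both Strepsils & Orange Juice
rule_count = 800               # of those, also include Soup

support = rule_count / total_transactions      # A & C together / all transactions
confidence = rule_count / antecedent_count     # A & C together / transactions with A

print(f"support    = {support:.1%}")
print(f"confidence = {confidence:.0%}")


def lift_ratio(soup_count):
    """Lift given the number of transactions containing Soup (not provided in the slides)."""
    benchmark = soup_count / total_transactions
    return confidence / benchmark


# Hypothetical figure, for illustration only: if 10,000 transactions included Soup.
print(lift_ratio(10_000))
```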
Sequential Pattern Mining
IT IS
• If person X has taken “Data Mining Unsupervised” training in 1st Quarter, Person X has also taken “Data Mining Supervised” training in 2nd Quarter
• Based on the statement above, recommend “Data Mining Supervised” training to those who have enrolled for “Data Mining Unsupervised”
IT IS NOT
• Purchases / events that occur at the same time
Association Rules vs. Sequential Pattern Mining
• Look for temporal patterns
• Order/sequence of a & b matters for a rule “b follows a”
• However, what happens in between a & b doesn’t matter
• In phone faceplates dataset:
Associations among items that were bought within the same week were discovered
How about finding what they would buy next week or the week after, if they had bought ‘x’ in this week?
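
As a rough sketch of the “b follows a” idea, the code below counts, per customer, how often one item is bought in a later week than another, ignoring whatever happens in between. The purchase histories are hypothetical (the faceplate data has no customer IDs or weeks), and `follows_counts` is an illustrative helper, not a standard sequential pattern mining algorithm.

```python
from collections import Counter

# Hypothetical histories: one list per customer, one set of items per week, in time order.
histories = {
    "customer_1": [{"Red"}, {"White"}, {"Blue"}],
    "customer_2": [{"Red"}, {"Green"}, {"White"}],
    "customer_3": [{"White"}, {"Red"}],
}


def follows_counts(purchase_histories):
    """For each ordered pair (a, b), count customers who bought b in a later week than a."""
    counts = Counter()
    for weeks in purchase_histories.values():
        pairs = set()  # count each pair at most once per customer
        for i, earlier in enumerate(weeks):
            for later in weeks[i + 1:]:
                for a in earlier:
                    for b in later:
                        if a != b:
                            pairs.add((a, b))
        counts.update(pairs)
    return counts


counts = follows_counts(histories)
print(counts[("Red", "White")])   # customers for whom White follows Red
print(counts[("White", "Red")])   # customers for whom Red follows White
```

Order matters here: the two printed counts differ, whereas an association rule over same-week baskets would treat Red and White symmetrically when measuring support.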
Applications
• Identify the appropriate Basket
• Identify popular taxi routes
Sequential pattern from GPS tracks; spatiotemporal records of taxi trajectories
First cluster collocated customers
CONTACT US
www.excelr.com
[email protected]
+91 9880913504
ExcelR - Data Science, Data Analytics Course Training in Pune
Address: 102, 1st Floor, Phase II, Prachi Residency Opposite to
Kapil Malhar, Baner Rd, Baner, Pune,Maharashtra 411046
Hour: Mon- Sat 07AM – 11PM
Established in Year: 2013
THANK YOU