ExcelR is a global leader delivering a wide gamut of management and technical training in over 40 countries. With over 20 franchise partners all over the world, ExcelR helps individuals and organisations by providing data science courses based on practical knowledge and theoretical concepts.
data science training
Data Science using R,
Minitab & XLMiner
© 2013 - 2016 ExcelR Solutions. All Rights Reserved
R, Minitab XLMiner for Forecasting
My Introduction
Name: Bharani Kumar
Education: IIT Hyderabad
Indian School of Business
Professional certifications:
PMP: Project Management Professional
PMI-ACP: Agile Certified Practitioner
PMI-RMP: Risk Management Professional
CSM: Certified Scrum Master
LSSGB: Lean Six Sigma Green Belt
LSSBB: Lean Six Sigma Black Belt
SSMBB: Six Sigma Master Black Belt
ITIL: Information Technology Infrastructure Library
Agile PM: Dynamic System Development Methodology Atern
RESEARCH in ANALYTICS, DEEP LEARNING & IOT
DATA SCIENTIST
Deloitte: Driven using US policies
Infosys: Driven using Indian policies under Large enterprises
ITC Infotech: Driven using Indian policies (SME)
HSBC: Driven using UK policies
Tuckman Model
AGENDA
Data Visualization using Tableau
Data Mining – Supervised & Unsupervised (Machine Learning)
Text Mining & NLP
What does it take to be a DATA
SCIENTIST?
Successful Data Scientist
Domain Knowledge
All Agenda Topics: Statistical Analysis, Data Mining, Forecasting, Data Visualization
Practice
Welcome to the Information Age ... drowning in data and starving for knowledge
BIG DATA!
https://www.techinasia.com/alibaba-crushes-records-brings-143-billion-singles-day
500 million tweets every day, 1.3 billion accounts
YouTube users upload 100 hours of video every minute
100 terabytes of data uploaded daily
http://www.dnaindia.com/scitech/report-facebook-saw-one-billion-simultaneous-users-on-aug-24-2119428
Processing 100 petabytes a day (1 petabyte = 1000 terabytes)
More than 1 million customer transactions every hour
306 items are purchased every second, 26.6 million transactions per day
Why Tableau?
Agenda – Basic Statistics
• Data Types – Continuous, Discrete, Nominal, Ordinal, Interval, Ratio, Random Variable, Probability, Probability Distribution
• First, second, third & fourth moment business decisions
• Graphical representation – Barplot, Histogram, Boxplot, Scatter diagram
• Simple Linear Regression
• Hypothesis Testing
Data Types – Continuous & Discrete
Data Types – Preliminaries
Random Variable
Probability
Probability Distribution
Probability Applications
Sampling Funnel
Population
Sampling Frame
SRS (Simple Random Sampling)
Sample
Measures of Central Tendency
“Every American should have above
average income, and my Administration is
going to see they get it.” – American
President
Central Tendency (Population / Sample)
Mean / Average: population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$; sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: Middle value of the data
Mode: Most frequently occurring value in the data
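The three measures of central tendency can be sketched with Python's standard library. The income figures below are hypothetical and chosen only to show how one extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical incomes (in thousands); the last value is an extreme observation
incomes = [32, 35, 35, 41, 48, 52, 150]

mean = statistics.mean(incomes)      # sum of the values divided by their count
median = statistics.median(incomes)  # middle value of the sorted data
mode = statistics.mode(incomes)      # most frequently occurring value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

The mean lands above most of the observations because of the single large value, while the median and mode stay with the bulk of the data.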
Measures of Dispersion
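Dispersion (Population / Sample)
Variance: population $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$; sample $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation: population $\sigma = \sqrt{\sigma^2}$; sample $s = \sqrt{s^2}$
Range: Max – Min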
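A minimal sketch of the population and sample versions of these measures, using Python's standard library on a small hypothetical data set. The only difference between the two variances is the divisor (N versus n - 1).

```python
import statistics

# Hypothetical observations
data = [4, 8, 6, 5, 3, 7]

pop_var = statistics.pvariance(data)   # population variance, divides by N
samp_var = statistics.variance(data)   # sample variance, divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance
data_range = max(data) - min(data)     # Max - Min

print(pop_var, samp_var, pop_sd, samp_sd, data_range)
```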
Expected Value
◆ For a probability distribution, the mean of the distribution is known as the expected value
◆ The expected value intuitively refers to what one would find if they repeated the experiment an infinite number of times and took the average of all of the outcomes
◆ Mathematically, it is calculated as the weighted average of each possible value
The formula for calculating the expected value for a discrete random variable X, denoted by μ, is:
$\mu = E(X) = \sum_i x_i \, P(x_i)$
The variance of a discrete random variable X, denoted by σ², is:
$\sigma^2 = \sum_i (x_i - \mu)^2 \, P(x_i)$
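As an illustration of the probability-weighted average, the sketch below computes the expected value and variance of a fair six-sided die. The die is an assumed example, not one taken from the slides.

```python
# Possible outcomes of a fair die and their probabilities (assumed example)
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Expected value: each outcome weighted by its probability
mu = sum(x * p for x, p in zip(values, probs))

# Variance: squared deviation from mu, weighted by probability
variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

print(f"E(X) = {mu:.2f}, Var(X) = {variance:.4f}")
```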
Graphical Techniques – Bar Chart
Graphical Techniques – Histogram
A histogram represents the frequency distribution, i.e., how many observations take a value within a certain interval.
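The counting behind a histogram can be sketched in a few lines: each observation is assigned to an interval and the intervals are tallied. The data and the bin width of 10 are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical observations
data = [12, 15, 21, 22, 27, 33, 35, 36, 38, 41, 47, 58]
bin_width = 10

# Map each value to the lower edge of its interval, then count per interval
counts = Counter((x // bin_width) * bin_width for x in data)

# Text rendering: one '#' per observation in the interval [start, start + width)
for start in sorted(counts):
    print(f"[{start}, {start + bin_width}): {'#' * counts[start]}")
```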
Skewness & Kurtosis
Third and Fourth moments
Skewness
• A measure of asymmetry in the distribution
• Mathematically it is given by $E\left[\left(\frac{x-\mu}{\sigma}\right)^3\right]$
• Negative skewness implies mass of the distribution is concentrated on the right
Kurtosis
• A measure of the “Peakedness” of the distribution
• Mathematically it is given by $E\left[\left(\frac{x-\mu}{\sigma}\right)^4\right] - 3$
• For symmetric distributions, negative kurtosis implies wider peak and thinner tails
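A minimal sketch of the third and fourth standardized moments, assuming the population form of the formulas (dividing by the number of observations). The helper name `standardized_moment` and both data sets are hypothetical.

```python
import statistics

def standardized_moment(data, k):
    """Average of ((x - mu) / sigma) ** k over the data (population form)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(((x - mu) / sigma) ** k for x in data) / len(data)

# Hypothetical data with a long right tail
right_tailed = [2, 3, 3, 4, 4, 4, 5, 5, 12]
skewness = standardized_moment(right_tailed, 3)
excess_kurtosis = standardized_moment(right_tailed, 4) - 3

# A symmetric data set has zero skewness
symmetric_skew = standardized_moment([1, 2, 3, 4, 5], 3)

print(skewness, excess_kurtosis, symmetric_skew)
```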
Graphical Techniques – Box Plot
Box Plot: This graph shows the distribution of data by dividing the data into four groups with the same number of data points in each group. The box contains the middle 50% of the data points and each of the two whiskers contains 25% of the data points. It displays two common measures of the variability or spread in a data set.
Range: It is represented on a box plot by the distance between the smallest value and the largest value, including any outliers. If you ignore outliers, the range is illustrated by the distance between the opposite ends of the whiskers.
Inter-quartile Range (IQR): The middle half of a data set falls within the inter-quartile range.
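The quantities a box plot displays can be computed directly. The sketch below uses `statistics.quantiles` with its default interpolation method; other tools (Minitab, R) may use slightly different quartile conventions, so the exact cut points are an assumption. The data are hypothetical.

```python
import statistics

# Hypothetical data; 40 is a candidate outlier
data = [7, 9, 10, 12, 13, 15, 16, 18, 21, 40]

# Three cut points that split the data into four equal-sized groups
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                        # spread of the middle half (the box)
data_range = max(data) - min(data)   # spread including any outliers

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}, range={data_range}")
```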
Normal Distribution
Characterized by a bell shaped curve
Has the following properties:
▪ The normal random variable takes values from -∞ to +∞
▪ The probability associated with any single value of a continuous random variable is always zero
▪ Area under the entire curve is always equal to 1
Normal Distribution
Characterized by mean, μ, and standard deviation, σ
X~N(μ,σ)
68.27% of the values lie within ±1σ from the mean
95.45% of the values lie within ±2σ from the mean
99.73% of the values lie within ±3σ from the mean
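The coverage within ±1σ, ±2σ and ±3σ can be checked numerically with `statistics.NormalDist` from the Python standard library. This is only a verification sketch; printed values may differ in the last digit from rounded table values.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in coverage.items():
    print(f"within ±{k}σ: {area:.2%}")
```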
Z scores, Standard Normal Distribution
• For every value (x) of the random variable X, we can calculate Z score: $Z = \frac{X - \mu}{\sigma}$
• Interpretation − How many standard deviations away is the value from the mean?
Calculating Probability from Z distribution
Suppose GMAT scores can be reasonably modelled using a normal distribution
− μ = 711, σ = 29
What is P(X ≤ 680)?
Step 1: Calculate Z score corresponding to 680
- Z = (680-711)/29 = -1.07
Step 2: Calculate the probabilities using Z – Tables
- P(Z ≤ -1.07) = 0.14
Calculating Probability from Z distribution
• What is P(697 ≤ X ≤ 740)?
• Step 1: Use P(x1 ≤ X ≤ x2) = P(X ≤ x2) − P(X ≤ x1)
• Step 2: Calculate P(X ≤ x2) and P(X ≤ x1) as before
P(X ≤ 740) = P(Z ≤ 1) = 0.84; P(X ≤ 697) = P(Z ≤ -0.48) = 0.31
• Step 3: Calculate P(697 ≤ X ≤ 740) = 0.84 – 0.31 = 0.53
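The GMAT calculations can be reproduced without Z tables. This sketch uses the μ = 711 and σ = 29 given on the slides; the variable names are illustrative.

```python
from statistics import NormalDist

gmat = NormalDist(mu=711, sigma=29)  # parameters from the GMAT example

# Z score for a score of 680
z_680 = (680 - gmat.mean) / gmat.stdev

# P(X <= 680), read directly from the cumulative distribution
p_below_680 = gmat.cdf(680)

# P(697 <= X <= 740) = P(X <= 740) - P(X <= 697)
p_between = gmat.cdf(740) - gmat.cdf(697)

print(f"z = {z_680:.2f}, P(X <= 680) = {p_below_680:.2f}")
print(f"P(697 <= X <= 740) = {p_between:.2f}")
```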
Normal Quantile (Q-Q) Plot
[Figure: Q-Q plot; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]
Sampling variation
▪ Sample mean varies from one sample to another
▪ Sample mean can be (and most likely is) different from
the population mean
▪ Sample mean is a random variable
Central Limit Theorem
The Distribution of the sample mean
- will be normal when the distribution of data in the population is normal
- will be approximately normal even if the distribution of data in the population is not normal, if the “sample size” is fairly large
Mean($\bar{X}$) = μ (the same as the population mean of the raw data)
Standard Deviation($\bar{X}$) = $\frac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation and n is the sample size
- This is referred to as standard error of mean
The standard error of the mean estimates the variability between samples whereas the standard deviation measures the variability within a single sample
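A small simulation can make the Central Limit Theorem concrete. The sketch below assumes an exponential population (clearly non-normal, with mean 1 and standard deviation 1), a sample size of 50 and 2000 repeated samples; all of these are illustrative choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50              # size of each sample
num_samples = 2000  # number of repeated samples

# Draw repeated samples from an exponential population and keep each sample mean
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
```

The spread of the sample means should sit close to σ/√n, even though the underlying population is skewed.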
Sample Size Calculation
A sample size of 30 is considered large enough, but that may or may not be adequate
More precise conditions
- $n > 10 (K_3)^2$, where $K_3$ is sample skewness, and
- $n > 10 (K_4)$, where $K_4$ is sample kurtosis
Confidence Interval
• What is the Probability of tomorrow’s temperature being
42 degrees ?
Probability is ‘0’
• Can it be between [-50°C & 100°C]?
Case Study: Confidence Interval
• A University with 100,000 alumni is thinking of offering a
new affinity credit card to its alumni.
• Profitability of the card depends on the average balance
maintained by the card holders.
• A Market research campaign is launched, in which about
140 alumni accept the card in a pilot launch.
• Average balance maintained by these is $1990 and the
standard deviation is $2833. Assume that the population
standard deviation is $2500 from previous launches.
• What can we say about the average balance that will be held after a full-fledged market launch?
Interval estimates of parameters
• Based on sample data
− The point estimate for mean balance = $1990
− Can we trust this estimate ?
• What do you think will happen if we took another random
sample of 140 alumni ?
• Because of this uncertainty, we prefer to provide the
estimate as an interval (range) and associate a level of
confidence with it
Interval Estimate = Point Estimate ± Margin of Error
Confidence Interval for the Population Mean
Start by choosing a confidence level (1-α)% (e.g. 95%, 99%, 90%)
Then, the population mean will be within
$\bar{X} \pm Z_{1-\alpha} \frac{\sigma}{\sqrt{n}}$, where $Z_{1-\alpha}$ satisfies $P(-Z_{1-\alpha} \le Z \le Z_{1-\alpha}) = 1-\alpha$
Interval Estimate = Point Estimate ± Margin of Error
Margin of error depends on the underlying uncertainty, confidence level and sample size
Calculate Z value - 90%, 95% & 99%
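The Z values for the common confidence levels can be obtained from the inverse CDF of the standard normal distribution instead of a table. For a two-sided interval with confidence level 1-α, the value needed is the 1-α/2 quantile; this sketch uses only the Python standard library.

```python
from statistics import NormalDist

std_normal = NormalDist()

z_values = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # Two-sided interval: alpha/2 is left in each tail
    z_values[confidence] = std_normal.inv_cdf(1 - alpha / 2)

for confidence, z in z_values.items():
    print(f"{confidence:.0%} confidence: Z = {z:.3f}")
```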
Confidence Interval Calculation
• Based on the survey and past data
− n = 140; σ = $2500; X̄ = $1990
− Standard error of the mean: σ/√n = 2500/√140 = 211.29
• Construct a 95% confidence interval for the mean card
balance and interpret it ?
• Construct a 90% confidence interval for the mean card
balance and interpret it ?
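The two intervals asked for above can be sketched as follows, using the case-study figures (n = 140, σ = $2500, sample mean $1990). The helper name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist

n = 140        # alumni in the pilot
sigma = 2500   # population standard deviation from previous launches
x_bar = 1990   # sample mean balance

standard_error = sigma / math.sqrt(n)

def z_interval(confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin

ci95 = z_interval(0.95)
ci90 = z_interval(0.90)

print(f"standard error = {standard_error:.2f}")
print(f"95% CI: [{ci95[0]:.0f}, {ci95[1]:.0f}]")
print(f"90% CI: [{ci90[0]:.0f}, {ci90[1]:.0f}]")
```

The 90% interval is narrower than the 95% interval: a lower confidence level buys a smaller margin of error.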
Confidence Interval Interpretation
Consider the 95% confidence interval for the mean balance: [$1576, $2404]
Does this mean that
- The mean balance of the population lies in the range ?
- The mean balance is in this range 95% of the time ?
- 95% of the alumni have balance in this range ?
Interpretation 1 : Mean of the population has a 95% chance of being
in this range for a random sample
Interpretation 2 : Mean of the population will be in this range for
95% of the random samples
What if we don’t know Sigma?
• Suppose that the alumni of this university are very
different and hence population standard deviation from
previous launches can not be used
We replace σ with our best guess (point estimate) s, which is the standard deviation of the sample:
Calculate $T = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
• If the underlying population is normally distributed, T is a random variable distributed according to a t-distribution with n-1 degrees of freedom, $T_{n-1}$
• Research has shown that the t-distribution is fairly robust to deviations of the population from the normal model
Student’s t-distribution
As n → ∞, $t_n \to N(0,1)$
i.e., as the degrees of freedom increase, the t-distribution approaches the standard normal distribution
Confidence Interval for mean with unknown Sigma
Calculating t-value
• Construct a 95% confidence interval for the mean card balance and interpret it?
− n = 140; s = $2833; X̄ = $1990
− Standard error of the mean: s/√n = 2833/√140 = 239.43
Calculate t0.95, 139 = 1.98
Then the 95% confidence interval for balance is [$1516, $2464]
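The interval with unknown sigma follows the same pattern, with the sample standard deviation and a t value in place of σ and Z. Python's standard library has no t-distribution, so this sketch takes the critical value 1.98 quoted on the slide as given rather than computing it.

```python
import math

n = 140        # sample size
s = 2833       # sample standard deviation
x_bar = 1990   # sample mean balance
t_crit = 1.98  # t value for 95% confidence and 139 degrees of freedom, as quoted on the slide

standard_error = s / math.sqrt(n)
margin = t_crit * standard_error
ci = (x_bar - margin, x_bar + margin)

print(f"standard error = {standard_error:.2f}")
print(f"95% CI: [{ci[0]:.0f}, {ci[1]:.0f}]")
```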
Hypothesis Testing
Start with Hypothesis about a Population Parameter
Collect Sample Information
Reject/Do Not Reject Hypothesis
Decision | Ho is TRUE | H1 is TRUE
Fail to Reject Ho | Right Decision (Confidence, 1-α) | Type II error (β)
Reject Ho | Type I error (α) | Right Decision (Power, 1-β)
The factors that affect the power of a test include sample size, effect size, population variability, and α. Power and α are related: increasing α decreases β. Since power is calculated as 1 minus β, if you increase α, you also increase the power of a test. The maximum power a test can have is 1, whereas the minimum value is 0.
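The relationship between α and power can be illustrated for a one-sided (upper-tail) Z test with known σ, where power = 1 - Φ(z_crit - δ√n/σ) for a true shift δ. The function name and the numbers used (δ = 1, σ = 4, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(alpha, effect, sigma, n):
    """Power of an upper-tail Z test when the true mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under Ho
    shift = effect * math.sqrt(n) / sigma    # how far H1 moves the test statistic
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical scenario: true shift of 1 unit, sigma = 4, n = 25
power_at_05 = power_one_sided_z(alpha=0.05, effect=1, sigma=4, n=25)
power_at_10 = power_one_sided_z(alpha=0.10, effect=1, sigma=4, n=25)

print(f"power at alpha=0.05: {power_at_05:.3f}")
print(f"power at alpha=0.10: {power_at_10:.3f}")
```

Raising α lowers the rejection threshold, so the same true shift is detected more often, at the cost of more Type I errors.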
Hypothesis Testing
Our quality will not improve after the consulting project
We will acquire 8,000 new customers if I open a store in this area
Our potential customers do not spend more than 60 minutes on the web every day
The retail market will grow by 50% in the next 5 years
Less than 5% clients will default on their loans
We will need 400 more person hours to finish this project
1-Sample Z test
Fabric Data
The lengths of 25 samples of a fabric are taken at random. Mean and standard deviation from the historic 2 years study are 150 and 4 respectively. Test if the current mean is greater than the historic mean. Assume α to be 0.05
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. Population Standard Deviation Known or Not
3. 1 Sample Z Test: Stat > Basic Statistics > 1 Sample Z
1-Sample Z test – Write Hypothesis
Y: Fabric Length is continuous; X: Discrete, 1 Population
We are comparing mean with external standard of 150mm
Population standard deviation is known = 4
Data was shown to be normal
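A sketch of the upper-tail 1-Sample Z test for the fabric example. The fabric measurements themselves are not reproduced in the slides, so the sample mean of 151.5 below is a hypothetical stand-in; n = 25, σ = 4, the historic mean of 150 and α = 0.05 come from the problem statement. The hypotheses assumed here are Ho: μ = 150 against Ha: μ > 150.

```python
import math
from statistics import NormalDist

mu0 = 150            # historic mean (null hypothesis value)
sigma = 4            # known population standard deviation
n = 25               # number of fabric samples
alpha = 0.05
sample_mean = 151.5  # HYPOTHETICAL value; replace with the mean of the actual fabric data

# Test statistic: distance of the sample mean from mu0 in standard-error units
z_stat = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Upper-tail p-value, because Ha is "current mean is greater than historic mean"
p_value = 1 - NormalDist().cdf(z_stat)
reject_h0 = p_value < alpha

print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, reject Ho: {reject_h0}")
```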
1-Sample t Test
Bolt Diameter
The mean diameter of the bolt manufactured should be 10mm to be able to fit into the nut. 20 samples are taken at random from production line by a quality inspector. Conduct a test to check with 95% confidence that the mean is not different from the specification value.
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. Population Standard Deviation Known or Not
3. 1 Sample t Test: Stat > Basic Statistics > 1 Sample t
1-Sample t Test – Write Hypothesis
Y: Bolt Diameter is continuous; X: Discrete, 1 Population
We are comparing mean with external standard of 10mm
Data was given to be Normal
Population standard deviation is NOT known
1-Sample Sign Test
Student Scores
The scores of 20 students for the statistics exam are provided. Test if the current median is not equal to historic median of 82. Assume ‘α’ to be 0.05
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. 1 Sample Sign Test: Stat > Non Parametric > 1 Sample sign
1-Sample Sign Test – Write Hypothesis
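The sign test only counts how many observations fall above and below the hypothesized median and compares the split with a Binomial(n, 0.5) distribution. The sketch below assumes Ho: median = 82 against Ha: median ≠ 82; the 20 scores are hypothetical, since the actual student data are not reproduced in the slides, and the function name is illustrative.

```python
import math

def sign_test_two_sided(data, median0):
    """Exact two-sided sign test; observations equal to median0 are dropped."""
    above = sum(x > median0 for x in data)
    below = sum(x < median0 for x in data)
    n = above + below
    k = min(above, below)
    # Probability of a split at least this lopsided in one direction under Ho
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)

# HYPOTHETICAL scores for 20 students
scores = [78, 85, 90, 82, 74, 88, 91, 69, 95, 84,
          80, 77, 93, 86, 89, 72, 87, 92, 83, 81]

above, below, p_value = sign_test_two_sided(scores, median0=82)
print(f"above={above}, below={below}, p-value={p_value:.4f}")
```

With these made-up scores the p-value is well above 0.05, so Ho would not be rejected.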
2-Sample t Test
Marketing Strategy
A financial analyst at a financial institute wants to evaluate a recent credit card promotion. After this promotion, 450 cardholders were randomly selected. Half received an ad promoting a full waiver of interest rate on purchases made over the next three months, and half received a standard Christmas advertisement. Did the ad promoting the full interest rate waiver increase purchases?
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. Variance Test: Stat > Basic Statistics > 2 Variance
3. 2 Sample t Test: Stat > Basic Statistics > 2-Sample t
2-Sample t Test – Write Hypothesis
Paired T Test
• This test is used to compare the means of two sets of
observations when all the other external conditions are the
same
• This is a more powerful test, as the variability in the observations that is due to differences between the people or objects sampled is factored out
Example: To find out if medication A lowers blood
pressure
Trigger your thoughts!
• Compare the power output of two wind mills next to each other simultaneously when you use motor A on one wind mill and motor B on another
• Comparing the performance of machine A vs. machine B by feeding different raw materials to each machine
• Identifying resistor defects and capacitor defects in same PCB by collecting such data using 20 PCB units
• Compare the performance of machine A vs. machine B when the same raw material is fed to each machine
• Compare the power output of a wind mill when you use motor A for 1 month and motor B for 1 month
• Identifying resistor defects on 20 PCB’s and capacitor defects on 20 (different) PCB’s
2-Sample t test or Paired T test
Effect of fuel additive on vehicles is being studied. Out of a
total of 20 vehicles, 10 vehicles are chosen randomly and
mileage is recorded. In rest of the 10 vehicles, additive to
be tested is added with the fuel and their mileage is
recorded. Find if the mileage increases by adding the fuel
additive.
2-Sample t test
Assume the same data was recorded if only 10 vehicles
were chosen and mileage was recorded before and after
adding the additive. What method will you choose to find
the result.
Paired T test
Mann-Whitney test
Vehicle with & without Additives
Effect of fuel additive on vehicles is being studied. Out of a total of 20 vehicles, 10 vehicles are chosen randomly and mileage is recorded. In rest of the 10 vehicles, additive to be tested is added with the fuel and their mileage is recorded. Find if the mileage increases by adding the fuel additive.
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. Mann-Whitney test for Medians: Stat > Non Parametric > Mann Whitney
Mann-Whitney Test – Write Hypothesis
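A sketch of the Mann-Whitney U statistic for two independent groups of 10 vehicles. The mileage figures are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is an assumption that statistical packages may refine.

```python
import math
from statistics import NormalDist

def mann_whitney_u(a, b):
    """U for sample a: number of (a_i, b_j) pairs with a_i > b_j; ties count as 0.5."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# HYPOTHETICAL mileage readings for two independent groups of 10 vehicles
without_additive = [18.2, 19.1, 17.8, 20.3, 18.9, 19.5, 18.0, 19.8, 18.5, 19.0]
with_additive = [20.1, 21.3, 19.7, 22.0, 20.8, 21.1, 19.9, 21.6, 20.4, 20.9]

u_with, u_without = mann_whitney_u(with_additive, without_additive)

# Normal approximation to the distribution of U under Ho
n1, n2 = len(with_additive), len(without_additive)
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u_with - mean_u) / sd_u
p_value = 1 - NormalDist().cdf(z)  # one-sided: does the additive increase mileage?

print(f"U(with)={u_with}, U(without)={u_without}, z={z:.2f}, p-value={p_value:.5f}")
```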
Paired T test
Vehicle with & without Additives
Effect of fuel additive on vehicles is being studied. Out of a total of 20 vehicles, 10 vehicles are chosen randomly and mileage is recorded. In rest of the 10 vehicles, additive to be tested is added with the fuel and their mileage is recorded. Find if the mileage increases by adding the fuel additive. Assume the same data was recorded if only 10 vehicles were chosen and mileage was recorded before and after adding the additive.
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. Paired T Test: Stat > Basic Statistics > Paired T
• Since the data was not normal, the cause of non-normality was investigated and it was found that the first data point for “with additive” was wrongly entered. This value should have been 20. Now, proceed with the rest of the analysis.
• If the data were truly non-normal our analysis would stop here.
Paired T test – Write Hypothesis
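The paired T test reduces to a one-sample t test on the per-vehicle differences. The before/after mileages below are hypothetical, and because the Python standard library has no t-distribution, the one-sided 5% critical value for 9 degrees of freedom (1.833) is taken from a standard t table rather than computed.

```python
import math
import statistics

# HYPOTHETICAL mileage for the same 10 vehicles before and after adding the additive
before = [18.2, 19.1, 17.8, 20.3, 18.9, 19.5, 18.0, 19.8, 18.5, 19.0]
after = [19.0, 19.6, 18.9, 20.5, 19.8, 20.1, 18.4, 20.9, 19.2, 19.3]

# Work on the differences: vehicle-to-vehicle variation cancels out
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)

# t statistic with n - 1 degrees of freedom
t_stat = mean_diff / (sd_diff / math.sqrt(n))

t_crit = 1.833  # one-sided, alpha = 0.05, 9 degrees of freedom (table value)
reject_h0 = t_stat > t_crit

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```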
One-Way ANOVA
Contract Renewal
A marketing organization outsources its back-office operations to three different suppliers. The contracts are up for renewal and the CMO wants to determine whether they should renew contracts with all suppliers or any specific supplier. The CMO wants to renew the contract of the supplier with the least transaction time. The CMO will renew all contracts if the performance of all suppliers is similar.
1. Normality Test: Stat > Basic Statistics > Graphical Summary
2. Variance Test: Stat > ANOVA > Test for Equal Variances
3. ANOVA: Stat > ANOVA > One-Way....
Example : More weight reduction
programs
• Suppose the nutrition expert would like to do a
comparative evaluation of three diet programs(Atkins,
South Beach, GM)
• She randomly assigns equal number of participants to
each of these programs from a common pool of volunteers
• Suppose the average weight losses in each of the
groups(arms) of the experiments are 4.5kg, 7kg, 5.3kg
• What can she conclude?
Two kinds of variation matter
• Not every individual in each program will respond
identically to the diet program
• Easier to identify variations across programs if variations
within programs are smaller
• Hence the method is called Analysis of
Variance(ANOVA)
• With-in group variation = Experimental Error
• Between group variation
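Formalizing the intuition behind variations
• It should be obvious that for every observation: $Tot_{ij} = t_i + e_{ij}$
• What is more surprising and useful is: $\sum_{i,j} Tot_{ij}^2 = \sum_{i,j} t_i^2 + \sum_{i,j} e_{ij}^2$, i.e. total sum of squares = treatment (between group) sum of squares + error (within group) sum of squares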
Statistically test for equality of means
• n subjects equally divided into r groups
• Hypothesis
- H0: μ1 = μ2 = μ3 = ... = μr
- H1: Not all μi are equal
• Calculate
- Mean Square Treatment MSTR = SSTR / (r-1)
- Mean Square Error MSE = SSE / (n-r)
- The ratio of two mean squares f = MSTR/MSE = Between group variation/Within group variation
- Strength of this evidence p-value = Pr(F(r-1,n-r) ≥ f)
• Reject the null hypothesis if p-value < α
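The F ratio can be sketched directly from its definition. The three groups below are hypothetical weight losses whose means match the 4.5kg, 7kg and 5.3kg quoted in the diet example; the individual values and the function name are illustrative. The p-value is not computed here because the Python standard library has no F distribution, so f would be compared against an F(r-1, n-r) table or passed to a statistics package.

```python
import statistics

def one_way_anova(groups):
    """Return (SSTR, SSE, f) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    n, r = len(all_values), len(groups)
    grand_mean = statistics.fmean(all_values)

    # Between group variation: group means around the grand mean
    sstr = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within group variation: observations around their own group mean
    sse = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    mstr = sstr / (r - 1)
    mse = sse / (n - r)
    return sstr, sse, mstr / mse

# HYPOTHETICAL weight losses (kg) for three diet programs
atkins = [4.0, 5.0, 4.5, 4.5]
south_beach = [6.5, 7.5, 7.0, 7.0]
gm = [5.0, 5.6, 5.3, 5.3]

sstr, sse, f_stat = one_way_anova([atkins, south_beach, gm])
print(f"SSTR={sstr:.2f}, SSE={sse:.2f}, f={f_stat:.2f}")
```

A large f means the differences between program means are large relative to the variation within each program.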
Analysis of variance(ANOVA)
• ANOVA can be used to test equality of means when there are more than 2 populations
• ANOVA can be used with one or two
factors
• If only one factor is varying, then we would
use a one-way ANOVA
– Example: We are interested in comparing the mean performance of
several departments within a company. Here the only factor is the name of
department
– If there are two factors, we would use a two way ANOVA. Example: One
factor is department and the second factor is the shift.(day vs. Night)
Analysis of variance(ANOVA)
One Way ANOVA
Source of Variation | Sum of Squares (SS) | Degrees of Freedom | Mean Square (MS) | F Test Statistic
Between Treatments | SSFactor | k-1 | MSFactor = SSFactor / DFFactor | F = MSFactor / MSError
Within Treatment | SSError | N-k | MSError = SSError / DFError |
Total | SSTotal | N-1 | |
Two Way ANOVA
Source of Variation | Sum of Squares (SS) | Degrees of Freedom | Mean Square (MS) | F Test Statistic
Factor A | SSA | nA-1 | MSA = SSA / (nA-1) | FA = MSA / MSE
Factor B | SSB | nB-1 | MSB = SSB / (nB-1) | FB = MSB / MSE
Interaction A * B | SSAB | (nA-1)(nB-1) | MSAB = SSAB / ((nA-1)(nB-1)) | FAB = MSAB / MSE
Error | SSE | n - nA * nB | MSE = SSE / (n - nA * nB) |
Total | SST | n-1 | |
Dichotomies
• Is the transaction time dependent on whether person A or B processes the transaction?
• Is medicine 1 effective or medicine 2 at reducing heart stroke?
• Is the new branding program more effective in increasing profits?
• Three different sale closing methods were used. Which one is most effective?
• Does the productivity of employees vary depending on the three levels? (Beginner, Intermediate and Advanced)
• Four types of machines are used. Is weight of the Rugby ball dependent on the type of machine used?
2 Sample t-test vs. ANOVA – One Way
Non-Parametric equivalent to
ANOVA
• When the data are not normal or if the data points
are very few to figure out if the data are normal and
we have more than 2 populations, we can use the
Mood’s Median or Kruskal Wallis test to compare the
populations
Ho: All the medians are the same
Ha: At least one of the medians is different
• Mood’s median assigns the data from each
population that is higher than the overall median to
one group, and all points that are equal or lower to
another group. It then uses a Chi-Square test to
check if the observed frequencies are close to
expected frequencies
• Kruskal Wallis is another test that is non-
parametric equivalent of ANOVA. Kruskal Wallis is
the extension of Mann-Whitney test