Assumed knowledge
-
Francis Section 3.1 "Relations Between Metric
Variables"
-
Francis Section 3.2 "Relations Between
Categorical Variables"
-
Francis Section 4.3 "Recoding Variables"
Data files
General advice
The general recommended strategy for tackling these correlation
analyses is:
- Determine the level of measurement for each variable in the
analysis
- Obtain univariate descriptive statistics and graphical displays
for each variable, to:
- check for mis-entered data
- check the frequency / central tendency / distribution
- Recode as necessary
- Create a bivariate visual display (e.g., clustered bar graph,
scatterplot)
- Create tables (e.g., crosstabs with separate tables for row and
column %s) and relevant correlational statistics
- Interpret/conclude
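The checking and recoding steps above can be sketched in a few lines of Python. This is only an illustration alongside the SPSS menus: the variable names, the valid codes (1 and 2) and the data values are all made up for the example, not taken from qfsall.sav.

```python
from collections import Counter

# Hypothetical raw data: gender should be coded 1 or 2; cigarettes per day is a count.
VALID_GENDER = {1, 2}                   # assumed valid codes
raw_gender = [1, 2, 2, 1, 22, 1, 2]     # 22 looks like a keying error
raw_cigs = [0, 10, 0, 25, 5, 0, 0]

# Univariate check: a frequency table exposes stray codes.
counts = Counter(raw_gender)
print("Gender frequencies:", dict(counts))

# Recode 1: drop cases with a mis-entered gender code.
clean = [(g, c) for g, c in zip(raw_gender, raw_cigs) if g in VALID_GENDER]

# Recode 2: collapse a continuous variable into a dichotomy (0 = non-smoker, 1 = smoker).
smoker = [1 if cigs > 0 else 0 for _, cigs in clean]
print("Cases kept:", len(clean))
print("Smoker (0/1):", smoker)
```

In SPSS the same two steps would normally be done with the recode dialogs described in Francis Section 4.3.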
Phi (φ) & Cramer's V
- qfsall.sav
- Phi and Cramer's V are used for analyzing the relationship between
two nominal/categorical variables
- Phi is used when you have 2x2, 2x3 or 3x2 tables
- Cramer’s V is used when tables of 3x3 or larger are analysed
- These are non-parametric tests which do not rely much on
assumptions about distribution. But you should make sure that the
expected frequency is at least 5 in each cell. You can
get this via descriptives - crosstabs - cells - expected. If you
don't have enough data in the cells, you should recode the data into
fewer categories.
- Note that the sign (+ or -) of Phi doesn't mean much because there
is no meaningful order to the way the variables are coded.
-
Is there an association between Gender and Belief
in God? (recode to remove mis-entered data)
(φ is small (.024, p = .94) and not significant; there is no evidence of
a relationship; use crosstabs and bar graph - clustered)
-
Is there an
association between snoring and smoking? (recode smoking from
continuous to dichotomous)
(φ is ~.24 and significant, p = .001; smokers are almost twice as
likely to snore as non-smokers, but be careful in interpretation -
this could be due to non-causal factors (e.g., age?); use crosstabs
and bar graph - clustered)
-
Is there an association between favourite season
and favourite sense? (recode to remove mis-entered data)
(Cramer's V is ~.23 and significant, p = .005; in
other words, there is a different
profile of favourite senses depending on favourite season, e.g.,
almost 50% of Summer and Spring people are Visual people. Winter
people, in contrast, tend to prefer Taste and Smell; use crosstabs and
stacked area graph)
-
Is there an association between type of
household (urban/rural) and whether or not the household has chickens
(Yes/No)? [chickens.sav].
The file contains hypothetical data for two categorical
variables. Resid indicates whether households are in urban or rural
areas. Chickens indicates whether or not the household owns chickens.
(the answer to this is potential quiz question material - no clues!)
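To see what SPSS is doing behind the crosstabs output, here is a minimal sketch of phi, Cramer's V and the expected cell frequencies computed by hand. The 2x2 counts are invented for illustration (they are not from qfsall.sav or chickens.sav), and the function names are my own.

```python
import math

def chi_square(table):
    """Return (chi-square statistic, expected counts) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return chi2, expected

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (N * (k - 1))), k = smaller of rows and columns."""
    chi2, _ = chi_square(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

def phi_2x2(table):
    """Signed phi for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: rows = group (A, B), columns = response (Yes, No).
observed = [[30, 20],
            [10, 40]]

_, expected = chi_square(observed)
min_expected = min(min(row) for row in expected)   # should be at least 5
phi = phi_2x2(observed)
v = cramers_v(observed)
phi_rows_swapped = phi_2x2(observed[::-1])          # same data, rows coded the other way

print("Smallest expected frequency:", min_expected)
print("phi =", round(phi, 3), " Cramer's V =", round(v, 3))
print("phi with rows swapped =", round(phi_rows_swapped, 3))
```

Swapping the row order flips the sign of phi but leaves its size (and Cramer's V) unchanged, which is why the sign carries no meaning for nominal variables.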
Point Bi-serial Correlation
-
Point bi-serial correlation is for analyzing
the relationship between a dichotomous and a continuous variable
-
Point bi-serial correlation is computed as for
the product-moment correlation, but
interpretation must be appropriate to the direction of coding for the
dichotomous scale.
-
If you interpret the significance of a
point bi-serial correlation, it is equivalent to doing a t-test of the
mean difference between the two groups, e.g., between males' and
females' ratings of their Australianness.
-
What is the relationship between Gender
(dichotomous) and
Australianness (assume continuous)? [qfsall.sav]
(no relationship (technically it is slightly negative, i.e., males
in the sample perceive themselves as very slightly more Australian),
i.e., the (point bi-serial) correlation is very small and
non-significant; use correlation - bivariate - pearson and scatterplot
- chart options - sunflowers and line of best fit)
-
What is the relationship between Belief in God
(recode to dichotomous) and number of Countries visited?
(important to check the scatterplot on this one - there are
outliers which look like they are influencing the small,
non-significant correlation; use correlation - bivariate - pearson and
scatterplot - chart options - sunflowers and line of best fit)
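The two points made at the start of this section (the point bi-serial is just a product-moment correlation with a 0/1 variable, and its significance test matches a t-test) can be checked with a short sketch. The scores and group sizes are hypothetical, and the t-test shown is the ordinary pooled-variance (equal variances assumed) version.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(g0, g1):
    """Independent-samples t (equal variances assumed) for mean(g1) - mean(g0)."""
    n0, n1 = len(g0), len(g1)
    m0, m1 = mean(g0), mean(g1)
    ss0 = sum((v - m0) ** 2 for v in g0)
    ss1 = sum((v - m1) ** 2 for v in g1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)      # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))

# Hypothetical ratings for two groups coded 0 and 1.
group0 = [5, 6, 7, 8]
group1 = [6, 7, 9, 10]
code = [0] * len(group0) + [1] * len(group1)
score = group0 + group1
n = len(score)

r_pb = pearson_r(code, score)                        # point bi-serial = Pearson r with a 0/1 variable
t_from_r = r_pb * math.sqrt((n - 2) / (1 - r_pb ** 2))
t_test = pooled_t(group0, group1)
r_flipped = pearson_r([1 - c for c in code], score)  # reverse the 0/1 coding

print("r_pb =", round(r_pb, 3))
print("t from r =", round(t_from_r, 3), " t-test =", round(t_test, 3))
print("r_pb with coding reversed =", round(r_flipped, 3))
```

Reversing the coding only changes the sign, so always check which group is coded higher before describing the direction of the relationship.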
Product-Moment Correlation
-
What is the relationship
between Australianness and Femininity/Masculinity? [qfsall.sav]
(the r here is .12, p = .100,
which is larger than the point bi-serial correlation for Gender and
Australianness, but is still very small and non-significant; use
correlation - bivariate - pearson and scatterplot - chart options -
sunflowers and line of best fit)
Correlation Explore & Correlation Guess
- These exercises help you to intuitively estimate a correlation based on a
scatterplot
- Correlation Explore
(explore 20 plots with .1 increments)
- Correlation Guess
(guess 20 plots with .1 increments) - try to get 25 out of 50
- Note: The following three exercises are desirable, but
unfortunately they are java applets which will not currently run due
to the UC proxy host firewall. Try to access these from off-campus if
you can - the problem has been reported, but there's no word on when
it may be fixed.
-
Guessing Correlations
(4 plot exact match to correlations) - try to average over 75%
-
Guess the Correlation
(single plot, guess exact correlation) - try to get within .1
-
Spearman's rank correlation
Exploring the Effect of Outliers
-
regressp.exe (Continue- “Explore the impact
of an outlier”)
-
Drag the white point to explore how an outlier can
inflate or deflate the correlation, hitting “Recalculate” to recompute
the correlation.
-
Where would you put the white dot to maximise the
correlation?
(as far to the ends of the line of best fit as possible)
-
Where would you put the white dot to minimise the
correlation?
(to shift the correlation towards zero, place the outlier as far as
possible to the ends of a line which would run perpendicular to the line
of best fit, crossing at the mean for X and the mean for Y)
-
Where would you put the white dot to not change
the correlation?
(on the mean for X and the mean for Y)
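If regressp.exe is not available, the same three answers can be checked numerically. This is a rough sketch with six made-up points; the positions chosen for the extra "white dot" are arbitrary illustrations.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical base data with a fairly strong positive correlation.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
mx, my = mean(x), mean(y)

# Slope of the line of best fit (it passes through the point of means).
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def r_with_extra_point(px, py):
    """Correlation after adding one extra point (the 'white dot')."""
    return pearson_r(x + [px], y + [py])

r_base = pearson_r(x, y)
r_at_means = r_with_extra_point(mx, my)                        # dot on the point of means
r_far_on_line = r_with_extra_point(20, my + slope * (20 - mx)) # far along the line of best fit
r_perpendicular = r_with_extra_point(-10, my - (-10 - mx) / slope)  # far along the perpendicular

print("base r            =", round(r_base, 3))
print("dot at the means  =", round(r_at_means, 3))
print("dot far on line   =", round(r_far_on_line, 3))
print("dot perpendicular =", round(r_perpendicular, 3))
```

With these made-up points, a dot placed far enough along the perpendicular not only pulls r down but reverses its sign, so try a few distances to see r pass through zero.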
Correlations and Non-linear Distributions
- xy.sav
- Draw scatterplots, compute the correlations (they
are all r=.82) and explain the relationships between:
- X1 Y1
(r is appropriate – linear relationship)
- X1 Y2
(curvilinear – r not appropriate)
- X1 Y3
(strong linear, with outlier, r=.82 is not appropriate)
- X2 Y4
(restricted range, with outlier, r=.82 is not appropriate)
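The four patterns described above match those in Anscombe's well-known quartet, so the sketch below uses Anscombe's published values as a stand-in. This is an assumption: the values are typed in here, not read from xy.sav, and the file may differ. The point is only that four very different scatterplots can share the same r.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet (assumed stand-in for the variables in xy.sav).
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x2 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

pairs = {
    "X1 Y1": (x1, y1),   # linear
    "X1 Y2": (x1, y2),   # curvilinear
    "X1 Y3": (x1, y3),   # linear with one outlier
    "X2 Y4": (x2, y4),   # restricted range with one outlier
}

correlations = {name: pearson_r(a, b) for name, (a, b) in pairs.items()}
for name, r in correlations.items():
    print(name, "r =", round(r, 2))
```

The numbers alone cannot tell the four cases apart, which is why the scatterplot has to be drawn before r is interpreted.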
Outliers and Restricted Range
-
aggr.sav
-
This is a dataset collected by Bernd Heubeck
(Division of Psychology, ANU) comparing a sample of 89 children, aged
8-14, from Western Sydney with a sample of 89 children from the same
area who had been referred to a Child Psychiatric Clinic. Separate
aggressiveness ratings of the child were obtained independently from
both parents. Aggressiveness ratings can range from 0 (low) to 40
(high).
-
To what extent do the mothers’ and fathers’
Aggressive Behaviour ratings agree with one another?