Photo by Jerry Zhang on Unsplash

Residuals for Post-hoc Analysis in Chi-square

Chi-square

Chi-square tests are common in psychological science (Bakker & Wicherts, 2011). These tests compare the observed (i.e., the actual frequency) versus the expected (i.e., \(expected_{i,j} = \frac{n_{rowi}*n_{colj}}{n_{tot}}\)) frequencies in a \(Row* Column\) contingency tables and are sometimes referred to as crosstabs (e.g., SPSS). Formally, the Chi-square statistic is defined as:

\(\chi^2 = \Sigma\frac{(O-E)^2}{O}\)

with degrees of freedom:

\(df = (n_{rows}-1)*(n_{cols}-1)\)

Despite the ubiquity of these tests, post-hoc analyses may be less common. That is, given any contingency table, we often see omnibus tests of statistical significance (i.e., \(p(\chi^2|H_0) < \alpha\)), but follow-up analyses may be less explored or even understood. Reflect on the following:

  • How would you conduct a post-hoc analysis in a Chi-square test with many cells?
  • How would you determine if a specific row, column, or cell is largely driving the results?

I would imagine that readers have a vague understanding of what to do, such as calculating some residual or another Chi-square test on a subset of the contingency table. I present a partial companion to this paper and provide example R scripts to work through your own post-hoc analyses using residuals.

Setting the Stage

Imagine a researcher who works at a medical clinic is conducting a study investigating if clients from certain regions are more likely to be diagnosed with specific disorders. Ultimately, the research believes that psychologists in the medical clinical are pathologizing cultural differences, resulting in a higher number of Social Anxiety Disorder (SAD) diagnoses for children who come from rural regions versus those from non-rural areas (i.e., metro, urban) Thus, the research hypothesizes:

  • There is an association between rurality and SAD diagnosis, such that individuals from rural regions are more likely to be diagnosed with SAD than those from non-rural regions.
  • The null hypothesis being: \(\chi^2 = 0\)

The research conducts a power analysis based on previous literature, which suggested an expected effect of \(W = .20\), using \(\alpha = .01\), \(1-\beta = .8\) (i.e., power), and \(df=(n_{row}-1)*(n_{col}-1)=2\).

pwr.chisq.test(w  = .20,
               df = 2,
               sig.level = .01,
               power = .8)
## 
##      Chi squared power calculation 
## 
##               w = 0.2
##               N = 347.0175
##              df = 2
##       sig.level = 0.01
##           power = 0.8
## 
## NOTE: N is the number of observations

Luckily, the medical center has data from exactly 348 clients. The researcher compiles the data into the following contingency table:

set.seed(1111)
dat_chi1 <- data.frame(id = 1:300,
                      rural = sample(size = 300, x=c("urban", "metro"), replace = T,prob = c(.4, .6)),
                      sad_dx = sample(size = 300, x=c("sad", "no_dx"), replace = T, prob = c(.05, .95)))

set.seed(1111)
dat_chi2 <- data.frame(id = 301:348,
                      rural = "rural",
                      sad_dx = sample(size = 48, x=c("sad", "no_dx"), replace = T, prob = c(.25, .75)))

dat_chi <- rbind(dat_chi1, dat_chi2) %>% 
  mutate_all(as.factor)

head(dat_chi)
##   id rural sad_dx
## 1  1 metro  no_dx
## 2  2 metro  no_dx
## 3  3 urban  no_dx
## 4  4 metro  no_dx
## 5  5 urban  no_dx
## 6  6 urban    sad

and the following contingency table:

table(dat_chi$rural, dat_chi$sad_dx)
##        
##         no_dx sad
##   metro   174   8
##   rural    32  16
##   urban   111   7

Inspection would lead us to believe that the chi-square test would be statistically significant; a higher proportion of rural clients seem to be diagnosed than metro or urban counterparts. A formal test reveals (drumroll please):

chisq.test(dat_chi$rural, dat_chi$sad_dx)
## Warning in chisq.test(dat_chi$rural, dat_chi$sad_dx): Chi-squared approximation
## may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  dat_chi$rural and dat_chi$sad_dx
## X-squared = 41.145, df = 2, p-value = 1.163e-09
prob_table <- prop.table(table(dat_chi$rural, dat_chi$sad_dx))
exp_table <- data.frame(matrix((c(.5*.95, .1*.95, .4*.95,
                          .5*.05, .1*.05, .4*.05)), nrow=3, byrow = F)) # note that H0 corresponds to clinic average of clients being rural = 10%, metro = 50%, and urban = 40%, and national average of SAD diagnosis being 5%
ES.w1(exp_table, prob_table)
## [1] 0.5892387

The results suggest that the data are unlikely given a true null that there is no association between rurality and SAD diagnosis, \(\chi^2 = 16.68\), \(df=2\), \(p < . 001\). More importantly, the data suggest a large effect when comparing this data than the expected data if SAD diagnoses occurs in 5% of the population, regardless of level of rurality, \(W = .589\).

Despite these results, what conclusions can be drawn about our initial hypothesis? While some people may rely on eyeballing the table, we will take a different approach: formal post-hoc analyses. As noted, the following is a companion to this paper.

Calculating Residuals

Sharpe (2015) describes the formulas for calculating raw, standardized, and adjusted standardized residuals. We have already completed some of these calculation above, but as a reminder, the raw residuals is the difference between a cell’s observed and expected values:

\(RawResidual_{x} = Observed_{x} - Expected_{x}\)

For our above data, the observed values are:

observed <- table(dat_chi$rural, dat_chi$sad_dx)
observed
##        
##         no_dx sad
##   metro   174   8
##   rural    32  16
##   urban   111   7

and the expected values can be calculated from the chisq.test() function by subsetting the expected values:

expected <- chisq.test(dat_chi$rural, dat_chi$sad_dx)$expected
expected
##              dat_chi$sad_dx
## dat_chi$rural     no_dx       sad
##         metro 165.78736 16.212644
##         rural  43.72414  4.275862
##         urban 107.48851 10.511494

Thus, the differences are:

raw_res <- observed - expected
raw_res
##        
##              no_dx        sad
##   metro   8.212644  -8.212644
##   rural -11.724138  11.724138
##   urban   3.511494  -3.511494

However, as Sharpe (2015) notes, there are some errors with the raw residuals residuals, as larger expected values produce larger raw residuals. An alternative are the standardized residuals, which adjusted the raw residual by dividing by the square root of the expected value:

\(standardized_{x} = \frac{(Observed_{x} - Expected_{x})}{\sqrt{Expected_{x}}}\)

For us:

standard_res <- (observed - expected)/sqrt(expected)
standard_res
##        
##              no_dx        sad
##   metro  0.6378334 -2.0396519
##   rural -1.7730472  5.6698174
##   urban  0.3386967 -1.0830780
### as a test for cell 1, (174-169.19540)/sqrt(169.19540) = .3693711

Last, we can make an adjustment so that we divide the difference of the observed and expected cell frequency by the standard error of that cell estimate:

\(adjusted_{x} = \frac{(Observed_{x} - Expected_{x})}{ \sqrt{Expected_{x}*(1-n_{rx}/n_{tot})*(1-n_{cx)}/n_{tot})}}\)

For us:

rowmarginal <- matrix(c(182, 182, 48, 48, 118, 118), nrow = 3, byrow = T)
colmarginal <- matrix(c(317, 317, 317, 31, 31, 31), nrow=3, byrow = F)
n_mat <- 348
adj_res <- (observed-expected)/sqrt(expected*(1-rowmarginal/n_mat)*(1-colmarginal/n_mat))

adj_res
##        
##             no_dx       sad
##   metro  3.094223 -3.094223
##   rural -6.398204  6.398204
##   urban  1.395871 -1.395871

Although I demonstrate the calculations above, the chisq.test function of the stast package will provide them by specifying either residuals or std:chisq.test(table)$residuals or chisq.test(table)std. Additionally, the questionr package does contain a chisq.residuals function that can return raw, standardized, and adjusted residuals. For example, raw, standardized, and adjusted as:

# Raw
library(questionr)
chisq.residuals(observed, raw=T)
## Warning in stats::chisq.test(tab): Chi-squared approximation may be incorrect
##        
##          no_dx    sad
##   metro   8.21  -8.21
##   rural -11.72  11.72
##   urban   3.51  -3.51
#Standardized
chisq.residuals(observed)
## Warning in stats::chisq.test(tab): Chi-squared approximation may be incorrect
##        
##         no_dx   sad
##   metro  0.64 -2.04
##   rural -1.77  5.67
##   urban  0.34 -1.08
##Adjusted
chisq.residuals(observed, std=T)
## Warning in stats::chisq.test(tab): Chi-squared approximation may be incorrect
##        
##         no_dx   sad
##   metro  3.09 -3.09
##   rural -6.40  6.40
##   urban  1.40 -1.40

Returning to our example, it seems that across each type of residual, the frequency of rural clients, particularly with a diagnosis, differ from expected values under the null that the proportion of SAD diagnoses are the same across regions. That is, they are larger. However, what is considered a large residual? While it has been suggested that a standardized residual greater than \(±1.96\) (i.e., 2.5th and 97.5th percentiles assuming a true null; see additional material), the more cells there are the more likely any given test will contain an extreme residual under a true null. We would expect a Chi-square test with 20 cells (i.e., 4x5) to result in one standardized residual beyond ±1.96. A safer approach may be to apply a Bonferroni correction to control for false positives, such that the threshold for an ‘extreme’ standardized residuals is adjusted for total number of cells. For our above example with six cells, the \(\alpha=.05\) (i.e., standardized residual \(±1.96\)) can be adjusted to \(\alpha = \frac{.05}{6} = .0083\) and an associated standardized residual cutoff of about \(±2.64\).

The following can help you determine a chi-square residual cutoff, which is based on you contingency table size:

### Your number of rows
whats_my_cutoff <- function(nrow = 2, ncol = 2, alpha = .05){
  value <- qnorm(p=1-((alpha/2)/(nrow*ncol)))
  output <- paste("Your residual cutoff is ±", round(value, 3))
return(output)
}

I have archived my Chi-square Residual Companion App due to excess use. Please use the above code to help with residual calculations.

Take care, all.

Tyler Pritchard
Lab Director, Professor, Researcher, and Clinician

My research interests include suicide theory, research methods and statistics, and online activity’s impact on mental health and illness.

Related