#### Chi-square

Chi-square tests are common in psychological science (Bakker & Wicherts, 2011). These tests compare the observed frequencies (i.e., the actual counts) against the expected frequencies (i.e., \(expected_{i,j} = \frac{n_{row_i}*n_{col_j}}{n_{tot}}\)) in a \(Row \times Column\) contingency table, sometimes referred to as a crosstab (e.g., in SPSS). Formally, the Chi-square statistic is defined as:

\(\chi^2 = \Sigma\frac{(O-E)^2}{E}\)

with degrees of freedom:

\(df = (n_{rows}-1)*(n_{cols}-1)\)
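As a quick check of these definitions, the statistic can be computed by hand for a small table. The counts below are hypothetical, made up purely for illustration:

```r
# Hypothetical 2x2 table of counts (illustration only)
obs <- matrix(c(30, 20,
                10, 40), nrow = 2, byrow = TRUE)
n_tot <- sum(obs)

# expected_{i,j} = n_rowi * n_colj / n_tot, via the outer product of the marginals
expected <- outer(rowSums(obs), colSums(obs)) / n_tot

chi_sq <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
round(chi_sq, 3)  # 16.667
df                # 1
```

Note that `chisq.test()` applies a continuity correction for 2x2 tables by default, so its statistic will differ slightly from this uncorrected calculation unless `correct = FALSE` is set.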

Despite the ubiquity of these tests, post-hoc analyses may be less common. That is, given any contingency table, we often see omnibus tests of statistical significance (i.e., \(p(\chi^2|H_0) < \alpha\)), but follow-up analyses may be less well explored or understood. Reflect on the following:

- How would you conduct a post-hoc analysis in a Chi-square test with many cells?
- How would you determine if a specific row, column, or **cell** is largely driving the results?

I would imagine that readers have a vague understanding of what to do, such as calculating some residual or running another Chi-square test on a subset of the contingency table. Here, I present a partial companion to this paper and provide example R scripts to work through your own post-hoc analyses using residuals.

### Setting the Stage

Imagine a researcher at a medical clinic conducting a study investigating whether clients from certain regions are more likely to be diagnosed with specific disorders. Ultimately, the researcher believes that psychologists in the medical clinic are pathologizing cultural differences, resulting in a higher number of Social Anxiety Disorder (SAD) diagnoses for children who come from rural regions versus those from non-rural areas (i.e., metro, urban). Thus, the researcher hypothesizes:

- There is an association between rurality and SAD diagnosis, such that individuals from rural regions are more likely to be diagnosed with SAD than those from non-rural regions.
- The null hypothesis being no association between rurality and SAD diagnosis (i.e., the population \(\chi^2 = 0\))

The researcher conducts a power analysis based on previous literature, which suggested an expected effect of \(W = .20\), using \(\alpha = .01\), \(1-\beta = .8\) (i.e., power), and \(df=(n_{row}-1)*(n_{col}-1)=2\).

```
library(pwr)  # for pwr.chisq.test() and ES.w1()

pwr.chisq.test(w = .20,
               df = 2,
               sig.level = .01,
               power = .8)
```

```
##
## Chi squared power calculation
##
## w = 0.2
## N = 347.0175
## df = 2
## sig.level = 0.01
## power = 0.8
##
## NOTE: N is the number of observations
```

Luckily, the medical center has data from exactly 348 clients (simulated here for demonstration):

```
library(dplyr)  # for %>% and mutate_all()

set.seed(1111)
dat_chi1 <- data.frame(id = 1:300,
                       rural = sample(size = 300, x = c("urban", "metro"), replace = TRUE, prob = c(.4, .6)),
                       sad_dx = sample(size = 300, x = c("sad", "no_dx"), replace = TRUE, prob = c(.05, .95)))
set.seed(1111)
dat_chi2 <- data.frame(id = 301:348,
                       rural = "rural",
                       sad_dx = sample(size = 48, x = c("sad", "no_dx"), replace = TRUE, prob = c(.25, .75)))
dat_chi <- rbind(dat_chi1, dat_chi2) %>%
  mutate_all(as.factor)
head(dat_chi)
```

```
## id rural sad_dx
## 1 1 metro no_dx
## 2 2 metro no_dx
## 3 3 urban no_dx
## 4 4 metro no_dx
## 5 5 urban no_dx
## 6 6 urban sad
```

and the following contingency table:

`table(dat_chi$rural, dat_chi$sad_dx)`

```
##
## no_dx sad
## metro 174 8
## rural 32 16
## urban 111 7
```

Inspection would lead us to believe that the chi-square test would be statistically significant; a higher proportion of rural clients seem to be diagnosed than their metro or urban counterparts. A formal test reveals (drumroll please):

`chisq.test(dat_chi$rural, dat_chi$sad_dx)`

```
## Warning in chisq.test(dat_chi$rural, dat_chi$sad_dx): Chi-squared approximation
## may be incorrect
```

```
##
## Pearson's Chi-squared test
##
## data: dat_chi$rural and dat_chi$sad_dx
## X-squared = 41.145, df = 2, p-value = 1.163e-09
```

```
prob_table <- prop.table(table(dat_chi$rural, dat_chi$sad_dx))
# H0 corresponds to the clinic average of clients being metro = 50%,
# rural = 10%, and urban = 40%, and the national average of SAD
# diagnoses being 5%
exp_table <- data.frame(matrix(c(.5*.95, .1*.95, .4*.95,
                                 .5*.05, .1*.05, .4*.05), nrow = 3, byrow = FALSE))
ES.w1(exp_table, prob_table)  # Cohen's w, from the pwr package
```

`## [1] 0.5892387`

The results suggest that the data are unlikely given a true null of no association between rurality and SAD diagnosis, \(\chi^2 = 41.15\), \(df=2\), \(p < .001\). More importantly, the data suggest a large effect when comparing these data to the expected data if SAD diagnoses occur in 5% of the population, regardless of level of rurality, \(W = .589\).
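The \(W\) above comes from `pwr::ES.w1()`, but the effect size can also be hand-computed as \(W = \sqrt{\Sigma\frac{(p_{1i} - p_{0i})^2}{p_{0i}}}\), where \(p_0\) and \(p_1\) are the cell probabilities under the null and the alternative. A minimal sketch with made-up probabilities (not the clinic data above):

```r
# Cohen's w by hand: w = sqrt(sum((p1 - p0)^2 / p0))
p0 <- c(.5, .5)  # hypothetical cell probabilities under H0
p1 <- c(.7, .3)  # hypothetical observed cell probabilities
w  <- sqrt(sum((p1 - p0)^2 / p0))
w  # 0.4
```

Applying the same formula to `exp_table` and `prob_table` reproduces the `ES.w1()` result.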

Despite these results, what conclusions can be drawn about our initial hypothesis? While some people may rely on eyeballing the table, we will take a different approach: formal post-hoc analyses. As noted, the following is a companion to this paper.

### Calculating Residuals

Sharpe (2015) describes the formulas for calculating raw, standardized, and adjusted standardized residuals. We have already completed some of these calculations above, but as a reminder, the raw residual is the difference between a cell’s observed and expected values:

\(RawResidual_{x} = Observed_{x} - Expected_{x}\)

For our above data, the observed values are:

```
observed <- table(dat_chi$rural, dat_chi$sad_dx)
observed
```

```
##
## no_dx sad
## metro 174 8
## rural 32 16
## urban 111 7
```

and the expected values can be obtained from the `chisq.test()` function by subsetting the expected values:

```
expected <- chisq.test(dat_chi$rural, dat_chi$sad_dx)$expected
expected
```

```
## dat_chi$sad_dx
## dat_chi$rural no_dx sad
## metro 165.78736 16.212644
## rural 43.72414 4.275862
## urban 107.48851 10.511494
```
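These expected counts can also be reproduced directly from the marginal totals using \(expected_{i,j} = \frac{n_{row_i}*n_{col_j}}{n_{tot}}\) (the counts below are copied from the observed table above):

```r
# Observed counts from the contingency table above
obs <- matrix(c(174, 8,
                32, 16,
                111, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("metro", "rural", "urban"), c("no_dx", "sad")))

# E_{i,j} = row total * column total / grand total
exp_by_hand <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(exp_by_hand, 5)
```

The result matches the `$expected` table returned by `chisq.test()`.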

Thus, the differences are:

```
raw_res <- observed - expected
raw_res
```

```
##
## no_dx sad
## metro 8.212644 -8.212644
## rural -11.724138 11.724138
## urban 3.511494 -3.511494
```

However, as Sharpe (2015) notes, there are some problems with raw residuals, as larger expected values produce larger raw residuals. An alternative is the standardized residual, which adjusts the raw residual by dividing by the square root of the expected value:

\(standardized_{x} = \frac{(Observed_{x} - Expected_{x})}{\sqrt{Expected_{x}}}\)

For us:

```
standard_res <- (observed - expected)/sqrt(expected)
standard_res
```

```
##
## no_dx sad
## metro 0.6378334 -2.0396519
## rural -1.7730472 5.6698174
## urban 0.3386967 -1.0830780
```

`### check for cell 1: (174-165.78736)/sqrt(165.78736) = 0.6378334`

Last, we can make an adjustment so that we divide the difference of the observed and expected cell frequency by the standard error of that cell estimate:

\(adjusted_{x} = \frac{Observed_{x} - Expected_{x}}{\sqrt{Expected_{x}*(1-n_{r_x}/n_{tot})*(1-n_{c_x}/n_{tot})}}\)

For us:

```
n_tot <- sum(observed)  # 348
# Row and column marginal totals, expanded to match the table's dimensions
rowmarginal <- matrix(rep(rowSums(observed), times = 2), nrow = 3)  # 182, 48, 118
colmarginal <- matrix(rep(colSums(observed), each = 3), nrow = 3)   # 317, 31
adj_res <- (observed - expected)/sqrt(expected*(1 - rowmarginal/n_tot)*(1 - colmarginal/n_tot))
adj_res
```

```
##
## no_dx sad
## metro 3.094223 -3.094223
## rural -6.398204 6.398204
## urban 1.395871 -1.395871
```

Although I demonstrate the calculations above, the `chisq.test` function of the `stats` package will provide them directly: `chisq.test(table)$residuals` returns the standardized (Pearson) residuals, and `chisq.test(table)$stdres` returns the adjusted standardized residuals. Additionally, the `questionr` package contains a `chisq.residuals` function that can return raw, standardized, and adjusted residuals. For example:

```
# Raw
library(questionr)
chisq.residuals(observed, raw=T)
```

`## Warning in stats::chisq.test(tab): Chi-squared approximation may be incorrect`

```
##
## no_dx sad
## metro 8.21 -8.21
## rural -11.72 11.72
## urban 3.51 -3.51
```

```
#Standardized
chisq.residuals(observed)
```

`## Warning in stats::chisq.test(tab): Chi-squared approximation may be incorrect`

```
##
## no_dx sad
## metro 0.64 -2.04
## rural -1.77 5.67
## urban 0.34 -1.08
```

```
##Adjusted
chisq.residuals(observed, std=T)
```

`## Warning in stats::chisq.test(tab): Chi-squared approximation may be incorrect`

```
##
## no_dx sad
## metro 3.09 -3.09
## rural -6.40 6.40
## urban 1.40 -1.40
```

Returning to our example, it seems that across each type of residual, the frequencies of rural clients, particularly those with a diagnosis, differ from the expected values under the null that the proportion of SAD diagnoses is the same across regions. That is, those residuals are large. However, what is considered a large residual? It has been suggested that a standardized residual beyond \(±1.96\) (i.e., the 2.5th and 97.5th percentiles assuming a true null; see additional material) is extreme, but the more cells there are, the more likely any given test will contain an *extreme* residual under a true null. On average, we would expect a Chi-square test with 20 cells (i.e., 4x5) to result in one standardized residual beyond \(±1.96\). A safer approach may be to apply a Bonferroni correction to control for false positives, such that the threshold for an ‘extreme’ standardized residual is adjusted for the total number of cells. For our above example with six cells, the \(\alpha=.05\) (i.e., standardized residual \(±1.96\)) can be adjusted to \(\alpha = \frac{.05}{6} = .0083\), with an associated standardized residual cutoff of about \(±2.64\).

The following function can help you determine a chi-square residual cutoff based on your contingency table size:

```
whats_my_cutoff <- function(nrow = 2, ncol = 2, alpha = .05){
  value <- qnorm(p = 1 - ((alpha/2)/(nrow*ncol)))
  output <- paste("Your residual cutoff is ±", round(value, 3))
  return(output)
}
```
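As a sanity check, the \(±2.64\) cutoff quoted above for six cells falls straight out of `qnorm()`:

```r
# Two-sided alpha = .05, Bonferroni-split across 6 cells
alpha_adj <- (.05 / 2) / 6
round(qnorm(1 - alpha_adj), 3)  # about 2.64
```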

**I have archived my Chi-square Residual Companion App due to excess use. Please use the above code to help with residual calculations**.

Take care, all.