pIRRitchard

Last updated on Apr 5, 2021

Welcome to pIRRitchard!

At the time of writing, I am involved in a project wherein we are conducting a content analysis. Specifically, we are coding approximately 150 variables across about 200 journal articles. Due to the abundance of articles being coded and the lengthy coding procedure, several coders were recruited and trained. As in many other projects, coders will independently code a shared subset of articles to determine the degree to which they agree on coding. High reliability suggests that coders can move forward, each with a unique subset of articles, while low reliability suggests that the coding rubric needs tweaking or clarification, or that the coders require additional training.

One of my roles is to complete reliability estimates for the 150-ish variables to determine the best next steps for the project (i.e., rubric adjustments, additional training, or moving forward with independent coding). However, this created a slight predicament: calculating the reliability estimates for 150 variables one at a time would be tedious. As someone who’s never felt completely comfortable with creating functions in R, I was intimidated by the task ahead. However, I decided that a function, or series of functions, that could quickly compute and output any number of reliability estimates, and create a quick plot, may be beneficial for those new to R. My initial thought was to create a function to quickly share with labmates, but I thought it may be a great first exposure to creating an R package.

Thus, I present pIRRitchard, a basic package of functions that can calculate reliability estimates across multiple raters and criteria, plot the results, and simulate a dataset to test the package. The following post describes the first version of the package, but additional updates may be addressed on the associated GitHub page.

The first and main function is pIRRitchard(data, n_raters, type), which takes three arguments. The first specifies the data.frame to be used, which must follow a specific format. The data.frame must be in wide format, wherein each combination of criterion and coder has its own column. Additionally, these columns must progress across raters first, then across criteria. Last, the first column should be an ID column. For example, the following is a typical pIRRitchard data.frame that consists of 3 raters/coders, 4 criteria/variables, and 5 units being analyzed.

dat <- pIRRitchard_data(criteria = 4, raters = 3, units = 5)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ dplyr   1.0.3
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

dat

##   id Rater1.Criteria1 Rater2.Criteria1 Rater3.Criteria1 Rater1.Criteria2
## 1  1                1                1                1                0
## 2  2                1                1                1                1
## 3  3                0                0                0                0
## 4  4                0                0                0                1
## 5  5                0                0                1                0
##   Rater2.Criteria2 Rater3.Criteria2 Rater1.Criteria3 Rater2.Criteria3
## 1                0                1                1                0
## 2                0                1                1                0
## 3                1                0                0                1
## 4                1                1                1                1
## 5                1                0                0                0
##   Rater3.Criteria3 Rater1.Criteria4 Rater2.Criteria4 Rater3.Criteria4
## 1                1                1                0                1
## 2                1                0                1                1
## 3                1                1                1                1
## 4                0                0                0                1
## 5                0                0                0                1

If your data is not in wide format, you can use the pivot_wider() function from the tidyr package. For example, the following long data.frame:

dat_long <- data.frame(id = rep(1:5, each=3), 
                       coder = rep(c("Rater1", "Rater2", "Rater3"), times=5),
                       Criteria1 = round(runif(15)),
                       Criteria2 = round(runif(15)),
                       Criteria3 = round(runif(15)),
                       Criteria4 = round(runif(15)))
dat_long

##    id  coder Criteria1 Criteria2 Criteria3 Criteria4
## 1   1 Rater1         1         0         0         0
## 2   1 Rater2         0         0         1         1
## 3   1 Rater3         1         0         0         0
## 4   2 Rater1         1         1         0         1
## 5   2 Rater2         1         1         0         1
## 6   2 Rater3         1         1         0         0
## 7   3 Rater1         1         1         0         0
## 8   3 Rater2         0         0         1         0
## 9   3 Rater3         1         1         1         0
## 10  4 Rater1         0         1         1         1
## 11  4 Rater2         0         0         0         0
## 12  4 Rater3         0         1         0         1
## 13  5 Rater1         1         1         0         0
## 14  5 Rater2         0         0         0         1
## 15  5 Rater3         1         1         0         1

could be transformed into wide form with:

dat_wide <- pivot_wider(dat_long,
                        id_cols = id,
                        names_from = "coder",
                        values_from = c("Criteria1", "Criteria2", "Criteria3", "Criteria4"))
dat_wide

## # A tibble: 5 x 13
##      id Criteria1_Rater1 Criteria1_Rater2 Criteria1_Rater3 Criteria2_Rater1
##   <int>            <dbl>            <dbl>            <dbl>            <dbl>
## 1     1                1                0                1                0
## 2     2                1                1                1                1
## 3     3                1                0                1                1
## 4     4                0                0                0                1
## 5     5                1                0                1                1
## # … with 8 more variables: Criteria2_Rater2 <dbl>, Criteria2_Rater3 <dbl>,
## #   Criteria3_Rater1 <dbl>, Criteria3_Rater2 <dbl>, Criteria3_Rater3 <dbl>,
## #   Criteria4_Rater1 <dbl>, Criteria4_Rater2 <dbl>, Criteria4_Rater3 <dbl>

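As you can see, the data now progress from an ID column, to columns for criterion 1 for each rater, then columns for criterion 2 for each rater, and so on. With two raters, for instance, the order would be: column 1 = ID, column 2 = criterion 1/rater 1, column 3 = criterion 1/rater 2, column 4 = criterion 2/rater 1, column 5 = criterion 2/rater 2.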
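If the columns of a wide data.frame are not already in this order, they can be rearranged by name. The following is a minimal base R sketch, assuming the Criteria_Rater naming produced by pivot_wider() above; the shuffled data.frame is a made-up stand-in, not output from the package.

```r
# Labels matching the example above.
raters   <- paste0("Rater", 1:3)
criteria <- paste0("Criteria", 1:4)

# outer() varies its first argument fastest, so raters cycle within each criterion.
wanted <- as.vector(outer(raters, criteria,
                          function(r, crit) paste(crit, r, sep = "_")))
wanted[1:4]

# A small stand-in data.frame whose rating columns are deliberately shuffled.
set.seed(1)
shuffled <- as.data.frame(matrix(rbinom(5 * 12, size = 1, prob = 0.5), nrow = 5,
                                 dimnames = list(NULL, sample(wanted))))
shuffled <- cbind(id = 1:5, shuffled)

# Reorder: id first, then one block of rater columns per criterion.
ordered <- shuffled[, c("id", wanted)]
names(ordered)
```

Selecting columns by name in this way does not depend on the order in which the coders happened to appear in the long data.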

Second, pIRRitchard requires users to specify the number of raters. This tells the function how many columns belong to each criterion when sequencing through the data for the reliability estimates. For example, when specified as n_raters = 3, the function knows to calculate an estimate for columns 2-4, 5-7, and so on up to columns (n-2) to n, where n is the last column.
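The column sequencing can be illustrated in a few lines of base R. This is only a sketch of the indexing described above, not the package's own code, and the column names are assumed to follow the pivot_wider() example.

```r
n_raters <- 3

# Assumed column names: an id column plus 4 criteria x 3 raters.
col_names <- c("id",
               paste0("Criteria", rep(1:4, each = n_raters),
                      "_Rater", rep(1:n_raters, times = 4)))

# Each criterion starts n_raters columns after the previous one.
starts <- seq(2, length(col_names), by = n_raters)
ends   <- starts + n_raters - 1

blocks <- data.frame(first_col   = starts,
                     last_col    = ends,
                     first_input = col_names[starts],
                     last_input  = col_names[ends])
blocks
```

With 13 columns and three raters, the blocks are columns 2-4, 5-7, 8-10, and 11-13.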

Last, pIRRitchard can generate two major reliability estimates for any number of criteria: the commonly used Fleiss’s Kappa and Gwet’s AC1. Users simply specify type = "fleiss" or type = "ac1". Details on their calculation and usage are beyond the scope of this post.

The output of the main pIRRitchard function is a \(n_{criteria} \times 4\) data.frame. Column 1 specifies the coefficient, column 2 the reliability estimate, column 3 the first column in the sequence used to calculate the estimate, and column 4 the last column used in the estimate. For example, the following output is from the above example with 3 raters and 4 criteria:

reliability_output <- pIRRitchard(dat_wide, n_raters = 3, type = "ac1")
reliability_output

##      variable      ac1      first_input       last_input
## 1   coeff.val  0.23077 Criteria1_Rater1 Criteria1_Rater3
## 2 coeff.val.1  0.23077 Criteria2_Rater1 Criteria2_Rater3
## 3 coeff.val.2  0.34307 Criteria3_Rater1 Criteria3_Rater3
## 4 coeff.val.3 -0.06195 Criteria4_Rater1 Criteria4_Rater3

The first and last columns used for each estimate are the main indicator of which criterion the estimate refers to, and they double as a check that the appropriate columns were used.
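To see where a number like 0.23077 comes from, the first row can be recomputed by hand. The sketch below applies the standard formulas for Fleiss’s Kappa and Gwet’s AC1 with complete data (no missing ratings) to the Criteria1 columns of dat_wide. The helper agreement_coefs is a hypothetical name, and this is not necessarily how the package computes its estimates.

```r
# Ratings for Criteria1 from dat_wide above: rows are units, columns are raters.
ratings <- matrix(c(1, 0, 1,
                    1, 1, 1,
                    1, 0, 1,
                    0, 0, 0,
                    1, 0, 1),
                  ncol = 3, byrow = TRUE)

agreement_coefs <- function(ratings, categories = c(0, 1)) {
  n <- ncol(ratings)  # raters per unit (assumes no missing ratings)
  # counts[i, j]: number of raters who assigned unit i to category j
  counts <- sapply(categories, function(k) rowSums(ratings == k))
  # Observed agreement: share of agreeing rater pairs, averaged over units
  p_a <- mean((rowSums(counts^2) - n) / (n * (n - 1)))
  # Overall share of ratings falling in each category
  p_k <- colSums(counts) / sum(counts)
  # The two coefficients differ only in their chance-agreement term
  pe_fleiss <- sum(p_k^2)
  pe_ac1    <- sum(p_k * (1 - p_k)) / (length(categories) - 1)
  c(fleiss = (p_a - pe_fleiss) / (1 - pe_fleiss),
    ac1    = (p_a - pe_ac1) / (1 - pe_ac1))
}

round(agreement_coefs(ratings), 5)
# fleiss 0.16667, ac1 0.23077
```

Here the observed agreement is 0.6 and 60% of all ratings are 1s, so the chance term is 0.52 for Fleiss’s Kappa and 0.48 for AC1, which is why the two coefficients differ for the same data.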

Wow, that was simple! We can easily create a plot of the previous output using the pIRRitchard_plot function. This is not new or innovative; it’s a basic ggplot. The only required parameter is the output of the main pIRRitchard function, or any data.frame whose second column holds the reliability estimates (the function calls on data[,2] a few times). There is an optional logical parameter, benchmarks, which is TRUE or FALSE. If TRUE, the function draws shaded bands from 0 to 1 in steps of 0.2 to indicate commonly used, although ambiguous and potentially misleading, cutoffs.

pIRRitchard_plot(reliability_output, benchmarks = TRUE)

or:

pIRRitchard_plot(reliability_output, benchmarks = FALSE)
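For readers who want to build a similar figure themselves, the following is a rough sketch of shaded benchmark bands drawn behind the estimates. It is an illustration only and not the code behind pIRRitchard_plot; the colours, transparency range, and axis labels are arbitrary choices, and the plot is drawn only if ggplot2 is installed.

```r
# Estimates copied from the AC1 output above.
est <- data.frame(criterion = 1:4,
                  estimate  = c(0.23077, 0.23077, 0.34307, -0.06195))

# One row per benchmark band: 0-0.2, 0.2-0.4, ..., 0.8-1.
bands <- data.frame(ymin = (0:4) / 5, ymax = (1:5) / 5)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot() +
    # Bands span the full width of the panel; higher bands are shaded darker.
    ggplot2::geom_rect(data = bands,
                       ggplot2::aes(xmin = -Inf, xmax = Inf,
                                    ymin = ymin, ymax = ymax, alpha = ymax),
                       fill = "grey30", show.legend = FALSE) +
    ggplot2::geom_point(data = est,
                        ggplot2::aes(x = criterion, y = estimate), size = 3) +
    ggplot2::scale_alpha(range = c(0.1, 0.5)) +
    ggplot2::labs(x = "Criterion", y = "Reliability estimate")
  print(p)
}
```

Negative estimates, such as the one for the fourth criterion, fall below the lowest band.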

The last function, pIRRitchard_data, simulates a data.frame in wide form. It takes three parameters (criteria, raters, and units) and returns a data.frame with one row per unit and one column per rater-criterion pair (criteria x raters columns), plus an id column. Let’s create a much larger data.frame to showcase the POWER of the package (sarcasm intended). The following data.frame contains 200 rows and a column for each of 4 raters by 100 criteria, each coded as 0 or 1. We want to calculate a Fleiss’s Kappa estimate for each of the 100 criteria.

example_data <- pIRRitchard_data(criteria = 100, raters = 4, units = 200)
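For a sense of what such a simulated data set might look like under the hood, here is a minimal base R sketch. The helper simulate_wide is hypothetical, and it assumes every rating is an independent coin flip, which is an assumption rather than the package's documented behaviour.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

simulate_wide <- function(criteria, raters, units) {
  # Independent 0/1 ratings; each column is one rater-criterion pair.
  ratings <- matrix(rbinom(units * criteria * raters, size = 1, prob = 0.5),
                    nrow = units)
  # Raters cycle fastest, so columns progress across raters, then criteria.
  colnames(ratings) <- paste0("Rater", rep(seq_len(raters), times = criteria),
                              ".Criteria", rep(seq_len(criteria), each = raters))
  data.frame(id = seq_len(units), ratings)
}

sim <- simulate_wide(criteria = 100, raters = 4, units = 200)
dim(sim)         # 200 rows, 401 columns (id + 100 x 4)
names(sim)[1:5]
```

If ratings are generated independently like this, chance-corrected agreement should hover around zero.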

Running the main function on these data gives:

output_example <- pIRRitchard(example_data, n_raters = 4, type = "fleiss")
head(output_example)

##      variable   fleiss      first_input       last_input
## 1   coeff.val  0.01784 Rater1.Criteria1 Rater4.Criteria1
## 2 coeff.val.1 -0.00014 Rater1.Criteria2 Rater4.Criteria2
## 3 coeff.val.2 -0.02207 Rater1.Criteria3 Rater4.Criteria3
## 4 coeff.val.3 -0.01182 Rater1.Criteria4 Rater4.Criteria4
## 5 coeff.val.4 -0.02839 Rater1.Criteria5 Rater4.Criteria5
## 6 coeff.val.5 -0.00361 Rater1.Criteria6 Rater4.Criteria6

and plotting the results, with benchmarks (for some reason), as:

pIRRitchard_plot(output_example, benchmarks = TRUE)

There you have it. A simple way to calculate many Fleiss’s Kappa or Gwet’s AC1 reliability estimates across many raters. I hope you find some benefit in the package, and I would welcome any input or suggestions. Take care.

Tyler Pritchard

Lab Director, Professor, Researcher, and Clinician

My research interests include suicide theory, research methods and statistics, and online activity’s impact on mental health and illness.