Modifying an R function to iterate (using purrr) and use non-standard evaluation (using rlang)

null

2017/12/17

Background

Research in classrooms and schools can be complex because of all of the factors that matter. A question that often comes up when we say that we observed some pattern in data is, but did you control for X?

In the context of working on an approach to find out how impactful an omitted variable would need to be to invalidate an inference, we had to modify a function that worked for a single sensitivity analysis to work for many and to be easier to use.

The initial version of the function

In the context of programming (and mathematics!), many functions take inputs and then transform them into output.

Imagine we have a function that takes two values used to calculate a t-test, the t statistic (the coefficient divided by its standard error) and the degrees of freedom of the t distribution. it then returns a bunch of output about how sensitive an inference based on the t-test is to bias or to an omitted confounding variable.

Here is a function that outputs values about the sensitivity of a t-test as a row in a row of a data frame (actually an R data.frame modified to be a bit easier to work with, an R tibble).

It’s a bit of a whopper:

core_sensitivity_mkonfound <- function(t, df, alpha = .05, tails = 2) {
    
    critical_t <- stats::qt(1 - (alpha / tails), df)
    critical_r <- critical_t / sqrt((critical_t ^ 2) + df)
    
    obs_r <- abs(t / sqrt(df + (t ^ 2)))
    
    # for replacement of cases framework
    if (abs(obs_r) > abs(critical_r)) {
        action <- "to_invalidate"
        inference <- "reject_null"
        pct_bias <- 100 * (1 - (critical_r / obs_r)) }
    else if (abs(obs_r) < abs(critical_r)) {
        action <- "to_sustain"
        inference <- "fail_to_reject_null"
        pct_bias <- 100 * (1 - (obs_r / critical_r)) }
    else if (obs_r == critical_r) {
        action <- NA
        inference <- NA
        pct_bias <- NA
    }
    
    if ((abs(obs_r) > abs(critical_r)) & ((obs_r * critical_r) > 0)) {
        mp <- -1
    } else {
        mp <- 1
    }
    
    # for correlation based framework
    itcv <- (obs_r - critical_r) / (1 + mp * abs(critical_r))
    r_con <- round(sqrt(abs(itcv)), 3)
    
    out <- dplyr::data_frame(t, df, action, inference, pct_bias, itcv, r_con)
    names(out) <- c("t", "df", "action", "inference", "pct_bias_to_change_inference", "itcv", "r_con")
    
    out$pct_bias_to_change_inference <- round(out$pct_bias_to_change_inference, 3)
    out$itcv <- round(out$itcv, 3)
    out$action <- as.character(out$action)
    out$inference <- as.character(out$inference)
    
    return(out)
}

Let’s test it out. Imagine we have a t statistic of 3 for a hypothesis test associated with 100 degrees of freedom.

core_sensitivity_mkonfound(3, 100)
## # A tibble: 1 x 7
##       t    df        action   inference pct_bias_to_change_inference  itcv
##   <dbl> <dbl>         <chr>       <chr>                        <dbl> <dbl>
## 1     3   100 to_invalidate reject_null                       32.276 0.115
## # ... with 1 more variables: r_con <dbl>

Works.

It looks like in order to invalidate the inference, around 32% of the effect would be need to be due to bias; or, an committed variable would need to be correlated with both the predictor of interest and the dependent variable at .339 in order to invalidate the inference. You can read more about sensitivity analysis and an in-development R package on the approach to sensitivity analysis with Ran Xu and Ken Frank here.

The second version of the function

How could we write a function to provide output not only for one t and its associated df, but rather many values?

We can write a simple function to iterate through multiple values and to bind them together. The key is the map() function (from the tidyverse package purrr; if you are familiar with R, it is similar to many of the apply() functions). Specifically, because we:

We use map2_dfr(). Check out this helpful chapter of R for Data Science for more on iteration using approaches such as for and while loops as well as the useful apply()/ map() families of functions.

Here is what a function could look like:

library(purrr)

mkonfound <- function(t, df, alpha = .05, tails = 2) {
    map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound)
}

Simple! But does it work? :)

Instead of passing a single t and df, as we did above with the core_sensitivity_monfound() function, we can pass vectors of t and df values:

mkonfound(t = c(3, 2, 2.5), 
          d = c(100, 200, 150))
## # A tibble: 3 x 7
##       t    df        action   inference pct_bias_to_change_inference  itcv
##   <dbl> <dbl>         <chr>       <chr>                        <dbl> <dbl>
## 1   3.0   100 to_invalidate reject_null                       32.276 0.115
## 2   2.0   200 to_invalidate reject_null                        1.378 0.002
## 3   2.5   150 to_invalidate reject_null                       20.364 0.048
## # ... with 1 more variables: r_con <dbl>

We could also do something like binding t and df together into a small data.frame:

d <- data.frame(t = c(3, 2, 2.5), 
                df = c(100, 200, 150))
d
##     t  df
## 1 3.0 100
## 2 2.0 200
## 3 2.5 150

And then mkonfound() could work like this:

mkonfound(d$t, d$df)
## # A tibble: 3 x 7
##       t    df        action   inference pct_bias_to_change_inference  itcv
##   <dbl> <dbl>         <chr>       <chr>                        <dbl> <dbl>
## 1   3.0   100 to_invalidate reject_null                       32.276 0.115
## 2   2.0   200 to_invalidate reject_null                        1.378 0.002
## 3   2.5   150 to_invalidate reject_null                       20.364 0.048
## # ... with 1 more variables: r_con <dbl>

The third version of the function

Still seems to work fine. Those of you familiar with the tidyverse may sense another possible improvement. Namely, the function could be written to both input and output a data.frame, and be a bit more intuitive to use via non-standard evaluation.

The goal is to add an additional argument for the data.frame (d), and then use non-standard evaluation to capture and then later evaluate in the context of the data.frame the names of the t and df columns

library(rlang)
## Warning: package 'rlang' was built under R version 3.4.3
library(dplyr)

mkonfound <- function(d, t, df, alpha = .05, tails = 2) {
    
    t_enquo <- enquo(t)
    df_enquo <- enquo(df)
    
    t = pull(select(d, !!t_enquo))
    df = pull(select(d, !!df_enquo))
    
    map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound)
}

But does it work? Now, the first argument is the name of the data.frame, the second is the unquoted name of the column with the t statistics, and the third is the same as the second, but for the df associated with the t’s.

mkonfound(d, t, df)
## # A tibble: 3 x 7
##       t    df        action   inference pct_bias_to_change_inference  itcv
##   <dbl> <dbl>         <chr>       <chr>                        <dbl> <dbl>
## 1   3.0   100 to_invalidate reject_null                       32.276 0.115
## 2   2.0   200 to_invalidate reject_null                        1.378 0.002
## 3   2.5   150 to_invalidate reject_null                       20.364 0.048
## # ... with 1 more variables: r_con <dbl>

If we have an entire spreadsheet, read in R as a data.frame using the read.csv() (or, read_csv() from the very useful readr package) function, then we can easily compute output for all of the statistics in the spreadsheet. Here is a spreadsheet from a website (from Ken’s):

spreadsheet_of_vals <- read.csv("https://msu.edu/~kenfrank/example%20dataset%20for%20mkonfound.csv")
head(spreadsheet_of_vals)
##           t  df
## 1  7.076763 178
## 2  4.127893 193
## 3  1.893137  47
## 4 -4.166395 138
## 5 -1.187599  97
## 6  3.585478  87

We would use it the same way as above but with d replaced with what we named the data.frame we read from the website, spreadsheet_of_vals:

mkonfound(spreadsheet_of_vals, t, df)
## # A tibble: 30 x 7
##            t    df        action           inference
##        <dbl> <int>         <chr>               <chr>
##  1  7.076763   178 to_invalidate         reject_null
##  2  4.127893   193 to_invalidate         reject_null
##  3  1.893137    47    to_sustain fail_to_reject_null
##  4 -4.166395   138 to_invalidate         reject_null
##  5 -1.187599    97    to_sustain fail_to_reject_null
##  6  3.585478    87 to_invalidate         reject_null
##  7  0.281938   117    to_sustain fail_to_reject_null
##  8  2.549647    75 to_invalidate         reject_null
##  9 -4.436048   137 to_invalidate         reject_null
## 10 -2.045373   195 to_invalidate         reject_null
## # ... with 20 more rows, and 3 more variables:
## #   pct_bias_to_change_inference <dbl>, itcv <dbl>, r_con <dbl>

Since the output is in a data.frame, we can, for example, easily plot output:

results_df <- mkonfound(spreadsheet_of_vals, t, df)
library(ggplot2)

results_df$action <- dplyr::case_when(
    results_df$action == "to_invalidate" ~ "To Invalidate",
    results_df$action == "to_sustain" ~ "To Sustain"
)

ggplot(results_df, aes(x = pct_bias_to_change_inference, fill = action)) +
    geom_histogram() +
    scale_fill_manual("", values = c("#1F78B4", "#A6CEE3")) +
    theme_bw() +
    ggtitle("Histogram of Percent Bias") +
    facet_grid(~ action) +
    theme(legend.position = "none") +
    ylab("Count") +
    xlab("Percent Bias")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Like many functions in R, this could be written many different ways, and this post shows just one approach to writing a function.

In some cases, non-standard evaluation makes the function a bit harder to use - particularly in cases in which we are interested in the output from only a single study.

In that case, we would want to go back to the function we initially wrote (core_sensitivity_mkonfound()) or would have to write something a bit like:

single_study <- data.frame(t = 3, df = 100)
mkonfound(single_study, t, df)
## # A tibble: 1 x 7
##       t    df        action   inference pct_bias_to_change_inference  itcv
##   <dbl> <dbl>         <chr>       <chr>                        <dbl> <dbl>
## 1     3   100 to_invalidate reject_null                       32.276 0.115
## # ... with 1 more variables: r_con <dbl>

So, this is one approach that is useful in one use - for the in-development package for sensitivity analysis with a number of functions, with a version of this mkonfound() function for meta-analyses that make use of the approach.

Oh, and if you are interested in sensitivity analysis, please check out the konfound package this is for here.