`library(splithalfr)`

This vignette describes the d-prime, a scoring method introduced by Miller (1996).

Load the included Go/No Go dataset and inspect its documentation.

```
data("ds_gng", package = "splithalfr")
?ds_gng
```

The columns used in this example are:

- condition. 0 = go, 2 = no go
- response. Correct (1) or incorrect (0)
- rt. Reaction time (seconds)
- participant. Participant ID

The variables `condition` and `stim` were counterbalanced. Below we illustrate this for the first participant.

```
ds_1 <- subset(ds_gng, participant == 1)
table(ds_1$condition, ds_1$stim)
```

The scoring function receives the data from a single participant. It converts the proportions of hits and false alarms to quantiles of the standard normal distribution and returns their difference. Extreme proportions (0 or 1) are adjusted via the log-linear approach (Hautus, 1995).

```
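fn_score <- function(ds) {
  # Go trials: a correct response is a hit, an incorrect one a miss
  n_hit <- sum(ds$condition == 0 & ds$response == 1)
  n_miss <- sum(ds$condition == 0 & ds$response == 0)
  # No-go trials: a correct response is a correct rejection, an incorrect one a false alarm
  n_cr <- sum(ds$condition == 2 & ds$response == 1)
  n_fa <- sum(ds$condition == 2 & ds$response == 0)
  # Log-linear correction: add 0.5 to each count, which adds 1 to each trial total
  p_hit <- (n_hit + 0.5) / (n_hit + n_miss + 1)
  p_fa <- (n_fa + 0.5) / (n_fa + n_cr + 1)
  # d-prime is z(hit rate) minus z(false alarm rate)
  return(qnorm(p_hit) - qnorm(p_fa))
}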
```
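
To see why an adjustment for extreme values is needed, here is a minimal sketch using hypothetical counts (not taken from `ds_gng`). With a perfect hit rate the uncorrected d-prime is infinite, whereas the log-linear rule, written here in its textbook form of adding 0.5 to each count, keeps it finite. The variable names `d_raw` and `d_adj` are made up for this illustration.

```r
# Hypothetical counts for one participant (illustration only)
n_hit <- 20
n_miss <- 0   # perfect hit rate
n_cr <- 15
n_fa <- 5

# Uncorrected d-prime: qnorm(1) is Inf, so the score is infinite
d_raw <- qnorm(n_hit / (n_hit + n_miss)) - qnorm(n_fa / (n_fa + n_cr))

# Log-linear correction: add 0.5 to each count (so 1 to each trial total)
p_hit <- (n_hit + 0.5) / (n_hit + n_miss + 1)
p_fa <- (n_fa + 0.5) / (n_fa + n_cr + 1)
d_adj <- qnorm(p_hit) - qnorm(p_fa)

print(c(uncorrected = d_raw, corrected = d_adj))
```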

Let’s calculate the d-prime score for participant 1.

`fn_score(subset(ds_gng, participant == 1))`

To calculate the d-prime score for each participant, we will use R’s native `by` function and convert the result to a data frame.

```
scores <- by(
  ds_gng,
  ds_gng$participant,
  fn_score
)
data.frame(
  participant = names(scores),
  score = as.vector(scores)
)
```
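
For readers less familiar with `by`, the sketch below shows the same pattern on hypothetical toy data and compares it with an equivalent `split` plus `vapply` call. The data frame `toy` and the scoring function `toy_score` (proportion correct) are invented for this illustration and are not part of the package.

```r
# Hypothetical toy data: two participants, three trials each
toy <- data.frame(
  participant = c(1, 1, 1, 2, 2, 2),
  response = c(1, 1, 0, 1, 0, 0)
)

# Hypothetical scoring function: proportion of correct responses
toy_score <- function(ds) mean(ds$response)

# by() applies the scoring function to each participant's rows
by_scores <- by(toy, toy$participant, toy_score)

# Roughly equivalent: split the data frame, then score each piece
vapply_scores <- vapply(split(toy, toy$participant), toy_score, numeric(1))

# Convert the by() result to a data frame, as in the vignette
toy_result <- data.frame(
  participant = names(by_scores),
  score = as.vector(by_scores)
)
print(toy_result)
```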

To calculate split-half scores for each participant, use the function `by_split`. The first three arguments of this function are the same as for `by`. An additional set of arguments allows you to specify how to split the data and how often. In this vignette we will calculate scores for 1000 permuted splits. The trial properties `condition` and `stim` were counterbalanced in the Go/No Go design, so we will stratify splits by these trial properties. See the vignette on splitting methods for more ways to split the data.
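
The idea behind a single stratified random split can be sketched in base R. This is only an illustration on hypothetical trials and is not a description of how `by_split` is implemented: within each combination of `condition` and `stim`, trials are shuffled and divided evenly over two parts, so both halves keep the counterbalancing. The names `trials`, `strata`, and `part` are made up for this example.

```r
set.seed(1)  # reproducible illustration

# Hypothetical trials for one participant: 2 conditions x 2 stimuli, 6 trials each
trials <- expand.grid(rep = 1:6, condition = c(0, 2), stim = c("a", "b"))

# One stratum per combination of condition and stimulus
strata <- paste(trials$condition, trials$stim)

# Within each stratum, randomly assign half of the trials to part 1 and half to part 2
part <- ave(seq_len(nrow(trials)), strata, FUN = function(i) {
  sample(rep(1:2, length.out = length(i)))
})

# Each stratum contributes equally to both parts
split_table <- table(strata, part)
print(split_table)
```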

The `by_split` function returns a data frame with the following columns:

- `participant`, which identifies participants
- `replication`, which counts replications
- `score_1` and `score_2`, which are the scores calculated for each of the split datasets

*Calculating the split scores may take a while. By default, `by_split` uses all available CPU cores, but no progress bar is displayed. Setting `ncores = 1` will display a progress bar, but processing will be slower.*

```
split_scores <- by_split(
  ds_gng,
  ds_gng$participant,
  fn_score,
  replications = 1000,
  stratification = paste(ds_gng$condition, ds_gng$stim)
)
```

Next, the output of `by_split` can be analyzed in order to estimate reliability. By default, functions are provided that calculate Spearman-Brown adjusted Pearson correlations (`spearman_brown`), Flanagan-Rulon (`flanagan_rulon`), Angoff-Feldt (`angoff_feldt`), and Intraclass Correlation (`short_icc`) coefficients. Each of these coefficient functions can be used with `split_coefs` to calculate the corresponding coefficients per split, which can then be plotted or averaged via a simple `mean`. A bias-corrected and accelerated bootstrap confidence interval can be calculated via `split_ci`. Note that estimating the confidence interval involves very intensive calculations, so it can take a long time to complete.

```
# Spearman-Brown adjusted Pearson correlations per replication
coefs <- split_coefs(split_scores, spearman_brown)
# Distribution of coefficients
hist(coefs)
# Mean of coefficients
mean(coefs)
# Confidence interval of coefficients
split_ci(split_scores, spearman_brown)
```
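
As a rough illustration of what a Spearman-Brown adjustment does for a single replication, the sketch below applies the textbook formula 2r / (1 + r) to hypothetical split scores. The vectors `score_1` and `score_2` are invented values, and the package's `spearman_brown` may differ in details (for example, in how negative correlations are handled).

```r
# Hypothetical split scores for five participants in one replication
score_1 <- c(1.2, 0.8, 2.1, 1.5, 0.4)
score_2 <- c(1.0, 1.1, 1.9, 1.7, 0.6)

# Pearson correlation between the two halves
r <- cor(score_1, score_2)

# Textbook Spearman-Brown adjustment to full test length
r_sb <- 2 * r / (1 + r)

print(c(pearson = r, spearman_brown = r_sb))
```

For a positive correlation below 1, the adjusted coefficient is always larger than the raw correlation, reflecting that the full test is twice as long as each half.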