Initial setup

rm(list = ls())  # clear the workspace
library(lme4)
library(effects)
library(ggplot2)
library(MASS)
# format a number with a fixed number of decimals
compact = function(x, digits = 2){return(format(round(x, digits), nsmall = digits))}
ts = 25  # base text size for plots



SMDs with repeated measures / mixed-models

Let’s consider this simple illustrative case with N = 16 subjects divided into two conditions, each subject providing k = 10 observed responses. With a mixed model we could model the responses like this: fit = lmer(score ~ Condition + (1|id), data = df).
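The data-generating code is not shown here, but a minimal sketch of a comparable dataset might look like this (the seed, effect size, and SDs are assumptions chosen to resemble the summary reported below, so the exact numbers will differ):

```r
set.seed(1)
N <- 16   # subjects
k <- 10   # observations per subject

df <- data.frame(
  id        = factor(rep(1:N, each = k)),
  Condition = rep(c("cond.1", "cond.2"), each = (N / 2) * k)
)

subj_eff <- rnorm(N, 0, 0.65)                    # between-subject SD (assumed)
df$score <- (df$Condition == "cond.2") * 0.96 +  # condition effect (assumed)
  subj_eff[df$id] + rnorm(N * k, 0, 1.46)        # residual SD (assumed)

# fit = lmer(score ~ Condition + (1|id), data = df)  # as in the text (needs lme4)
```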

If we want to report a standardized effect size (i.e., an SMD such as Cohen’s d), how do we compute it?

If we consider all observed responses (small circles, which include both within- and between-subject variance) we get: Cohen’s d = 0.61

However, if we consider only the (model-estimated, unobserved) true subject scores (diamonds, which include only between-subject variance) we get: Cohen’s d = 1.80

How do we estimate the second Cohen’s d using ONLY the model parameters in the summary? Let’s see the summary:

summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ Condition + (1 | id)
##    Data: df
## 
## REML criterion at convergence: 592.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.15519 -0.65057  0.02104  0.78849  2.24219 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 0.4295   0.6554  
##  Residual             2.1316   1.4600  
## Number of obs: 160, groups:  id, 16
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)      -0.6188     0.2834  -2.183
## Conditioncond.2   0.9640     0.4008   2.405
## 
## Correlation of Fixed Effects:
##             (Intr)
## Condtncnd.2 -0.707

We may divide the “raw coefficient” by either the total SD, or only by the estimated between-subject (id) SD:

→ For the total SD: 0.96 / sqrt(0.66^2 + 1.46^2) = 0.60

→ For estimated between-subject SD only: 0.96 / 0.66 = 1.47
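The same computation in code, with the values copied from the summary output above (with a fitted model these could be extracted via fixef(fit) and VarCorr(fit) from lme4):

```r
b      <- 0.9640   # fixed-effect estimate for Conditioncond.2
sd_id  <- 0.6554   # between-subject (id) SD
sd_res <- 1.4600   # residual SD

d_total   <- b / sqrt(sd_id^2 + sd_res^2)  # total SD in the denominator: ~0.60
d_between <- b / sd_id                     # between-subject SD only: ~1.47
```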




GLM: Poisson (with Gamma it’s more or less the same)

Let’s model a discrete count variable such as reading errors, with age as the predictor. Reading errors decrease monotonically throughout primary school. However, the decrease is not linear: it smoothly converges towards zero, and M and SD are related.

We had already talked about the problem here: https://www.memoryandlearninglab.it/wp-content/uploads/2023/10/glm_e_overdispersion3.html

The mean linear decrease is about -0.50 errors per year, so every two years we should observe a decrease of about -1.00 reading errors.

However… from 6 to 7 years we get an expected decrease of -1.37 reading errors, whereas from 7 to 8 we get an expected decrease of -0.75 reading errors.
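These expected decreases follow directly from the fixed effects of the Poisson model reported below (population-level predictions, i.e., with the random intercept set to 0):

```r
b0 <- 4.72878    # intercept, from the model summary below
b1 <- -0.60363   # age coefficient

mu <- exp(b0 + b1 * c(6, 7, 8))  # expected error counts at ages 6, 7, 8
diff(mu)                         # yearly decreases: about -1.37, then about -0.75
```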

So, what remains constant? What can be reported as a meaningful effect size? It’s the percentage reduction of reading errors per time unit.

Every +1 year, the expected number of remaining reading errors is 55% of the expected number of the previous year. After +2 years, the expected number of remaining reading errors is 30% of the previous observation. These percentage decreases are constant over time.

How do we get these estimates? Let’s have a look at the Poisson model summary:

summary(fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: errors ~ age + (1 | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1160.3   1173.0   -577.2   1154.3      497 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4797 -0.6413 -0.4043  0.5184  3.7210 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.05813  0.2411  
## Number of obs: 500, groups:  id, 500
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.72878    0.31624   14.95   <2e-16 ***
## age         -0.60363    0.04138  -14.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## age -0.984

The estimate for age is -0.60.

The multiplicative effect on the response scale per +1 year is exp of the age coefficient, exp(-0.60) = 0.55, so the percentage is 55% (i.e., the remaining percentage of reading errors after every year).

Per +2 years we calculate exp(-0.60 × 2), that is 0.30, thus a remaining percentage of 30%.
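The same numbers in code (coefficient copied from the summary above):

```r
b_age <- -0.60363  # age coefficient from the Poisson model

exp(b_age)      # ~0.55: 55% of the expected errors remain after +1 year
exp(2 * b_age)  # ~0.30: 30% remain after +2 years
```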




Binomial regression

Let’s model a discrete sum score variable measuring solved math problems out of 15 in a task. Accuracy ranges from 0% (sum score = 0) to 100% (sum score = 15), with age as the predictor. Accuracy increases monotonically throughout primary school. However, the increase is not linear: it is constrained between the two bounds and is “fastest” in the middle, and once again M and SD are related.

The linear increase per year is about +3.71 correctly solved math problems… this is not bad as a rough summary, but clearly inaccurate close to the bounds.

A better estimate is the Odds Ratio, which here is 11.50. This is an appropriate effect size index for binomial regressions. But how is it interpreted?

Every +1 year of age, the odds of correctly solving a problem are 11.50 times the odds of the year before.
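To make this concrete, here is a small worked example (the starting probability of 10% is an arbitrary assumption for illustration, not a model estimate). The OR multiplies the odds, not the probability:

```r
p1 <- 0.10                  # assumed probability of success at some age
odds1 <- p1 / (1 - p1)      # odds = p / (1 - p)
odds2 <- odds1 * 11.50      # one year later: odds multiplied by the OR
p2 <- odds2 / (1 + odds2)   # back to a probability: ~0.56
```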

How do we get this estimate? Let’s have a look at the Binomial model summary:

summary(fit)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(sumscore, 15 - sumscore) ~ age + (1 | id)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1712.5   1725.1   -853.2   1706.5      497 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.08730 -0.34507  0.07611  0.30624  1.01348 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 2.787    1.669   
## Number of obs: 500, groups:  id, 500
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -20.10317    0.81904  -24.55   <2e-16 ***
## age           2.44249    0.09836   24.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## age -0.993

The estimate for age is 2.44.

The Odds Ratio is exp of the age coefficient, exp(2.44), that is 11.50.

Note that the Odds Ratio depends on the metric of “age”, which here is expressed in years: with age in months, we would get a different estimate. Specifically, it would be 11.50^(1/12) = 1.23
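In code (coefficient copied from the summary above), including the change-of-units conversion:

```r
b_age <- 2.44249  # age coefficient (age in years) from the binomial model

OR_year  <- exp(b_age)       # ~11.50: OR per +1 year
OR_month <- exp(b_age / 12)  # ~1.23: OR per +1 month, equal to 11.50^(1/12)
```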