```
rm(list=ls())
library(lme4)
library(effects)
library(ggplot2)
library(MASS)
# helper: format a number with a fixed number of decimals
compact = function(x, digits = 2) { format(round(x, digits), nsmall = digits) }
ts = 25
```

Let’s consider a simple illustrative case with N = 16 subjects divided into
two conditions, each subject providing k = 10 observed responses.
With a mixed model we could model the responses like this:
`fit = lmer(score ~ Condition + (1|id), data = df)`.

If we want to report a standardized effect size (i.e., a standardized mean difference, SMD, such as Cohen’s d), how do we compute it?
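As a minimal sketch, data of this shape could be simulated and the model fitted as follows (all parameter values here are illustrative and will not reproduce the numbers reported below):

```r
# Illustrative simulation: 16 subjects, 2 conditions, 10 responses each
# (the effect sizes and SDs are made up for the sketch)
set.seed(1)
n_subj <- 16
k      <- 10
df <- data.frame(
  id        = factor(rep(seq_len(n_subj), each = k)),
  Condition = factor(rep(c("cond.1", "cond.2"), each = n_subj / 2 * k))
)
df$score <- (df$Condition == "cond.2") * 1.0 +    # condition effect
  rep(rnorm(n_subj, sd = 0.7), each = k) +        # between-subject variance
  rnorm(n_subj * k, sd = 1.5)                     # residual (within-subject) variance

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(score ~ Condition + (1 | id), data = df)
}
```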

If we consider all observed responses (small circles, which include
both within- and between-subject variance) we get: **Cohen’s d =
0.61**

However, if we consider only the (model-estimated, unobserved) true
subject scores (diamonds, which include only between-subject variance)
we get: **Cohen’s d = 1.80**

How do we estimate the second Cohen’s d using ONLY the model parameters in the summary? Let’s see the summary:

`summary(fit)`

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ Condition + (1 | id)
## Data: df
##
## REML criterion at convergence: 592.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.15519 -0.65057 0.02104 0.78849 2.24219
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.4295 0.6554
## Residual 2.1316 1.4600
## Number of obs: 160, groups: id, 16
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.6188 0.2834 -2.183
## Conditioncond.2 0.9640 0.4008 2.405
##
## Correlation of Fixed Effects:
## (Intr)
## Condtncnd.2 -0.707
```

We may divide the raw coefficient either by the total SD, or only by the estimated between-subject (id) SD:

→ Using the total SD: **0.96 / sqrt(0.66^2 + 1.46^2) = 0.60**

→ Using the estimated between-subject SD only: **0.96 / 0.66 = 1.47**
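Both divisions can be reproduced directly from the values reported in the summary above:

```r
# Values copied from the lmer summary
b      <- 0.9640   # fixed-effect estimate for Conditioncond.2
sd_id  <- 0.6554   # between-subject SD (random intercept for id)
sd_res <- 1.4600   # residual (within-subject) SD

d_total   <- b / sqrt(sd_id^2 + sd_res^2)  # standardized over the total SD
d_between <- b / sd_id                     # standardized over the between-subject SD only

round(d_total, 2)    # 0.60
round(d_between, 2)  # 1.47
```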

Let’s now model a discrete count variable such as reading errors, with age as the predictor. Reading errors decrease monotonically throughout primary school. However, the decrease is not linear: it smoothly converges towards zero, and the mean and SD are related.

We have already discussed this problem here: **https://www.memoryandlearninglab.it/wp-content/uploads/2023/10/glm_e_overdispersion3.html**

The mean linear decrease is about **−0.50** errors
*per year*, so **every two years we should observe a
decrease of about −1.00 reading errors**.

However… from **6** to **7** years we get
an expected decrease of **-1.37** reading errors, whereas
from **7** to **8** we get an expected
decrease of **-0.75** reading errors.
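These expected decreases can be recomputed from the fixed effects of the Poisson model whose summary is reported further below (intercept 4.72878, age slope −0.60363, log link):

```r
# Expected error count at a given age, on the response scale:
# log(E[errors]) = 4.72878 - 0.60363 * age
pred <- function(age) exp(4.72878 - 0.60363 * age)

dec_6_7 <- pred(7) - pred(6)  # expected change from 6 to 7 years
dec_7_8 <- pred(8) - pred(7)  # expected change from 7 to 8 years

round(dec_6_7, 2)  # -1.37
round(dec_7_8, 2)  # -0.75
```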

So, what remains constant? What can be reported as a meaningful effect
size? It is the percentage reduction of reading errors *per time
unit*.

**Every +1 year**, the expected number of remaining
reading errors is **55%** of the expected number for
the previous year. **After +2 years, the expected number of
remaining reading errors is 30%** (i.e., 0.55 × 0.55) of the value
two years earlier. These proportional decreases are **constant over
time**.

How do we get these estimates? Let’s have a look at the Poisson model summary:

`summary(fit)`

```
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: errors ~ age + (1 | id)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 1160.3 1173.0 -577.2 1154.3 497
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4797 -0.6413 -0.4043 0.5184 3.7210
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.05813 0.2411
## Number of obs: 500, groups: id, 500
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.72878 0.31624 14.95 <2e-16 ***
## age -0.60363 0.04138 -14.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## age -0.984
```

The estimate for **age** is **−0.60** (on the log scale).

The multiplicative effect on the response scale **per +1 year** is
**exp(−0.60) = 0.55**.

**Per +2 years** we calculate
**exp(−0.60 × 2) = 0.30**.
Let’s now model a discrete sum score counting correctly solved math problems out of 15 in a task. Accuracy ranges from 0% (sum score = 0) to 100% (sum score = 15), with age as the predictor. Accuracy increases monotonically throughout primary school. However, the increase is not linear: it is constrained between two bounds, is “fastest” in the middle, and once again the mean and SD are related.

The linear increase *per year* is about **+3.71**
correctly solved math problems… this is not bad, but clearly inaccurate
close to the bounds.

A better effect size is the **Odds Ratio**, which
here is **exp(2.44249) = 11.50**.

Every +1 year of age, the **odds** of correctly solving
a problem are **11.50 times the odds of the year
before**.

How do we get this estimate? Let’s have a look at the Binomial model summary:

`summary(fit)`

```
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(sumscore, 15 - sumscore) ~ age + (1 | id)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 1712.5 1725.1 -853.2 1706.5 497
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.08730 -0.34507 0.07611 0.30624 1.01348
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 2.787 1.669
## Number of obs: 500, groups: id, 500
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -20.10317 0.81904 -24.55 <2e-16 ***
## age 2.44249 0.09836 24.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## age -0.993
```

The estimate for **age** is **2.44** (on the logit scale).

The **Odds Ratio** is

`exp(age)`

that is **exp(2.44249) = 11.50**.

**Note that the Odds Ratio depends on the metric of
“age”, which here is expressed in years: with age in months, we would get a
different estimate. Specifically, it would be 11.50^(1/12)
= 1.23.**
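Both the yearly and the equivalent monthly odds ratios follow from the age coefficient in the summary:

```r
# Fixed-effect estimate for age from the binomial summary (logit scale)
b_age <- 2.44249

or_year  <- exp(b_age)      # odds ratio per +1 year
or_month <- or_year^(1/12)  # equivalent odds ratio per +1 month

round(or_year, 2)   # 11.5
round(or_month, 2)  # 1.23
```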