1 Analyzing experimental data with a linear model

1.1 This text is about using linear models to estimate treatment effects and the uncertainty in our estimates. This, raises the question, what is “an effect”?

At the end of this text, I provide an example set of analyses for multiple, related experiments. This example is a goal or target; it’s what you will be working towards as you learn from this text. The data for the analysis come from multiple experiments presented in Figure 2 in the article ASK1 inhibits browning of white adipose tissue in obesity. The chapter preceding the analysis is just enough biology to help you understand the biological importance of each experiment. The data for Figure 2 is from a set of experiments exploring the consequences of adipose-tissue specific deletion of the ASK1 signaling protein on multiple, adverse effects of a high-fat diet in mice, including weight gain, glucose intolerance, and increased liver triacylglycerol levels. I chose the data in Figure 2 of this paper because of the diversity of analyses and plot types. My analyses and plots differ slightly from those of the researchers because I implemented better practices – the stuff of this text.

Here, I use one panel from Figure 2 to outline what the statistical analysis of experimental data is all about. Much of this outline will be repeated in “An introduction to linear models” chapter.

For the experiment in Figure 2i, the researchers want to know if knocking out the ASK1 gene in adipose tissue cells lowers the liver triglyceride (TG) level in mice fed a high-fat diet. To investigate this, the researchers measured the TG levels in a group of knockout mice (“ASK1Δadipo” – Δ is the del operator and refers to a deletion in genetics) and in a group of control mice (“ASK1F/F”).The goal of the experiment is to estimate the effect of the adipose-specific ASK1 deletion on TG level.

To understand what I mean by “an effect”, and to understand how we can estimate an effect by fiting a linear model to data, let’s frist peek at the results of the fit linear model.

Figure 1.1: The effect of adipose-specific ASK1 knockout on liver triglyceride (TG).

The estimate of the effect of adipose-specific ASK1 deletion on liver TG levels is the difference in the mean TG of the control group from the mean TG of the knockout group.

\[ \overline{\texttt{TG}}_\texttt{ASK1Δadipo} - \overline{\texttt{TG}}_\texttt{ASK1F/F} \]

This estimate is -21.6 µmol per g liver. An effect has the same units as the variable compared (µmol per g liver), a magnitude (21.6 units), and a direction (negative).

The direction and magnitude of the estimated effect can be mentally reconstructed by comparing the position of the two group means in the lower part of Figure 1.1. The upper part of Figure 1.1 explicitly shows the estimate and shows the the uncertainty in the estimate using 95% confidence intervals. I show how to make these plots in Chapter 4.

The values of the estimated effect and its 95% confidence interval in the upper part come from the contrasts table.

Table 1.1: Contrast table for the experiment in Figure 2i.

 contrast               estimate   SE df lower.CL upper.CL t.ratio p.value
 ASK1Δadipo - (ASK1F/F)    -21.6 7.05 10    -37.3     -5.9  -3.066  0.0119

Confidence level used: 0.95

The first column of the table gives the direction of the contrast between the two treatment levels. The value in the “estimate” column is the estimated effect (the measured difference in means). The value in the “SE” column is the standard error of the difference in means (SE), a measure of uncertainty of the estimate. We use the SE to compute the test statistic for a t-test (“t.ratio” column), from which we get the p-value (“p.value” column), the probability of observing a t-value as large or larger than our observed value, if the null were true. The “null is true” not only means the true effect is zero but also assumes a long list of conditions, such as normality, homogenity, and independence (Chapter 8).

We also use the SE to compute the 95% confidence intervals of the estimated effect, whose lower and upper limits are in the “lower.CL” and “upper.CL” columns. The lower bound of the CI of the effect is -37.3 µmol/g and the upper bound is -5.9 µmol/g. A confidence interval is another way of communicating uncertainty (in addition to the SE), and the way advocated in this text. In a 95% confidence interval, 95% of similarly constructed intervals (from hypothetical sampling of six mice from the ASK1 normal population and six mice from the ASK1 knockout population) will contain the true mean. Another way to think about a confidence interval is, it is the range of true differences that are compatible with the data, where compatible means “not rejected” by a t-test (a t-test between the estimated effect and any number inside the interval would return a p-value greater than 0.05).

Understanding what a standard error is and how to interpret confidence intervals is paramount to practicing good statistics and is covered in Chapter 5.

The modeled means and error intervals in the lower part of Figure 1.1 come from the table of estimated marginal means, which I call the modeled means table.

Table 1.2: Modeled means of the two treatment levels in the data from the experiment in Figure 2i.

 treatment  emmean   SE df lower.CL upper.CL
 ASK1F/F      61.5 4.98 10     50.4     72.6
 ASK1Δadipo   39.9 4.98 10     28.8     51.0

Confidence level used: 0.95

The value in the “emmean” column is the modeled mean of the treatment level given in the column “treatment”. The value in the “SE” column is the standard error of the mean, a measure of uncertainty of the estimate. We use the SE to compute the 95% confidence interval of the mean. The lower and upper limits of this interval are in the columns “lower.CL” and “upper.CL”.

1.2 More on uncertainty

Repeat after me: estimated effects

It is hard to overemphasize that what we measure in experiments are estimated effects. The true effect may be larger, smaller, or even in the reverse direction. This text is all about what we can infer about true effects from the statistical analysis of experimental data. This inference requires that we also measure our uncertainty in our estimate of the effect.

The measured means in each group are computed from a random sample of mice. If we only cared about the six mice in each group in this experiment, then we would not need to fit a linear model to the data to estimate the effect, we could simply compute each group’s mean and subtract the control mean from the knockout mean. But we care more about something more than these dozen mice because we are trying to discover something general about ASK1 regulation of TG levels in mice, generally (and even in mammals, and especially humans, generally). To make this leap of inference, we use a model to claim that each sample mean is an estimate of the respective population mean. Given this model, we can compute the standard error of each mean and the standard error of the difference in means. A standard error is a measure of the sampling variance of a statistic and, therefore, a measure of the precision of the estimate. The standard error, then, is a measure of uncertainty in the estimate. Here is how to think about precision and uncertainty: if we were to repeat this experiment many, many times, we would generate a long list of mean TG levels for the control mice and a long list of mean TG levels for the knockout mice. The less variable the means in each list, the more precise. By using a model, we do not need to repeat this experiment many times to get a standard error.

Better Know: Population

What is a population? In the experimental biology examples in this text, we might consider the population as a very idealized, infinitely large set of mice, or fish, or fruit flies, or communities from which our sample is a reasonably representative subset. For the experiments in the ASK1Δadipo study, the population might be conceived of as the hypothetical, infinitely large set of floxed ASK1 C57BL/6 mice bred in the mouse facility of the researchers under the same rearing conditions. By infinitely large, I mean something like all possible phenotypes that could be born from the nearly-infinite combination of meiosis, gene-gene interactions, genotype-environment interactions, and maternal effects.

1.3 A wee introduction to fitting a model to data

The model we are going to fit to the Figure 2i data is

\[\begin{align} y &= \beta_0 + \beta_1 x + \varepsilon\\ \varepsilon &\sim N(0, \sigma^2) \end{align}\]

This is a model of how the Figure 2i data were generated. In this model, \(y\) is the liver TG level for some fictional, randomly generated mouse and \(x\) is a variable that indicates the condition of the ask1 gene in randomly generated mouse – a value of 0 is given to mice with a functional ASK1 gene and a value of 1 is given to mice with a knocked out gene.

\(\beta_0\) is the “true” mean of TG in mice fed a high-fat diet and with a functional ASK1 gene. By “true”, I mean the mean that would be computed if we were to measure TG on an infinite number of these mice (exactly what “these mice” means is a good topic for a campfire discussion). The observed mean of the ASK1F/F group is an estimate of \(\beta_0\). The sum \(\beta_0\) + \(\beta_1\) is the true mean of TG in mice fed a high-fat diet but with a knocked out ASK1 gene. This means that \(\beta_1\) is the true difference in the means, or the true effect. The observed difference in means between the ASK1Δadipo and ASK1F/F groups is an estimate of \(\beta_1\). This difference is the estimated effect.

The sum \(\beta_0 + \beta_1 x\) is the expectation of TG, or expected value of TG in a generated mouse given the generating model. This sum equals the true mean of the infinite set of normal mice if \(x = 0\) and equals the true mean of the infinite set of ASK1 knockout mice if \(x = 1\). All generated control mice have the same expected value of TG. All generated knockout mice have the same expected value of TG.

\(\varepsilon\) is the error for the randomly generated mouse. It is a random number sampled from a Normal distribution with a mean of zero and a variance of \(\sigma^2\). The variation in generated mice has a systematic component due to variation in \(x\) and a random (or stochastic) component due to variation in \(\varepsilon\).

By fitting a model to the data we estimate the parameters \(\beta_0\), \(\beta_1\) and \(\sigma\). It is the estimation of \(\sigma\) that allows us to compute a measure of our uncertainty (a standard error) of our estimates of the means (\(\beta_0\) and \(\beta_0 + \beta_1\)) and of the difference in the means (\(\beta_1\)).

Let’s fit this model to the Figure 2i data using R.

fig2i_m1 <- lm(liver_tg ~ treatment, data = fig2i)

Robust inference from the model (generalizing from sample to population, including measures of the uncertainty of our estimates, requires that our data approximates the kind of data we’d expect from the data generating model specified above. All rigorous analysis should use specific model checks to evaluate a model fit. We use a quantile-quantile (QQ) plot to see if our data approximate what we’d see if we sampled from a normal distribution and a spread-level plot to see if our data approximate what we’d see if we sampled from distributions that do not differ in their variance.

ggcheck_the_model(fig2i_m1)

Our data look like they were sampled from a normal distribution because the points lie within the shaded boundary of the QQ plot (left panel). Our data look like they were sampled from distributions with the same variance because of the small tilt of the blue line in the Spread-Level plot (right panel). How to read these plots is covered in Chapter 11.

Given these checks, lets move on and look at the table of model coefficients.

fig2i_m1_coef <- cbind(coef(summary(fig2i_m1)),
                        confint(fig2i_m1))
fig2i_m1_coef

                     Estimate Std. Error   t value     Pr(>|t|)     2.5 %
(Intercept)          61.46667   4.981912 12.337968 2.248902e-07  50.36628
treatmentASK1Δadipo -21.60000   7.045487 -3.065792 1.192608e-02 -37.29832
                       97.5 %
(Intercept)         72.567058
treatmentASK1Δadipo -5.901676

The two values in the column “Estimate” are the estimates of \(\beta_0\) and \(\beta_1\).
The top value (61.467 µmol/g), the intercept of the model (\(\beta_0\)), is the mean of the wildtype (“ASK1F/F”) mice!
The second value (-21.6 µmol/g), the slope of the model (\(\beta_1\)) is the effect of ASK1 deletion on TG levels!
Notice the redundancy between the 2nd row of the coefficient table and the contrast table.

Here is how we might report this result in a paper:

“Mean TG level in ASK1Δadipo mice on a high-fat diet was 21.6 µmol/g less than that in ASK1F/F mice on a high-fat diet (95% CI: -37.3, -5.9, \(p = 0.012\)) (Figure 1.1).”

Phenomenal cosmic power

Remember that \(\beta_0 + \beta_1 x\) is the expecation given (or conditional on) x.

For the reference group, which is the ASK1F/F genotype, \(x = 0\), so the estimated mean TG level is \(61.46667 + -21.60000 \cdot 0\), which is 61.46667. Compare this value to the modeled mean in Table 1.2.
For the non-reference group, which is the ASK1Δadipo genotype, \(x = 1\), so the estimated mean TG level is \(61.46667 + -21.60000 \cdot 1\), which is 39.86667. Compare this value to the modeled mean in Table 1.2.