Interaction Effect

An interaction effect in user experiments or statistical analysis refers to a situation where the impact of one variable on an outcome depends on the level of another variable. Related: Independence. Interaction can be examined in different types of models, such as regression analysis or analysis of variance (ANOVA), but the basic idea is the same.

In the case of a two-way interaction (an interaction between two independent variables), let's denote our variables as follows:

  • $X$ and $Y$ are your independent variables.
  • $Z$ is your dependent variable.
  • $XY$ represents the interaction between $X$ and $Y$.

A regression model that includes an interaction term looks like this:

$$Z = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 XY + \epsilon$$

Here, $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the main effects of $X$ and $Y$ respectively, and $\beta_3$ represents the interaction effect of $X$ and $Y$ on $Z$. $\epsilon$ is the error term.

To calculate the interaction effect, you need to estimate the regression coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$. This is typically done through a method called Ordinary Least Squares (OLS) regression, which minimizes the sum of the squared residuals. In a factorial ANOVA setting, you would calculate the interaction effect as the difference between the effect of one factor at different levels of the other factor.
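As a minimal sketch of fitting such a model (simulated data with made-up coefficients, assuming pandas and statsmodels are available):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only; the "true" coefficients are invented.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"X": rng.normal(size=n), "Y": rng.normal(size=n)})
df["Z"] = (1.0 + 0.5 * df["X"] + 0.3 * df["Y"]
           + 0.8 * df["X"] * df["Y"]
           + rng.normal(scale=0.5, size=n))

# In a statsmodels formula, "X * Y" expands to X + Y + X:Y,
# so one call estimates beta_0 through beta_3.
model = smf.ols("Z ~ X * Y", data=df).fit()
print(model.params)  # Intercept, X, Y, X:Y ~ beta_0, beta_1, beta_2, beta_3
```

In the 2x2 factorial case, the fitted `X:Y` coefficient corresponds to the difference-in-differences described above: the effect of one factor at one level of the other factor, minus its effect at the other level.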

Calculating $\beta$

Linear Regression

In the regression model:

$$Z = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 XY + \epsilon$$

The coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are typically estimated using the method of Ordinary Least Squares (OLS). OLS minimizes the sum of the squared residuals (the differences between the observed and predicted values of the dependent variable $Z$). In simple linear regression (only one predictor), the formulas to estimate the coefficients are:

$$\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \beta_0 = \bar{Z} - \beta_1 \bar{X}$$

where

  • $X_i$ and $Z_i$ are the individual observations,
  • $\bar{X}$ and $\bar{Z}$ are the means of $X$ and $Z$ respectively,
  • $n$ is the number of observations.
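These two formulas translate directly into NumPy. A minimal sketch with made-up data, cross-checked against NumPy's own least-squares fit:

```python
import numpy as np

# Hypothetical one-predictor data, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Z = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates for simple linear regression.
beta1 = np.sum((X - X.mean()) * (Z - Z.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = Z.mean() - beta1 * X.mean()
print(beta0, beta1)

# Cross-check: np.polyfit returns [slope, intercept] for degree 1.
print(np.polyfit(X, Z, deg=1))
```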

For multiple predictors and interaction terms, we typically use matrix notation and some linear algebra to solve a system of linear equations for the coefficients. For the estimates to be valid, several assumptions must hold, including linearity, independence, homoscedasticity, and normally distributed errors. If these assumptions are violated, other estimation methods might be more appropriate.

Multiple Linear Regression

For multiple linear regression (which includes multiple predictors and interaction terms), as in the case of our model:

$$Z = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 XY + \epsilon$$

The calculation of the coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ becomes more complex. The formula that generalizes the one for simple linear regression involves matrix operations.

If we denote (reusing $X$ and $Y$ here in their conventional matrix senses, not as the individual predictors above):

  • $X$ as a matrix that includes a column of ones (for the intercept) and the values of the predictor variables (and their products for interaction terms),
  • $Y$ as a column vector of the outcome variable,
  • $B$ as a column vector of the coefficients to be estimated,

Then the formula for the least squares estimates in multiple regression is:

$$B = (X'X)^{-1}X'Y$$

where $X'$ denotes the transpose of $X$ and $(X'X)^{-1}$ denotes the inverse of $X'X$.

As in the simple regression case, these estimates are based on minimizing the sum of the squared residuals (i.e., differences between observed and predicted values of the outcome variable), and the validity of the estimates depends on several assumptions, including linearity, independence, homoscedasticity (constant variance of errors), and normally distributed errors.
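A NumPy sketch of this formula, building the design matrix with an explicit interaction column (simulated data; note that in practice `np.linalg.lstsq` or a library routine is numerically safer than explicitly inverting $X'X$):

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)  # predictor X
x2 = rng.normal(size=n)  # predictor Y
z = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones, X, Y, and the interaction column XY.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

# Normal equations: B = (X'X)^{-1} X'Y
B = np.linalg.inv(X.T @ X) @ X.T @ z
print(B)  # ~ [beta_0, beta_1, beta_2, beta_3]

# Numerically preferable equivalent:
B_lstsq, *_ = np.linalg.lstsq(X, z, rcond=None)
```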
