Interaction Effect
An interaction effect in user experiments or statistical analysis refers to a situation where the impact of one variable on an outcome depends on the level of another variable. Related: Independence. Interaction can be examined in different types of models, such as in regression analysis or analysis of variance (ANOVA), but the basic idea is the same.
In the case of two-way interaction (interaction between two independent variables), let's denote our variables as follows:
- $X$ and $Y$ are your independent variables.
- $Z$ is your dependent variable.
- $XY$ represents the interaction between $X$ and $Y$.
In a regression model that includes an interaction term, the model would look like this:
$$Z = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 XY + \epsilon$$
Here, $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the main effects of $X$ and $Y$ respectively, and $\beta_3$ represents the interaction effect of $X$ and $Y$ on $Z$. $\epsilon$ is the error term.
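One way to read $\beta_3$, offered as a short derivation from the model above rather than an additional result: grouping the terms that contain $X$ gives

$$Z = \beta_0 + (\beta_1 + \beta_3 Y)\,X + \beta_2 Y + \epsilon, \qquad \frac{\partial\, \mathrm{E}[Z \mid X, Y]}{\partial X} = \beta_1 + \beta_3 Y$$

So the expected change in $Z$ per unit change in $X$ is $\beta_1 + \beta_3 Y$: it equals $\beta_1$ only when $Y = 0$ and shifts by $\beta_3$ for each unit increase in $Y$. By symmetry, the effect of $Y$ is $\beta_2 + \beta_3 X$.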
To calculate the interaction effect, you need to estimate the regression coefficients $\beta_0, \beta_1, \beta_2,$ and $\beta_3$. This is typically done with Ordinary Least Squares (OLS) regression, which chooses the coefficients that minimize the sum of the squared residuals. In a factorial ANOVA setting, the interaction effect is the difference between the effects of one factor at the different levels of the other factor, that is, a difference of differences between cell means.
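The factorial description above can be sketched for a 2x2 design. The data, the 0/1 level coding, and the helper `cell_mean` are made up for illustration; they are assumptions, not part of the original text.

```python
import numpy as np

# Hypothetical 2x2 factorial data: both factors coded 0/1,
# three observations per cell.
x = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
z = np.array([1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0])

def cell_mean(x_level, y_level):
    """Mean of z over the observations in one cell of the design."""
    return z[(x == x_level) & (y == y_level)].mean()

# Effect of X at each level of Y (difference between cell means).
effect_x_at_y0 = cell_mean(1, 0) - cell_mean(0, 0)
effect_x_at_y1 = cell_mean(1, 1) - cell_mean(0, 1)

# Interaction: how much the effect of X changes between the levels of Y.
interaction = effect_x_at_y1 - effect_x_at_y0

print(effect_x_at_y0, effect_x_at_y1, interaction)
```

With 0/1 coding and all four cells present, this difference of differences coincides with the OLS estimate of $\beta_3$ in the regression model with an interaction term.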
Calculating $\beta$
Linear Regression
In the regression model:
$$Z = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 XY + \epsilon$$
The coefficients $\beta_0$, $\beta_1$, $\beta_2,$ and $\beta_3$ are typically estimated using the method of Ordinary Least Squares (OLS). OLS minimizes the sum of the squared residuals (the differences between the observed and predicted values of the dependent variable $Z$). In simple linear regression (only one predictor), the formulas to estimate the coefficients are:
$$\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\beta_0 = \bar{Z} - \beta_1 \bar{X}$$
where
- $X_i$ and $Z_i$ are the individual observations,
- $\bar{X}$ and $\bar{Z}$ are the means of $X$ and $Z$ respectively,
- $n$ is the number of observations.
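The simple-regression formulas above translate almost directly into code. The following is a minimal sketch; the function name `simple_ols` and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def simple_ols(x, z):
    """Closed-form OLS estimates for the one-predictor model z = beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    x_bar, z_bar = x.mean(), z.mean()
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    beta1 = np.sum((x - x_bar) * (z - z_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point of means.
    beta0 = z_bar - beta1 * x_bar
    return beta0, beta1

# Toy data lying exactly on the line z = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
z = [3.0, 5.0, 7.0, 9.0]
beta0, beta1 = simple_ols(x, z)
print(beta0, beta1)
```

Because the toy data are noise-free, the estimates recover the intercept 1 and slope 2 used to generate them.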
For multiple predictors and interaction terms, the coefficients are estimated by solving a system of linear equations, usually written in matrix notation. The validity of the resulting estimates and of the usual inference depends on several assumptions, including linearity, independence of the errors, homoscedasticity (constant error variance), and normally distributed errors. If these assumptions are violated, other estimation methods might be more appropriate.
Multiple Linear Regression
For multiple linear regression (several predictors, possibly with interaction terms), as in our model
$$Z = \beta_0 + \beta_1 X + \beta_2 Y + \beta_3 XY + \epsilon$$
the calculation of the coefficients $\beta_0, \beta_1, \beta_2,$ and $\beta_3$ becomes more complex. The formula that generalizes the one for simple linear regression involves matrix operations.
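If we denote:
- $X$ as the design matrix, which includes a column of ones (for the intercept) and one column per predictor, plus a column of products for each interaction term (this reuses the symbol $X$; it is not the single predictor $X$ from the model above),
- $Y$ as a column vector of the observed values of the outcome variable (called $Z$ in the model above),
- $B$ as a column vector of the coefficients to be estimated,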
Then the formula for the least squares estimates in multiple regression is:
$$B = (X'X)^{-1} X'Y$$
where $X'$ denotes the transpose of $X$ and $(X'X)^{-1}$ denotes the inverse of $X'X$.
As in the simple regression case, these estimates are based on minimizing the sum of the squared residuals (i.e., differences between observed and predicted values of the outcome variable), and the validity of the estimates depends on several assumptions, including linearity, independence, homoscedasticity (constant variance of errors), and normally distributed errors.
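The matrix formula above can be sketched in NumPy for the interaction model. The data are made up and noise-free, and the variable names (`design`, `outcome`, `B`) are assumptions for illustration. The sketch solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which is mathematically equivalent and usually more stable numerically.

```python
import numpy as np

# Hypothetical predictors and a noise-free outcome generated from
# known coefficients (1, 2, 3, 4) so the result is easy to check.
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
outcome = 1.0 + 2.0 * x + 3.0 * y + 4.0 * x * y

# Design matrix: column of ones (intercept), both predictors,
# and their product (the interaction term).
design = np.column_stack([np.ones_like(x), x, y, x * y])

# Least squares estimates B = (X'X)^{-1} X'Y, computed by solving
# the normal equations (X'X) B = X'Y.
B = np.linalg.solve(design.T @ design, design.T @ outcome)

# Cross-check with NumPy's least squares routine.
B_lstsq, *_ = np.linalg.lstsq(design, outcome, rcond=None)

print(B)        # approximately [1, 2, 3, 4]
print(B_lstsq)  # should agree with B up to rounding
```

The last entry of `B` is the estimate of $\beta_3$, the interaction coefficient. With real, noisy data the estimates would only approximate the underlying coefficients.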