In the last blog, we learned how to determine which effects are statistically significant. This is an important step in developing the predictive model(s), because only the statistically significant factors and interactions belong in the model. If we include insignificant terms, the model will appear to fit better than it really does, and we will overstate its ability to predict the response(s).
In this blog, we focus on developing first order models. These models are easy to develop from screening experiments in which each factor is set at two levels (for efficiency). A first order model can contain terms for main effects and interaction effects.
A simple first order model (with a single main effect only) has the well-known form shown below.
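Written out, and assuming the usual straight-line notation in which $x$ is the coded factor level and $\hat{y}$ is the predicted response (the symbols $b_0$ and $b_1$ are the ones named in the text), this form would read:

$$
\hat{y} = b_0 + b_1 x
$$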
This example has no interaction term, and with only one factor the model is the equation of a straight line. The question is: how can we estimate the unknown parameters (b1 and b0, the slope and the intercept)? In two-level designs this is extremely easy, because we use a simple coding system for the low and high levels of each factor. The more complex calculations normally used to find regression coefficients are not necessary in this case.
Recall from 7th grade algebra that the slope is the change in the response (y) for a one-unit increase in the factor level (x). Hopefully, this reminds you of a main effect, which is the change in the response (y) as the factor level (x) moves from low to high. If we define the low level as “-1” and the high level as “+1” (and the middle as “0”), then the distance between low and high is two units. So, to derive the slope, we simply take our effect and divide by two! This is illustrated in the graphic below.
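As a minimal sketch of this "effect divided by two" rule, the Python snippet below converts an effect into a slope in coded units. The function name `slope_from_effect` and the response values 40 and 60 are made up for illustration only.

```python
def slope_from_effect(effect, low=-1.0, high=1.0):
    """Slope in coded units: change in response per one-unit change in x.

    With the usual coding (low = -1, high = +1) the distance between
    the levels is 2, so the slope is simply effect / 2.
    """
    return effect / (high - low)


# Hypothetical average responses at the two levels (illustration only).
avg_at_low = 40.0
avg_at_high = 60.0

effect = avg_at_high - avg_at_low   # change in y from low to high
b1 = slope_from_effect(effect)

print(effect)  # 20.0
print(b1)      # 10.0
```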
Each coefficient (for all of the significant main effects and interaction effects) is calculated the same way.
What about the intercept term? Well, the y-intercept in the simple linear model is the value of y when x=0. In our coded system, this is the predicted value of y when x is halfway between the low and high values. Since we are enforcing a linear relationship here, this is simply the average response! In the example below, the average of 75 and 25 is 50.
How many terms might our model have? It depends on the number of factors and interactions that could be significant. Here are some examples. Note that the number of terms shown is the maximum possible, as only statistically significant terms will be included.
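The numbers 75 and 25 can be carried through to a complete one-factor model. The sketch below assumes that 25 is the response at the low level and 75 the response at the high level (the text does not say which is which); the names `b0`, `b1`, and `predict` are illustrative.

```python
# Assumption: 25 is the response at x = -1 and 75 is the response at x = +1.
low_response = 25.0
high_response = 75.0

# Intercept: the average response, i.e. the prediction at x = 0.
b0 = (low_response + high_response) / 2

# Slope: the effect (high minus low) divided by the two-unit distance.
b1 = (high_response - low_response) / 2


def predict(x):
    """Predicted response at coded factor level x (between -1 and +1)."""
    return b0 + b1 * x


print(b0)  # 50.0
print(b1)  # 25.0
print(predict(-1), predict(0), predict(1))  # 25.0 50.0 75.0
```

Under that assumption the line passes through both observed points, and the prediction at the midpoint is the average of 50.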
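One way to count the maximum number of terms is sketched below. It assumes a full factorial design in which interactions of every order can be estimated; the function name `max_model_terms` and the optional `max_order` cap (for example, to stop at two-factor interactions) are illustrative choices, not something taken from the original examples.

```python
from math import comb


def max_model_terms(num_factors, max_order=None):
    """Maximum number of model terms, including the intercept.

    Counts one intercept, plus every main effect (order 1), plus every
    interaction up to max_order. By default all orders are counted.
    """
    if max_order is None:
        max_order = num_factors
    total = 1  # the intercept
    for order in range(1, max_order + 1):
        total += comb(num_factors, order)  # number of terms of this order
    return total


for k in (2, 3, 4):
    print(k, max_model_terms(k), max_model_terms(k, max_order=2))
# 2 4 4
# 3 8 7
# 4 16 11
```

With all interaction orders included, the count for k factors is 2 to the power k; capping at two-factor interactions gives a smaller number.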
To summarize, the model coefficients are calculated as follows:
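In symbols, the rules described above can be written as follows. The notation is an assumption introduced here for illustration: $\bar{y}$ is the average response over all runs, $E_i$ is the effect of factor $i$, $E_{ij}$ is the effect of the interaction between factors $i$ and $j$, and the $x_i$ are the coded factor levels.

$$
b_0 = \bar{y}, \qquad b_i = \frac{E_i}{2}, \qquad b_{ij} = \frac{E_{ij}}{2}
$$

$$
\hat{y} = b_0 + \sum_i b_i x_i + \sum_{i<j} b_{ij} x_i x_j
$$

Only the terms whose effects are statistically significant are kept in the sums.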
We finish this blog with an example of writing out the complete model, given the significant effects. Suppose an experiment is designed to understand camera battery life and the factors that may affect it. The response is a quantitative measure of battery life. The factors are:
- X1 – Wall thickness
- X2 – Cover strength
- X3 – Material type
- X4 – Ambient temperature
The effect for each effect column in our matrix is shown below, with the statistically significant columns highlighted in green. The average response is also shown at the bottom of the response column (last column). Note that each effect is simply the average response at the high (+) level minus the average response at the low (-) level for that column.
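The "average of the highs minus average of the lows" calculation can be sketched in Python as follows. To keep it short, the sketch uses a two-factor full factorial with made-up responses, not the four-factor battery data; the function name `effect` and all numbers are hypothetical.

```python
from itertools import product


def effect(column, responses):
    """Average response at the +1 rows minus average response at the -1 rows."""
    highs = [y for sign, y in zip(column, responses) if sign > 0]
    lows = [y for sign, y in zip(column, responses) if sign < 0]
    return sum(highs) / len(highs) - sum(lows) / len(lows)


# Coded runs of a two-factor full factorial: (-1,-1), (-1,1), (1,-1), (1,1).
runs = list(product([-1, 1], repeat=2))
responses = [20.0, 30.0, 40.0, 70.0]  # made-up responses, one per run

x1 = [a for a, b in runs]
x2 = [b for a, b in runs]
x1x2 = [a * b for a, b in runs]  # interaction column: product of the signs

effects = {
    "X1": effect(x1, responses),
    "X2": effect(x2, responses),
    "X1X2": effect(x1x2, responses),
}
average = sum(responses) / len(responses)

print(effects)  # {'X1': 30.0, 'X2': 20.0, 'X1X2': 10.0}
print(average)  # 40.0
```

The interaction column is treated exactly like a factor column: its signs decide which runs count as "high" and which as "low".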
The predictive model is then:
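As a sketch of the mechanics only, the Python snippet below assembles a predictive model from an average response and a set of significant effects. The numbers and the choice of significant terms are hypothetical and are not the values from the battery life experiment; the names `build_model` and `predict` are likewise illustrative.

```python
def build_model(average, significant_effects):
    """Build a first order model from the average and the significant effects.

    significant_effects maps a tuple of factor names to its effect, e.g.
    ("X1",) for a main effect or ("X1", "X4") for an interaction.
    Returns the coefficients and a predict function for coded settings.
    """
    # Each coefficient is the corresponding effect divided by two.
    coefficients = {term: eff / 2 for term, eff in significant_effects.items()}

    def predict(settings):
        y = average  # the intercept is the average response
        for term, coef in coefficients.items():
            term_value = 1.0
            for name in term:
                term_value *= settings[name]  # product of coded levels
            y += coef * term_value
        return y

    return coefficients, predict


# Hypothetical inputs (illustration only).
average = 100.0
significant = {("X1",): 12.0, ("X4",): -8.0, ("X1", "X4"): 6.0}

coefficients, predict = build_model(average, significant)

print(coefficients)                   # {('X1',): 6.0, ('X4',): -4.0, ('X1', 'X4'): 3.0}
print(predict({"X1": 1, "X4": -1}))   # 107.0
print(predict({"X1": 0, "X4": 0}))    # 100.0
```

With these made-up inputs the model would read y = 100 + 6·X1 − 4·X4 + 3·X1·X4, with every factor expressed in coded units from -1 to +1.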
In summary, we have learned how to write out the predictive model for 2-level studies (once the significant effects have been determined).
In the next blog, we will focus on using our model to find solutions to our problem.