HomeTechniques and Tips@RISK Simulation: Numerical ResultsRegression Equation from Calculated Sensitivities

# 6.24. Regression Equation from Calculated Sensitivities

Applies to:
@RISK for Excel 4.x–7.x
@RISK for Project 4.1

I know that @RISK for Excel and for Project display regression sensitivities in a tornado diagram, and @RISK for Excel calculates them in the worksheet function RiskSensitivity. But how can I assemble them into a regression equation? What's the constant term? Is the regression equation more accurate for some input values than for others?

First, make sure you have the actual regression coefficients in units of output per unit of input.

Y = b0 + b1X1 + b2X2 + b3X3 + ...

In this equation, Y is the @RISK output. b0 is the constant term (see next paragraph). The other b's are the regression coefficients, descaled if necessary (see above), and the X's are the @RISK input variables.

What is the value of the constant term, b0? In @RISK 6.0 and newer, you can get this from RiskSensitivity( ) with a result type of 4. In earlier versions of @RISK, you have to calculate it. @RISK doesn't reveal this directly, but you can compute it from the other information. The line of best fit (the regression line) is guaranteed to include the point where all the inputs and the output have their mean values. Get those mean values from the Results Summary window or the Detailed Statistics window, and substitute in the regression equation to solve for the constant term:

b0 = Ybar - b1Xbar1 - b2Xbar2 - b3Xbar3 - ...

where Ybar is the mean value of the output and the Xbar's are the mean values of the inputs.

When you have the constant term, you have the last piece of the regression equation.

Where is this equation valid? The coefficients are global properties of the overall set of data, so the equation is valid through the entire region of these input values. That is, each regression coefficient refers to the line that fits best through all the points, weighted equally. The regression equation takes all points (iterations) of all variables equally into account.

What if data are skewed? Just as with a simple two-variable X-Y regression, that will affect the residuals. If one region of the cloud of points is markedly different from another, the regression equation does the best it can overall, which may mean less than the best for particular regions. In that case the residuals would be large in some regions and small in others.

One caveat: All of this assumes that you have captured all the inputs that have any meaningful impact on this output. If you have only some of the significant inputs, then of course the regression line will lose some of its effectiveness.