## Linear Regression Example

You can perform linear regression analysis using Igor's curve-fitting operations. The StatsLinearRegression operation provides additional statistical information and hypothesis tests for single and multiple regressions.

**1. Simple linear regression.**

Start by creating a wave with a known slope and additive Gaussian noise.

Make/O/N=100 data1=x+gnoise(4)

The simple linear regression analysis is obtained by:

StatsLinearRegression /T=1/Q data1

The results appear in the Linear Regression table (shown transposed):

| Parameter | Value |
| --- | --- |
| N | 100 |
| a | -0.912472 |
| b | 1.01654 |
| xBar | 49.5 |
| yBar | 49.4064 |
| sumx2 | 83325 |
| sumy2 | 87529 |
| sumxy | 84703.4 |
| Syx | 3.81255 |
| F | 5923.74 |
| Fc | 3.93811 |
| r2 | 0.983726 |
| Sb | 0.0132077 |
| t | |
| tc | 1.98447 |
| L1 | 0.990332 |
| L2 | 1.04275 |

When a slope is not specified, the null hypothesis is that the slope is zero, and the t-statistic is not relevant. The regression results above clearly reject this hypothesis, with the regression line given by:

y=-0.912472+1.01654*x.
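The quantities in the table follow the standard least-squares definitions. As a cross-check outside Igor, here is a minimal Python sketch (an illustration, not Igor code) that builds analogous synthetic data and computes the same statistics; the noise and fitted slope will differ from the table because the random values differ:

```python
import math
import random

random.seed(0)
n = 100
xs = list(range(n))                          # Igor's implicit x scaling: 0, 1, ..., 99
ys = [x + random.gauss(0, 4) for x in xs]    # analogue of data1 = x + gnoise(4)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sum_x2 = sum((x - x_bar) ** 2 for x in xs)                        # sumx2
sum_y2 = sum((y - y_bar) ** 2 for y in ys)                        # sumy2
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))   # sumxy

b = sum_xy / sum_x2                  # slope
a = y_bar - b * x_bar                # intercept
r2 = sum_xy ** 2 / (sum_x2 * sum_y2)
# residual standard error Syx, with n - 2 degrees of freedom
syx = math.sqrt((sum_y2 - b * sum_xy) / (n - 2))
```

Note that xBar and sumx2 depend only on the x scaling, so they reproduce the table exactly (49.5 and 83325 for 100 points starting at 0).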

To plot the regression and the data curves execute the commands:

Duplicate/O data1,regression1

regression1=-0.912472+1.01654*x

Display/K=1 data1,regression1

ModifyGraph lsize(regression1)=2,rgb(regression1)=(0,0,0)

Here the red trace corresponds to our input data and the black trace is the regression line.

You can create confidence interval waves and prediction interval waves using:

StatsLinearRegression /T=1/Q/BCIW/BPIW data1

You can now add the two bands to the graph:

AppendToGraph data1_CH,data1_CL,data1_PH,data1_PL

ModifyGraph rgb(data1_CH)=(0,65535,0),rgb(data1_CL)=(0,65535,0)

ModifyGraph rgb(data1_PH)=(0,0,65535),rgb(data1_PL)=(0,0,65535)

Here the green traces correspond to the confidence interval while the blue traces correspond to the prediction interval (default single prediction).
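The bands follow the usual textbook formulas for the confidence and single-prediction intervals about a regression line. Assuming Igor uses these standard forms, they can be sketched in Python from the table values above:

```python
import math

# values taken from the Linear Regression table for data1
n, a, b = 100, -0.912472, 1.01654
x_bar, sum_x2 = 49.5, 83325.0
syx, tc = 3.81255, 1.98447

def bands(x):
    """Return (fitted y, confidence half-width, prediction half-width) at x."""
    y_hat = a + b * x
    se_ci = syx * math.sqrt(1 / n + (x - x_bar) ** 2 / sum_x2)      # confidence band
    se_pi = syx * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / sum_x2)  # single prediction
    return y_hat, tc * se_ci, tc * se_pi
```

Both bands are narrowest at x = xBar and widen toward the ends of the data range; the prediction band is always the wider of the two because it includes the scatter of an individual observation.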

**2. Zero slope hypothesis**

To see the results of the zero-slope hypothesis test on a different data set, generate data with zero slope and random noise:

Make/O/N=100 data2=10+gnoise(4)

A graph of data2 is shown above. To run the test, execute the command:

StatsLinearRegression /T=1/Q data2

| Parameter | Value |
| --- | --- |
| N | 100 |
| a | 9.44344 |
| b | 0.00449769 |
| xBar | 49.5 |
| yBar | 9.66608 |
| sumx2 | 83325 |
| sumy2 | 1505.25 |
| sumxy | 374.77 |
| Syx | 3.91695 |
| F | 0.109865 |
| Fc | 3.93811 |
| r2 | 0.00111981 |
| Sb | 0.0135694 |
| t | |
| tc | 1.98447 |
| L1 | -0.0224303 |
| L2 | 0.0314257 |

As expected, F<Fc, so the zero-slope hypothesis is accepted.
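For the zero-slope test, F is the regression mean square divided by the residual mean square, which reduces to b²·sumx2/Syx². A small Python check against the table values (this is the standard formula, assumed to match Igor's definition):

```python
# values taken from the data2 results table
b, sum_x2, syx = 0.00449769, 83325.0, 3.91695

# F statistic for H0: slope = 0
# (regression sum of squares over residual mean square)
F = b ** 2 * sum_x2 / syx ** 2
Fc = 3.93811   # critical value from the table, F(0.05; 1, 98)

reject = F > Fc   # False here: the zero-slope hypothesis is not rejected
```

The same formula applied to the data1 table values (b=1.01654, Syx=3.81255) reproduces its much larger F of about 5924.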

**3. Specific slope hypothesis**

You can test the hypothesis that the slope equals a specific value beta0 using the /B flag. Using the wave data1, which was generated with a unit slope, we have:

StatsLinearRegression /T=1/Q/B=1.0 data1

| Parameter | Value |
| --- | --- |
| N | 100 |
| a | -0.912472 |
| b | 1.01654 |
| xBar | 49.5 |
| yBar | 49.4064 |
| sumx2 | 83325 |
| sumy2 | 87529 |
| sumxy | 84703.4 |
| Syx | 3.81255 |
| F | 5923.74 |
| Fc | 3.93811 |
| r2 | 0.983726 |
| Sb | 0.0132077 |
| t | 1.25247 |
| tc | 1.98447 |
| L1 | 0.990332 |
| L2 | 1.04275 |

Since |t|<tc, the hypothesis that the slope equals 1 is accepted. From the definitions of L1 and L2, any hypothesized slope beta0 outside the interval [L1, L2] is rejected.
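Assuming the standard definitions, t and the confidence limits follow directly from b, Sb, and tc, so the connection between the t-test and the [L1, L2] interval is easy to verify in Python:

```python
# values taken from the data1 table above
b, sb, tc, beta0 = 1.01654, 0.0132077, 1.98447, 1.0

t = (b - beta0) / sb        # test statistic for H0: slope = beta0
L1 = b - tc * sb            # lower 95% confidence limit for the slope
L2 = b + tc * sb            # upper 95% confidence limit

reject = abs(t) > tc        # False: beta0 = 1 lies inside [L1, L2]
```

Rejecting H0 (|t|>tc) is algebraically the same statement as beta0 falling outside [L1, L2].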

**4. Linear regression for more than one wave.**

Make/O/N=100 data3=4+x+gnoise(4)

Make/O/N=100 data4=5+x+gnoise(5)

You can run the linear regression test on multiple samples using the command:

StatsLinearRegression /T=1/Q data1,data3,data4

The results are displayed in the Linear Regression table and in the Linear Regression MC table.

Linear Regression table:

| Parameter | data1 | data3 | data4 |
| --- | --- | --- | --- |
| N | 100 | 100 | 100 |
| a | -0.912472 | 4.10709 | 4.37794 |
| b | 1.01654 | 0.993232 | 0.993211 |
| xBar | 49.5 | 49.5 | 49.5 |
| yBar | 49.4064 | 53.272 | 53.5419 |
| sumx2 | 83325 | 83325 | 83325 |
| sumy2 | 87529 | 83725. | 85298.4 |
| sumxy | 84703.4 | 8276 | 82759.3 |
| Syx | 3.81255 | 3.94368 | 5.62508 |
| F | 5923.74 | 5285.34 | 2597.77 |
| Fc | 3.9381 | 3.9381 | 3.93811 |
| r2 | 0.983726 | 0.981796 | 0.963647 |
| Sb | 0.0132077 | 0.013662 | 0.0194868 |
| t | | | |
| tc | 1.98447 | 1.98447 | 1.98447 |
| L1 | 0.990332 | 0.96612 | 0.95454 |
| L2 | 1.04275 | 1.02034 | 1.03188 |

Linear Regression MC table:

| Parameter | Value |
| --- | --- |
| Ac | 249975 |
| Bc | 250224 |
| Cc | 256552 |
| SSp | 6049.51 |
| SSc | 6079.72 |
| SSt | 7150.35 |
| DFp | 294 |
| DFc | 296 |
| DFt | 298 |
| Slopes F | 0.734116 |
| Slopes Fc | 3.02647 |
| CoincidentalRegression F | 13.375 |
| CoincidentalRegression Fc | 2.40235 |
| Elevations F | 26.0627 |
| Elevations Fc | 3.02626 |

In this case the slopes test results in F<Fc, so the hypothesis of equal slopes is accepted, while the elevations test results in F>Fc, which implies that the equal-elevations hypothesis is rejected, as is the possibility of coincidental regression.
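The three F ratios are standard analysis-of-covariance comparisons of the pooled (SSp), common-slope (SSc), and total (SSt) residual sums of squares. Assuming those textbook definitions, the table values can be reproduced in Python from the sums of squares and degrees of freedom alone:

```python
# sums of squares and degrees of freedom from the MC table above
SSp, SSc, SSt = 6049.51, 6079.72, 7150.35
DFp, DFc, DFt = 294, 296, 298

# each F compares the increase in residual SS caused by a constraint
# against the residual mean square of the less-constrained model
slopes_F = ((SSc - SSp) / (DFc - DFp)) / (SSp / DFp)   # equal slopes?
elev_F   = ((SSt - SSc) / (DFt - DFc)) / (SSc / DFc)   # equal elevations?
coinc_F  = ((SSt - SSp) / (DFt - DFp)) / (SSp / DFp)   # coincident lines?
```

Each ratio matches the corresponding F entry in the table, which is a useful sanity check on how the three tests relate to one another.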

You can also test waves of unequal lengths as in the following example:

Make/O/N=150 data5=x+gnoise(4)

Make/O/N=100 data6=x+gnoise(5)

StatsLinearRegression /T=1/Q data1,data5,data6

| Parameter | Value |
| --- | --- |
| Ac | 447888 |
| Bc | 449403 |
| Cc | 457496 |
| SSp | 6549.12 |
| SSc | 6571.72 |
| SSt | 6614.3 |
| DFp | 344 |
| DFc | 346 |
| DFt | 348 |
| Slopes F | 0.593541 |
| Slopes Fc | 3.02197 |
| CoincidentalRegression F | 0.855888 |
| CoincidentalRegression Fc | 2.3979 |
| Elevations F | 1.12087 |
| Elevations Fc | 3.02182 |

In this case both the slopes and the elevations are determined to be the same. The single test for coincidental regression would also indicate that the three waves have coincident regressions.

**5. Dunnett's multi-comparison test for elevations.**

You can perform Dunnett's multi-comparison test on the elevations of several waves. In this example we define the first input wave as the control sample (/DET=0):

StatsLinearRegression /T=1/Q/DET=0 data1,data3,data4,data5,data6

The operation computes the linear regression and the general multi-comparison as described above. In addition it displays Dunnett's MC Elevations table for tests of each input wave against the control wave:

| Pair | SE | q | qp | Conclusion |
| --- | --- | --- | --- | --- |
| 1_vs_0 | 0.641955 | 6.02176 | 2.16539 | 0 |
| 2_vs_0 | 0.641955 | 6.44208 | 2.16539 | 0 |
| 3_vs_0 | 0.615424 | 0.127039 | 2.16539 | 1 |
| 4_vs_0 | 0.641955 | 1.23037 | 2.16539 | 1 |

The table indicates that the equal elevation hypothesis is rejected for the pairs data3-data1 and data4-data1. The hypothesis is accepted for the other combinations.

**6. The Tukey test for multiple regressions**

The Tukey test depends on the F-test for equality of slopes. If the slopes are determined to be the same, the Tukey test compares elevations; otherwise, it performs a multi-comparison of the slopes.

StatsLinearRegression /T=1/Q/TUK data1,data3,data4,data5,data6

Since the slopes were determined to be the same, the Tukey test was performed on the elevations and the results are displayed in the Tukey MC Elevations table:

| Pair | SE | q | qc | Conclusion |
| --- | --- | --- | --- | --- |
| 4_vs_0 | 0.45393 | 1.7400 | 3.87073 | 1 |
| 4_vs_ | 0.45393 | 6.77604 | 3.87073 | 0 |
| 4_vs_2 | 0.45393 | 7.37047 | 3.87073 | 0 |
| 4_vs_3 | 0.43517 | 1.63536 | 3.87073 | 1 |
| 3_vs_0 | 0.43517 | 0.17966 | 3.87073 | 1 |
| 3_vs_ | 0.43517 | 8.7035 | 3.87073 | 0 |
| 3_vs_2 | 0.43517 | 9.32356 | 3.87073 | 0 |
| 2_vs_0 | 0.45393 | 9.11048 | 3.87073 | 0 |
| 2_vs_ | 0.45393 | 0.594429 | 3.87073 | 1 |
| 1_vs_0 | 0.45393 | 8.51605 | 3.87073 | 0 |

As expected, input waves 1 and 2 do not match the elevations of the other waves.

**7. Regression analysis with multiple Y values for each X value**

When the data has multiple Y values for each X value, you need to store the input as a 2D wave where each column represents one set of Y values. In this example the wave dataMYV consists of 30 rows and 6 columns, so there are at most 6 Y values for each X. There are 6 NaNs representing missing values, or padding used to format the input for the operation when the number of Y values is not the same for all X values. The X values are implied by the wave scaling of the rows, with start=1 and delta=3.

StatsLinearRegression /T=1/Q/MYVW={*,dataMYV}

The results are displayed in the "Linear Regression" table (shown transposed):

| Parameter | Value |
| --- | --- |
| N | 174 |
| a | -0.564705 |
| b | 2.01551 |
| xBar | 44.2586 |
| yBar | 88.639 |
| sumx2 | 117111 |
| sumy2 | 485408 |
| sumxy | 236039 |
| amongGroupsSS | 477052 |
| amongGroupsDF | 29 |
| withinGroupsSS | 8355.47 |
| withinGroupsDF | 144 |
| devLinearitySS | 1312.81 |
| devLinearityDF | 28 |
| F | 0.808045 |
| Fc | 1.55463 |
| F2 | 8463.47 |
| F2c | 3.89609 |
| r2 | 0.980082 |
| Syx | 7.4974 |

The F value is smaller than the critical Fc, which implies that the regression of the data is indeed linear. The value of F2 is much greater than the critical value F2c, which implies that the hypothesis of slope b=0 must be rejected.
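The linearity test compares the scatter of the group means about the fitted line with the scatter within groups. Assuming the standard deviation-from-linearity definition, F can be reproduced in Python from the table entries:

```python
# values taken from the multiple-Y regression table above
within_SS, within_DF = 8355.47, 144   # withinGroupsSS, withinGroupsDF
dev_SS, dev_DF = 1312.81, 28          # devLinearitySS, devLinearityDF

# F for deviation from linearity: mean-square deviation of group means
# about the regression line, over the within-group mean square
F = (dev_SS / dev_DF) / (within_SS / within_DF)

linear = F < 1.55463   # True: no significant deviation from linearity
```

Because F is below the critical value Fc, the within-group noise fully accounts for the departures of the group means from the line.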