Cause and effect: Regression Analysis
Cause and effect
Correlation analysis is often used as a first step in studying cause and effect among variables. Causation implies correlation, but correlation alone does not imply causation: a correlation between two variables cannot by itself prove a causal effect. Correlation is only a mathematical relationship and does not necessarily signify a cause-and-effect relationship between the variables. For any two correlated variables A and B, the following relationships are possible:
- A causes B
- B causes A
- Both A and B are affected by a common cause (third variable) but do not cause each other
- Both A and B mutually affect each other, so that neither can be designated as the cause or the effect
- There is no connection between A and B; the correlation is due to random or chance factors
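For instance, a quick simulation illustrates the third-variable case above: a common cause can induce a strong correlation between two variables that do not affect each other at all. This is a minimal sketch in Python with NumPy; the variables and coefficients are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
# Z is a common cause; A and B each depend on Z but not on each other.
Z = rng.normal(size=1000)
A = 2.0 * Z + rng.normal(size=1000)
B = -1.5 * Z + rng.normal(size=1000)

# A and B come out strongly correlated even though neither causes the other.
print(np.corrcoef(A, B)[0, 1])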
Simple Linear Regression
Regression analysis establishes a statistical model of the relationship between a variable and its explanatory variables. The steps in this process are as follows.
Step 1: Choose the predictor (explanatory) variables. Correlation analysis is done to choose the predictors: a pair of variables with a higher coefficient of correlation indicates a stronger relationship. The coefficient of correlation varies between -1.0 and 1.0.
Often, time is also one of the variables, but it may just be a proxy for other variables. Time should be considered a predictor variable only when there is a seasonal effect to be accounted for.
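As a sketch of this step (hypothetical data, using Python with NumPy), the coefficient of correlation of each candidate predictor with the response can be computed and compared:

import numpy as np

# Hypothetical observations: response y and two candidate predictors.
y  = np.array([10.0, 12.1, 13.9, 16.2, 18.1])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([5.0, 3.0, 6.0, 2.0, 4.0])

# Coefficient of correlation of each candidate with y (ranges -1.0 to 1.0).
for name, x in [("x1", x1), ("x2", x2)]:
    r = np.corrcoef(x, y)[0, 1]
    print(name, round(r, 3))
# The candidate with the larger |r| indicates the stronger linear relation.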
Step 2: Regression analysis -> confidence level and prediction
Define a regression model between the variables. The simple linear regression model predicts the variable by the following equation:
Yi = m Xi + c
[c = intercept, m = slope (sensitivity of Y to Xi)]
The difference between the observed value and the value predicted by the regression model is termed the residual.
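A minimal sketch of fitting this equation by least squares (Python with NumPy, hypothetical data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates of slope m and intercept c in Yi = m*Xi + c.
m, c = np.polyfit(x, y, deg=1)

predicted = m * x + c
residuals = y - predicted          # residual = actual - predicted
print("m =", m, "c =", c)
print("residuals:", residuals)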
residual = actual - predicted value of the variable
standardized residual = (residual - mean of residuals) / standard deviation of residuals
SST = sum of squared differences between observed value and mean value
SSR = sum of squared differences between predicted value and mean value
SSE = sum of squared differences between observed value and predicted value
SST = SSR + SSE
Note: the ratio SSR / SST varies between 0 and 1.
The standard error of the estimate is similar to a standard deviation, but it measures variation around the prediction line (not around the mean value): standard error of estimate = square root of (SSE / (n - 2)) for simple linear regression, where n is the number of observations.
R2 (coefficient of determination) is the proportion of variation in Y explained by variation in the explanatory variable(s) through the regression relation.
R2 = SSR / SST
The square root of R2 gives the correlation between the actual and predicted values and is termed Multiple R.
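Continuing the earlier sketch, all of these quantities follow directly from the fitted values (hypothetical data as before):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m, c = np.polyfit(x, y, deg=1)
predicted = m * x + c

sst = np.sum((y - y.mean()) ** 2)          # total variation
ssr = np.sum((predicted - y.mean()) ** 2)  # variation explained by regression
sse = np.sum((y - predicted) ** 2)         # unexplained (residual) variation

r2 = ssr / sst                             # coefficient of determination
multiple_r = np.sqrt(r2)                   # correlation of actual vs predicted
std_error = np.sqrt(sse / (len(y) - 2))    # standard error of the estimate
print(r2, multiple_r, std_error)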
Step 3: Summarize all the candidate models (across all predictor variables) in terms of three measures: adjusted R-square, Durbin-Watson, and MAPE. Based on these, choose the model that predicts best.

Adjusted R2: a modified version of R2 that penalizes a model for including redundant explanatory variables. It is the proportion of variation in Y explained by variation in X, adjusted for the number of predictors, and is a measure of the goodness of the model. A value closer to 1 means a tighter prediction of the variable by the model. Adjusted R2 is used more often than R2.
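A sketch of the adjustment, using the standard textbook formula with n observations and k explanatory variables:

def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables.

    Penalizes R-squared for each extra predictor, so a redundant
    variable lowers the score even if it raises plain R-squared.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: R2 = 0.95 looks less impressive once 3 predictors on 10 points
# are penalized.
print(adjusted_r2(0.95, n=10, k=3))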
Durbin-Watson Statistic
If you plot the residuals against the predicted values, the pattern should be random. For time-series data, the Durbin-Watson statistic is used to evaluate the randomness of the residuals. DW is the sum of squared differences between successive residuals divided by the sum of squared residuals. DW varies between 0 and 4; a value between 1.5 and 2.5 indicates that the errors are serially uncorrelated.
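A minimal sketch of the statistic (Python with NumPy, hypothetical residuals):

import numpy as np

def durbin_watson(residuals):
    # Sum of squared differences between successive residuals,
    # divided by the sum of squared residuals; ranges from 0 to 4.
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

residuals = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1])
print(durbin_watson(residuals))   # a value near 2 suggests no serial correlation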
Absolute percentage error = |Actual - Predicted| / Actual value of the variable
Mean absolute percentage error (MAPE) is the average of the absolute percentage errors. It is used to measure the predictive ability of a model.
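A sketch of both error measures (hypothetical actual and predicted values):

import numpy as np

actual = np.array([100.0, 120.0, 90.0, 110.0])
predicted = np.array([98.0, 125.0, 95.0, 105.0])

# Absolute percentage error per observation, then its mean (MAPE).
ape = np.abs(actual - predicted) / actual
mape = ape.mean()
print("MAPE = {:.2%}".format(mape))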
Dummy variable correlation analysis: http://en.wikipedia.org/wiki/Dummy_variable_(statistics)
Link from MIT OpenCourseWare: http://ocw.mit.edu/courses/sloan-school-of-management/15-075j-statistical-thinking-and-data-analysis-fall-2011/lecture-notes/