Financial Planning & Analytics
Submitted by TUSHAR KANT • 7 October 2015 • Summary • 4,041 words (17 pages)
MULTIPLE REGRESSION
INTRODUCTION: The simple linear regression model is used to determine the effect of a single independent variable on the dependent variable.
However, we are often interested in testing whether a dependent variable (y) is related to more than one independent variable (x2, x3, x4, …), with the emphasis on estimation and inference. This process is known as multiple regression, and it is very commonly used. One complication is that the independent variables can obscure each other's effects. For example, the choice of a restaurant may depend on factors such as cost, convenience and ambience. The effect of cost might override the effect of convenience, so that a regression on convenience would not appear very interesting.
One possible solution is to regress y on one independent variable and then test whether a second independent variable is significantly related to the residuals of that regression. If it is, the second variable is included. Further independent variables are added in the same way.
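The residual-based procedure just described can be sketched in a few lines. This is a minimal illustration, assuming simple least-squares lines at each step. The helper names `simple_ols` and `residuals` and the data are hypothetical, chosen so that y depends on both x2 and x3.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals y_i - (a + b*x_i) from the simple regression of y on x."""
    a, b = simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up data in which y depends on both x2 and x3.
x2 = [1, 2, 3, 4, 5, 6]
x3 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x2, x3)]

# Step 1: regress y on x2 alone and keep the residuals.
r = residuals(x2, y)

# Step 2: regress those residuals on x3. A clearly non-zero slope suggests
# that x3 explains variation that x2 has not already accounted for.
_, slope_x3 = simple_ols(x3, r)
print(round(slope_x3, 3))  # 0.94
```

The step-2 slope (about 0.94) is well below the coefficient of 3 used to generate the data. In this sample x3 is correlated with x2, so step 1 already absorbed part of the x3 effect into the x2 slope. Fitting all variables simultaneously avoids this problem.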
A multiple regression allows the simultaneous testing and modeling of multiple independent variables.
The model for a multiple regression (called the POPULATION REGRESSION MODEL) takes the form:
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p + \varepsilon$$
Suppose the population contains m observations (m > n, where n denotes the sample size). Each of the values Y1, Y2, …, Ym can then be written as an equation of the form above. All of these equations share the same parameters β1, β2, …, βp, and these common parameters define the 'p'-dimensional representative plane:
$$E(Y \mid X_2, X_3, \dots, X_p) = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$$
For any observation, the representative plane gives a predicted value, and the gap between the actual and predicted values is the error term.
For example, suppose the ith observation is 70 and the representative plane gives 68; the error term is then ε_i = 70 − 68 = 2. Similarly, if the jth observation is 80 and the plane gives 76, then ε_j = 80 − 76 = 4.
Here there are 'm' observations in the population and 'p' variables (Y, X2, X3, …, Xp). ε_i is the error term (residual) attached to the ith observation, i.e. the difference between the actual value of Y and the value ŷ predicted by the model.
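The error terms in the example above follow directly from this definition. The sketch below reproduces that arithmetic; the function name `error_terms` is hypothetical.

```python
def error_terms(actual, predicted):
    """Error term for each observation: actual value minus plane value."""
    return [a - p for a, p in zip(actual, predicted)]


actual = [70, 80]     # ith and jth observations from the example
predicted = [68, 76]  # values given by the representative plane
errors = error_terms(actual, predicted)
print(errors)  # [2, 4]
```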
Thus, the system of equations is:
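$$
\begin{aligned}
Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \dots + \beta_p X_{p1} + \varepsilon_1 \\
Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \dots + \beta_p X_{p2} + \varepsilon_2 \\
&\;\;\vdots \\
Y_m &= \beta_1 + \beta_2 X_{2m} + \beta_3 X_{3m} + \dots + \beta_p X_{pm} + \varepsilon_m
\end{aligned}
$$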
The above set of equations can be reduced to a matrix form as shown below:
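$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix}
=
\begin{bmatrix}
1 & X_{21} & X_{31} & \cdots & X_{p1} \\
1 & X_{22} & X_{32} & \cdots & X_{p2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{2m} & X_{3m} & \cdots & X_{pm}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix},
\qquad \text{i.e.} \quad Y = X\beta + \varepsilon
$$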
In the X matrix, the entry X_ji is the value of the jth variable for the ith observation.
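The matrix form packs the whole system of equations into a single product. This is a minimal sketch, assuming each row of X is [1, X_2i, X_3i, …] so that the leading 1 multiplies the intercept β1. The helpers `design_matrix` and `mat_vec` and all the numbers are hypothetical. The check at the end confirms that Xβ + ε reproduces the equations written out one by one.

```python
def design_matrix(columns):
    """columns[k] holds one independent variable (X2, X3, ...) for all
    observations; row i of the result is [1, X_2i, X_3i, ...]."""
    m = len(columns[0])
    return [[1.0] + [col[i] for col in columns] for i in range(m)]


def mat_vec(X, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * w for x, w in zip(row, v)) for row in X]


# Made-up values: m = 4 observations, p = 3 variables (Y, X2, X3).
x2 = [1.0, 2.0, 3.0, 4.0]
x3 = [0.5, 1.5, 1.0, 2.0]
beta = [1.0, 2.0, 3.0]       # beta_1 (intercept), beta_2, beta_3
eps = [0.1, -0.2, 0.0, 0.1]  # error terms

X = design_matrix([x2, x3])
Y = [xb + e for xb, e in zip(mat_vec(X, beta), eps)]

# The same values written out equation by equation.
Y_eq = [beta[0] + beta[1] * a + beta[2] * b + e for a, b, e in zip(x2, x3, eps)]
print(all(abs(u - w) < 1e-12 for u, w in zip(Y, Y_eq)))  # True
```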
In reality, however, we usually cannot observe the entire population (we may not even know its size). We therefore collect a sample of observations and estimate the population parameters from it; the estimated parameters constitute the sample representative plane.
That is, our aim is to estimate β1, β2, …, βp from data obtained from a sample of the population. The sample regression equation is therefore:
$$\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + \dots + b_p X_{pi}$$
The concept above is portrayed in the diagram below for one independent variable.
[Figure: diagram for one independent variable]
Here ŷ_i = E(y | x_i) is the expected value of y_i at x_i in the population regression model, while Ŷ_i is the corresponding value given by the sample regression model, i.e. the sample estimate of that expected value.
Our task, then, is to calculate b1, b2, b3, …, bp from the sample drawn from the population and use them to estimate the partial regression coefficients β1, β2, β3, …, βp of the population: b1 is the estimator of β1, b2 is the estimator of β2, and so on up to bp for βp.
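The method this section names for computing b1, …, bp is ordinary least squares. What follows is a minimal pure-Python sketch, assuming the standard normal-equations form (X'X)b = X'y solved by Gaussian elimination. The function name `ols` and the sample values are hypothetical. The made-up sample lies exactly on the plane y = 1 + 2·x2 + 3·x3, so the estimates should reproduce those coefficients. A real sample includes error terms, so b only approximates β. In practice, a numerical library's least-squares routine is the more robust choice.

```python
def ols(X, y):
    """Least-squares estimates b solving the normal equations (X'X) b = X'y.

    X is a list of rows [1, X_2i, ..., X_pi]; y is the list of Y_i.
    """
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    # Augmented matrix [X'X | X'y], solved by Gaussian elimination.
    A = [xtx[j] + [xty[j]] for j in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for perfect multicollinearity")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(A[r][c] * b[c] for c in range(r + 1, p))
        b[r] = (A[r][p] - tail) / A[r][r]
    return b


# Hypothetical sample lying exactly on the plane y = 1 + 2*x2 + 3*x3.
sample_x2 = [1, 2, 3, 4, 5]
sample_x3 = [2, 1, 4, 3, 5]
X = [[1.0, u, v] for u, v in zip(sample_x2, sample_x3)]
y = [1 + 2 * u + 3 * v for u, v in zip(sample_x2, sample_x3)]
b = ols(X, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```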
With 2 variables (1 dependent and 1 independent) we fit a line; with 3 variables (1 dependent and 2 independent) we fit a plane; and with 4 variables (1 dependent and 3 independent) we fit a hyperplane in four-dimensional space.
Multiple regression techniques have wide application, ranging from predicting tornadoes to choosing a restaurant for the next meal.
Assumptions: The multiple regression model operates under the following assumptions:
- The error term is independent of each of the independent variables; in particular, cov(ε, X_i) = 0.
  Since the regression equation is y = f(x) + ε and our aim is to minimize the squared error ε² = [y − f(x)]², this becomes difficult if an independent variable influences the error term: least squares would then attribute part of the error to that variable, and the estimates would be biased.
- The errors for all possible sets of given values of x2, x3, x4, …, xp are normally distributed.
- The expected value of the errors is zero for all possible sets of given values of x2, x3, x4, …, xp, i.e. E(ε_i) = 0.
- The variance of the errors is finite and the same for all sets of given values of x2, x3, …, xp, i.e. Var(ε_i) = σ² is a constant (homoscedasticity).
- Any two errors are independent, i.e. the value of one error carries no information about another.
- The independent variables should not be linearly related to one another, i.e. there should be no multicollinearity. Under multicollinearity, one independent variable can be estimated from another, for example
X2 = K1 + K2 X3.
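The multicollinearity example X2 = K1 + K2 X3 can be checked numerically. If one regressor is an exact linear function of another, their correlation is ±1. The X'X matrix is then singular, so the separate effects of X2 and X3 cannot be estimated. This is a small sketch; the constants K1 and K2 and the x3 values are hypothetical.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


K1, K2 = 4.0, -2.0               # hypothetical constants
x3 = [1.0, 2.0, 3.0, 5.0, 8.0]
x2 = [K1 + K2 * v for v in x3]   # X2 = K1 + K2*X3 exactly
r = correlation(x2, x3)
print(round(r, 6))               # -1.0 (negative because K2 < 0)
```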
TYPES OF REGRESSION FOR TYPES OF DATA: Different types of regressions exist for different types of data.
Dependent variable | Independent variable(s) | Name of the regression |
Metric | Metric | Ordinary regression |
Non-metric | Metric and/or non-metric | Logistic regression |
Metric | Non-metric, or metric and non-metric | Dummy-variable regression |
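Dummy regression works by turning a non-metric independent variable into 0/1 indicator columns. The sketch below assumes one category is left out as a baseline. That baseline is absorbed by the intercept, which avoids perfect multicollinearity between the dummies and the column of ones. The function name `dummy_columns` and the restaurant data are hypothetical.

```python
def dummy_columns(values, categories):
    """0/1 indicator column for every category except categories[0],
    which serves as the baseline absorbed by the intercept."""
    return {c: [1 if v == c else 0 for v in values] for c in categories[1:]}


# Hypothetical non-metric variable: ambience of five restaurants.
ambience = ["casual", "fine", "casual", "family", "fine"]
dummies = dummy_columns(ambience, ["casual", "family", "fine"])
print(dummies)  # {'family': [0, 0, 0, 1, 0], 'fine': [0, 1, 0, 0, 1]}
```

These columns can then sit in the X matrix alongside metric variables such as cost.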
Analysis of Multiple Regression Technique: The first task is to find the sample representative regression model, which is determined by the values of the estimators b1, b2, …, bp. The most commonly used method of obtaining these estimators is ordinary least squares (OLS), explained as follows:
...