Regression is a methodology that uses observations to compute the relationship between a dependent variable and a set of independent variables. Quantifying this relationship matters because it reveals the impact of each independent variable on the dependent variable. With this knowledge, one only needs the values of the independent variables to predict the dependent variable. Regression generates the line of best fit for the recorded data. Because the observations vary, the line cannot pass precisely through every point; the best-fitted line, however, is the one that leaves the least amount of unexplained variation around it.
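As a minimal sketch of fitting the line of best fit by least squares, the closed-form slope and intercept for simple linear regression can be computed directly (all numbers here are made up for illustration):

```python
# Least-squares line for a small illustrative dataset (hypothetical numbers).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance of x and y divided by the variance of x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))
```

The fitted line does not pass exactly through any of the points; it is simply the line around which the total squared vertical deviation is smallest.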
Model justification and expectations
Regression models the relationship between a continuous dependent variable Y and one or more independent variables X. Its primary purposes are to identify variables associated with Y, to predict future observations of Y, and to postulate what causes Y. For regression to be valid, certain theoretical assumptions must be met (Susanti and Pratiwi, 2014). It is assumed that the dependent variable Y varies linearly with the explanatory variables X; that the X's are determined by factors external to the model and are measured accurately; and that the observations of Y are randomly selected from the sampled population. There is, however, uncertainty in the linearity between Y and the X's, as reflected in the scatter of the residuals.
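The role of residual scatter in diagnosing the linearity assumption can be sketched with hypothetical data: fitting a straight line to clearly curved data leaves a systematic pattern in the residuals rather than random scatter.

```python
# Sketch of a linearity check (hypothetical data): fit a straight line to
# clearly curved data and inspect the residuals for a systematic pattern.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * x for x in xs]  # quadratic, so a straight line is misspecified

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Residuals are positive at the ends and negative in the middle (U-shaped)
# rather than scattering randomly, signalling a violated linearity assumption.
print([round(r, 2) for r in residuals])  # → [2.0, -1.0, -2.0, -1.0, 2.0]
```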
The true beta coefficients of the regression model for the entire population of interest are unknown (Susanti and Pratiwi, 2014); they must therefore be estimated from random samples. The estimated betas vary and are random variables, since they are estimated from varying samples. Ordinary least squares (OLS) is the standard method for calculating the betas (parameters) of a linear regression model. OLS minimizes the difference between the observed Y's and the Y's predicted by the regression model. The OLS beta estimates are therefore unbiased and have the minimum variance among all contending unbiased estimators for the model. Estimating the theoretical model on a random sample from the population yields the fitted regression model.
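That the estimated betas are random variables can be illustrated by drawing two random samples from the same hypothetical population and comparing the OLS slope estimates; the population, its true beta of 2.0, and the helper names below are all assumptions made for this sketch.

```python
import random

random.seed(0)

def ols_slope(xs, ys):
    """Closed-form OLS slope estimate for simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

def draw_sample(n=30, true_beta=2.0):
    """Hypothetical population: Y = 2X + random noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [true_beta * x + random.gauss(0, 1) for x in xs]
    return xs, ys

# Two random samples from the same population give two different estimates,
# both scattered around the unknown true beta.
estimates = [ols_slope(*draw_sample()) for _ in range(2)]
print(estimates)
```

Repeating this many times would trace out the sampling distribution of the estimator, which is what the unbiasedness and minimum-variance claims refer to.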
The results obtained are crucial in validating the collected data (Vuko and Čular, 2014). A suitable regression model should therefore predict outcomes close to those observed, and the fitted model should be better than the theoretical model. Several statistics are used in OLS regression to evaluate model fit: R-squared, the F-test, and the root mean square error. These are based on the total sum of squares and the sum of squared errors, and their combination indicates how the regression model compares with the mean model.
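A minimal sketch of how two of these fit statistics follow from the two sums of squares (the observed and predicted values below are made up, and the F-test is omitted for brevity):

```python
import math

# Hypothetical observed values and model predictions.
observed  = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

n = len(observed)
mean_y = sum(observed) / n

# Total sum of squares: variation around the mean model.
ss_total = sum((y - mean_y) ** 2 for y in observed)
# Sum of squared errors: variation left unexplained by the regression model.
ss_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))

r_squared = 1 - ss_error / ss_total   # share of variation explained
rmse = math.sqrt(ss_error / n)        # typical size of a prediction error

print(round(r_squared, 3), round(rmse, 3))
```

An R-squared of 1 would mean the model leaves no unexplained variation; an R-squared of 0 would mean it does no better than simply predicting the mean.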
Statistics and compliance with OLS assumptions
OLS is the methodology most commonly used to obtain estimates. With it, the beta hats are calculated by minimizing the sum of the squared residuals. A regression analysis therefore combines OLS with these statistics to obtain the estimated equation. For the regression model to be accurate, certain OLS assumptions must hold: the regression model is assumed to be linear, correctly specified, and to have an additive error term. The linearity assumption requires that all coefficients enter the regression model linearly (Vuko and Čular, 2014). For the functional form to be correct, no relevant variables may be omitted; this is the assumption of exact model specification. The additive error term implies that the error term cannot statistically be multiplied or divided by any other variable, because the error term is uncorrelated with the other variables.
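The minimization property of the beta hats can be checked directly: perturbing the closed-form OLS coefficients in either direction increases the sum of squared residuals. The data and the amount of perturbation below are arbitrary choices for this sketch.

```python
# Sketch: the OLS coefficients minimize the sum of squared residuals
# (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.2, 2.9, 4.1]

def ssr(slope, intercept):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form OLS estimates.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
     sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

# Any perturbation of the OLS coefficients increases the SSR.
print(ssr(b1, b0) < ssr(b1 + 0.1, b0),
      ssr(b1, b0) < ssr(b1, b0 - 0.1))  # → True True
```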
Regression modelling uses observations to compute the relationship between the dependent and explanatory variables. To formulate it, data collected from random samples are used to estimate the corresponding parameters. It is mainly used to identify variables associated with Y, among other purposes. For the model to be useful, however, certain assumptions must be considered and incorporated into it. Since the true betas are difficult to obtain, estimation is done mainly by OLS. Regression combines statistics and OLS under these assumptions so that accurate results can be obtained from the estimated betas.
Verburg, I. W., de Keizer, N. F., de Jonge, E., & Peek, N. (2014). Comparison of regression methods for modelling intensive care length of stay. PLoS ONE, 9(10), e109684.
Vuko, T., & Čular, M. (2014). Finding determinants of audit delay by pooled OLS regression analysis. Croatian Operational Research Review, 5(1), 81-91.
Susanti, Y., & Pratiwi, H. (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349-360.