# Correlation and Regression

## Define Correlation. Describe the use of a scatter diagram for ascertaining the correlation between two variables.

1. Correlation between two variables is the degree to which they are related, i.e. the direction and strength of their relationship. It can be assessed by various methods, such as calculating the correlation coefficient or drawing a scatter diagram.
2. In a scatter diagram, each pair of observations (x, y) is plotted as a point, and a line of best fit is drawn through the points to judge the direction and strength of the linear correlation.
3. If both variables tend to increase together, or decrease together, they move in the same direction and the correlation is positive.
4. If one variable tends to decrease as the other increases (or vice versa), they move in opposite directions and the correlation is negative.
5. If the data points lie close to the line of best fit, the correlation is strong; if they are widely scattered about it, the correlation is weak.
6. Using the points above, we can comment on the correlation between two variables from a scatter diagram, as suggested in the following diagrams.

## Define regression. Why are there two regression lines? Under what conditions will there be only one regression line?

1. Bivariate linear regression analysis aims to find the line of best fit for a given set of data on two variables, say x and y. It describes the relationship between the two variables mathematically, as the equation of a line.
2. To fit the line that best represents the given data points, we look for a line that lies as close as possible to all of the data points.
3. When we try to make the distances of the data points from the line as small as possible, we find that the distance of a point from the line can be measured in at least three ways: parallel to the x-axis, parallel to the y-axis, and perpendicular to the line (shown below).
4. When estimating y (the dependent variable) from x, we minimize the sum of squared distances measured parallel to the y-axis, which gives the regression line of y on x.
5. When estimating x (the dependent variable) from y, we minimize the sum of squared distances measured parallel to the x-axis, which gives the regression line of x on y.
6. Because these two minimizations generally give different lines, there are two regression lines. The two lines coincide only if there is perfect correlation between the two variables, i.e. the correlation coefficient is either 1 or -1.
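The Python sketch below is a rough illustration of these ideas. It computes the correlation coefficient r and both least-squares regression lines from paired data, then checks whether the two lines coincide.

It relies on these standard results, where $S_{xy}$, $S_{xx}$ and $S_{yy}$ are the sums of products and squares of deviations from the means:

- The slope of y on x is $b_{yx} = S_{xy}/S_{xx}$.
- The slope of x on y is $b_{xy} = S_{xy}/S_{yy}$.
- Both lines pass through $(\bar{x}, \bar{y})$.
- $b_{yx} b_{xy} = r^2$, so the lines coincide exactly when $r = \pm 1$.

The function names (`pearson_r`, `regression_lines`, `lines_coincide`) and the sample data are illustrative assumptions, not part of the original notes.

```python
import math


def _centered_sums(xs, ys):
    """Return the means and the centred sums Sxy, Sxx, Syy for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient r (assumes neither variable is constant)."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def regression_lines(xs, ys):
    """Least-squares regression lines (assumes neither variable is constant).

    Returns ((a, b_yx), (c, b_xy)) where
      y = a + b_yx * x  is y on x (minimises squared deviations parallel to the y-axis)
      x = c + b_xy * y  is x on y (minimises squared deviations parallel to the x-axis)
    Both lines pass through the point of means (mean x, mean y).
    """
    mx, my, sxy, sxx, syy = _centered_sums(xs, ys)
    b_yx = sxy / sxx
    b_xy = sxy / syy
    return (my - b_yx * mx, b_yx), (mx - b_xy * my, b_xy)


def lines_coincide(xs, ys, rel_tol=1e-9):
    """True when the two regression lines are the same line.

    Both lines pass through the means, so they coincide exactly when their
    slopes in the x-y plane agree: b_yx == 1 / b_xy, i.e. b_yx * b_xy == r**2 == 1.
    The tolerance absorbs floating-point rounding.
    """
    (_, b_yx), (_, b_xy) = regression_lines(xs, ys)
    return math.isclose(b_yx * b_xy, 1.0, rel_tol=rel_tol)


if __name__ == "__main__":
    perfect = ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # hypothetical data: y = 2x exactly
    scattered = ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # hypothetical data: imperfect fit
    for xs, ys in (perfect, scattered):
        print(pearson_r(xs, ys), regression_lines(xs, ys), lines_coincide(xs, ys))
```

For the data on `y = 2x`, the slopes are 2 (y on x) and 0.5 (x on y). Their product is 1, so the two lines coincide. For the second dataset r = 0.8 and the product of the slopes is 0.64, so the lines differ.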