What is Covariance?

This post assumes you already know what variance is but if you are unfamiliar, check Variance out first.

Theoretically speaking, for the population, the covariance is defined as . Oftentimes, we don’t know the full probability distribution of the random variables X and Y so we use the sample covariance to approximate the population covariance.

Then, by defition, sample covariance between two random variables X and Y, for sample size is defined as where and are the sample means of X and Y, respectively. Note that here we use the factor in calculating the sample covariance to account for the fact that the sample means are just estimates not the true means.

So we have the definition of the covariance but what does it actually represent? In a nutshell, covariance tells us how the two variables change together on average. If X and Y are linearly related, their covariance tells us the strength and direction of the linear relationship. Positive covariance means X and Y tend to increase or decrease together whereas negative covariance indicates that X and Y tend to move in opposite directions. If the covariance is , that means there is no relationship between the two variables.

Warning

This is not a completed post