Expectation Theory

In this section, I introduce expectation theory as it is used in probability, providing a foundation for the later sections on more advanced probability topics.

Expected Value

The expected value of a random variable is mathematically equivalent to the mean of the random variable. Mathematically, the expected value of a random variable $X$ is:

$$\mathbf{E}(X) = \int_{-\infty}^{\infty} x\ f_X(x)\ dx = \sum_{x} x\ p(x) $$

where:

$f_X(x)$: the probability density function, for a continuous random variable.

$p(x)$: the probability mass function, giving the probability of each possible value of a discrete random variable.

The integral form applies to continuous random variables and the summation to discrete ones; even for those of us comfortable with calculus, the summation is the most common representation of expectation in practice.

For example, let's compute the expected value of some simulated data.

x = [ 2, 3, 5, 2, 3, 1, 1, 4, 2, 4 ]
# map each distinct value to its empirical probability
sample_space = { i: x.count(i) / len(x) for i in set(x) }
# weight each value by its probability and sum
expected_value = sum([ k * v for k, v in sample_space.items() ])
expected_value
2.7

In the example above, we computed the expected value by summing the observations weighted by their probabilities. However, because the mean and the expected value are numerically identical, we can also compute the expected value as a mean.

import numpy as np
np.mean(x)
2.7

Mean vs Expected Value

The mean and the expected value are identical in magnitude. The difference is in usage: expected value typically refers to the theoretical average of a random variable, while the mean usually describes a sample drawn from a population.
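To make the distinction concrete, here is a minimal sketch contrasting the theoretical expected value of a fair six-sided die with the mean of a simulated sample (the die and the sample size of 10,000 are arbitrary illustrative choices):

# theoretical expected value of a fair die: each face weighted by probability 1/6
die_expectation = sum(face * (1/6) for face in range(1, 7))   # 3.5
# the mean of simulated rolls approaches 3.5 as the sample grows
rolls = np.random.randint(1, 7, size=10_000)
die_expectation, np.mean(rolls)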

Variance

Variance measures the dispersion of observations around their mean. Mathematically, the variance of a random variable $X$ is:

$$ Var(X) = \mathbf{E}\left( (X - \mu_{X})^2 \right) $$

Below is the full derivation of the computational form of the variance using the properties of expectation.

$$ Var(X) = \mathbf{E}\left( (X - \mu_{X})^2 \right) $$

$$ Var(X) = \mathbf{E}\left( X^2 - 2\mu_{X}X + \mu_{X}^2 \right) $$

$$ Var(X) = \mathbf{E}(X^2) - 2\mathbf{E}(\mu_{X}X) + \mathbf{E}(\mu_{X}^2) $$

$$ Var(X) = \mathbf{E}(X^2) - 2\mathbf{E}(X)\mu_{X} + \mathbf{E}(\mu_{X}^2) $$

Because $\mu_{X}$ is a constant, $\mathbf{E}(\mu_{X}^2) = \mu_{X}^2$; likewise, $\mathbf{E}(X) = \mu_{X}$ by definition. Substituting these in:

$$ Var(X) = \mathbf{E}(X^2) - 2\mu_{X}^2 + \mu_{X}^2 $$

$$ Var(X) = \mathbf{E}(X^2) - \mu_{X}^2 $$

$$ Var(X) = \mathbf{E}(X^2) - (\mathbf{E}(X))^2 $$

Now, let's compute the variance using the numpy library.

x = [ 2, 3, 5, 2, 3, 1, 1, 4, 2, 4 ]
np.var(x)
1.61
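As a sanity check, the derived identity $Var(X) = \mathbf{E}(X^2) - (\mathbf{E}(X))^2$ gives the same result (rounded here to absorb floating-point noise):

x_arr = np.array(x)
# E(X^2) - (E(X))^2 matches np.var(x)
round(np.mean(x_arr**2) - np.mean(x_arr)**2, 2)
1.61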

Variance Transformation Rules

Now that we have the formal derivation of the variance, we can look at some simple transformation rules; a short simulation sketch follows the list.

  1. Linear transformation with constants $$ Var(aX + b) = a^2Var(X) $$
  2. Independent Random Variables $A$ and $B$ $$ Var(A + B) = Var(A) + Var(B) $$
  3. Non-independent Random Variables $A$ and $B$ $$ Var(A + B) = Var(A) + Var(B) + 2cov(A,B) $$
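Here is a minimal simulation sketch of rules 1 and 2, assuming two independent standard-Normal samples (the distributions, seed, constants, and sample size are arbitrary illustrative choices):

rng = np.random.default_rng(0)
A = rng.normal(size=100_000)   # independent draws
B = rng.normal(size=100_000)
a, b = 3.0, 5.0

# rule 1: Var(aA + b) should be close to a^2 * Var(A)
print(np.var(a*A + b), a**2 * np.var(A))

# rule 2: for independent A and B, Var(A + B) ≈ Var(A) + Var(B)
print(np.var(A + B), np.var(A) + np.var(B))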

Covariance

Covariance is a measure of the association, or dependency, between two random variables. Mathematically, the covariance of two random variables $A$ and $B$ is:

$$cov(A, B) = \mathbf{E}(AB) - \mathbf{E}(A)\mathbf{E}(B)$$

A few things to note here:

  1. cov(A, B) will be positive if large values of A tend to occur with large values of B, and small values of A tend to occur with small values of B.
  2. cov(A, B) will be negative if large values of A tend to occur with small values of B, and small values of A tend to occur with large values of B.
  3. If A and B are independent, then there is no pattern between large values of A and large values of B, so cov(A, B) = 0. However, cov(A, B) = 0 does NOT imply that A and B are independent, unless A and B are jointly Normally distributed.

Let's compute the covariance of two samples with numpy.

x = [ 2, 3, 5, 2, 3, 1, 1, 4, 2, 4 ]
y = [ 3, 6, 3, 7, 8, 3, 4, 6, 5, 8 ]
np.cov(x, y)[0][1]
0.7666666666666667
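Note that np.cov returns the full 2x2 covariance matrix, normalized by $N - 1$ by default; indexing [0][1] extracts the covariance between x and y. As a check against the definition above, the two population (divide-by-$N$) forms agree:

x_arr, y_arr = np.array(x), np.array(y)
# E(AB) - E(A)E(B), the population covariance (≈ 0.69 here)
print(np.mean(x_arr * y_arr) - np.mean(x_arr) * np.mean(y_arr))
print(np.cov(x, y, ddof=0)[0][1])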

Standard Deviation

The standard deviation is the square root of the variance: it expresses the dispersion of observations around the mean in the same units as the data.

Mathematically, the sample standard deviation is:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}} $$

To calculate the standard deviation, we often simply take the square root of the variance. With numpy, it is a single function call; note that np.std, like np.var, divides by $N$ by default, so it corresponds to the population formula.

np.std(x), np.sqrt(np.var(x))
(1.2688577540449522, 1.2688577540449522)
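For the sample ($N - 1$) version that matches the formula above, pass ddof=1:

# ddof=1 switches numpy to the N-1 (sample) normalization (≈ 1.3375 here)
np.std(x, ddof=1), np.sqrt(np.var(x, ddof=1))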