Discrete Distributions

In this notebook, I provide a summary of the most commonly used distributions in data science and applied statistics. The objective is to provide a quick guide on discrete distributions and the problems that they solve.

To begin, let's define what discrete random variables are.


What is a Discrete Random Variable?

A discrete random variable is a variable that takes on a countable number of values. For example, the number of children in a family can only be expressed as countable values like $0, 1, 2, 3, ...$. Many observed measurements are discrete and should be modeled with an appropriate discrete distribution. For example:

  1. Number of clicks of an advertisement
  2. Number of heads in $n$ coin flips
  3. Number of trials before a success event

All of the above measurements are discrete, and there are specific distributions that model each of these random variables. Below, we introduce these distributions and go in-depth on each in its own section.


1. Bernoulli Distribution

The Bernoulli distribution models the probability of success on a single trial when there are only two possible outcomes. For example, the flip of a fair coin has only heads and tails as outcomes, so the Bernoulli distribution models it. Mathematically, we define the Bernoulli distribution as:

$$ X \sim \mathrm{Bernoulli}(p) $$

where:
$X$: a Bernoulli-distributed random variable
$p$: probability of success


Probability Mass Function

Another important concept is the probability mass function, which returns the probability of each possible outcome of a distribution. The Bernoulli distribution has a probability mass function of the form:

$$P(X = x) = p^x(1-p)^{1-x}$$

where:
$x \in \{0, 1\}$
$p:$ probability of success.

Example:
What is the probability of rolling a 6 on a single die roll?

Note: we know that the probability of success is 1/6. We assign rolling a six as $x=1$ and everything else as $x=0$. Therefore the probability is:

# p**x * (1 - p)**(1 - x) with p = 1/6, x = 1
((1/6)**1)*(1 - (1/6))**(1-1)
0.16666666666666666
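The same value can be obtained from `scipy.stats`, which provides a `bernoulli` object whose `pmf` method evaluates $P(X = k)$ directly (a minimal sketch):

```python
# Evaluate the Bernoulli pmf with scipy.stats rather than by hand
from scipy.stats import bernoulli

p = 1 / 6                           # probability of rolling a six
prob_success = bernoulli.pmf(1, p)  # P(X = 1), i.e. a six is rolled
prob_failure = bernoulli.pmf(0, p)  # P(X = 0), any other face
```

As a quick sanity check, the two probabilities sum to 1.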

More on the Bernoulli distribution in its dedicated section below.


2. Binomial Distribution

The Bernoulli distribution is a special case of the binomial distribution; equivalently, the binomial distribution extends the Bernoulli distribution to $n$ trials. That is, it models the number of successes of a binary outcome over $n$ independent trials.

Mathematically, it is represented as:

$$ X \sim \mathrm{Binomial}(n, p) $$

where:
$p$: probability of success
$n$: number of trials

An example of a problem modeled by this distribution: what is the probability of getting 4 out of 6 questions right if each question has 4 multiple choices with one correct answer?

Before we answer this question, let's define the probability mass function we will use to solve it.

Probability Mass Function

The probability mass function of the binomial distribution is:

$$ P(X = k) = \binom{n}{k}p^k(1-p)^{n-k} $$

where:
$n$: is the number of trials
$k$: is the number of successes
$p$: is the probability of success

Now back to our example:
What is the probability of getting 4 out of 6 questions correct if each question has 4 multiple choices?

From this example, we have the following information:
$n$ = 6
$k$ = 4
$p$ = .25

We can calculate this in Python pretty easily using the pmf formula:

from scipy.special import comb
# C(n, k) * p**k * (1 - p)**(n - k) with n = 6, k = 4, p = .25
comb(6, 4)*(.25**4)*(.75**(6-4))
0.032958984375
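Equivalently, `scipy.stats` provides `binom.pmf`, which computes the same quantity; summing the pmf over every possible $k$ is a handy sanity check that it is a valid distribution:

```python
from scipy.stats import binom

n, k, p = 6, 4, 0.25
prob = binom.pmf(k, n, p)                              # P(X = 4)
total = sum(binom.pmf(i, n, p) for i in range(n + 1))  # should be 1
```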

3. Geometric Distribution

The geometric distribution models the number of trials until the first success is observed. For example, how many coin flips must happen before a head is observed? Another more interesting example: after a night of partying, a young college student randomly tries his 10 keys, one at a time, to find the one that opens his apartment. How many attempts until he succeeds?

The geometric distribution is memoryless: every trial is independent of the ones before it, so past failures tell us nothing about when the first success will occur. In the key example, nothing stops the student from trying the same wrong key twice.
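Memorylessness can be verified numerically: for a geometric variable, $P(X > m + n \mid X > m) = P(X > n)$. A minimal check, assuming an arbitrary $p = 0.5$:

```python
# Survival function of the geometric distribution: P(X > k) = (1 - p)**k
p = 0.5

def survival(k):
    return (1 - p) ** k

m, n = 3, 4
lhs = survival(m + n) / survival(m)  # P(X > m + n | X > m)
rhs = survival(n)                    # P(X > n)
```

The two quantities agree for any choice of $m$ and $n$, which is exactly the memoryless property.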

Mathematically, the geometric distribution is expressed as:

$$ X \sim {Geometric}(p)$$

where:
$p$ is the probability of success.


Probability Mass Function

The probability mass function of the geometric distribution is:

$$f_X(x) = p(1-p)^{x-1}$$

where:
$x$: number of trials until the first success
$p$: probability of success on each trial

Example:
Suppose the probability of answering a random question correctly is $.3$. What is the probability that the first correct answer occurs on the 5th question?

# p * (1 - p)**(x - 1) with p = .3, x = 5
.3*(1-.3)**(5-1)
0.07202999999999998
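`scipy.stats` gives the same answer via `geom.pmf`; scipy's `geom` counts the trial on which the first success occurs, matching the pmf above:

```python
from scipy.stats import geom

p = 0.3
prob = geom.pmf(5, p)  # P(first success on trial 5) = p * (1 - p)**4
```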

4. Poisson Distribution

The Poisson distribution models the number of events occurring over a fixed time interval. An example of such an observation is the number of goals scored in a regular Premier League game.

Mathematically, the Poisson distribution is expressed as:

$$ X \sim \mathrm{Poisson}(\lambda)$$

where:
$\lambda$: is the expected value of the random variable.


Probability Mass Function

We can compute the probability of a Poisson-distributed random variable taking a given value with the formula below:

$$ P(X = k) = \frac {\lambda ^ k e^{-\lambda} } {k!} $$

where:
$\lambda$: is the rate of events
$k$: number of events, $k = 0, 1, 2, 3, \ldots$

Example:
Suppose that, on average, 15 points are scored in each NFL game. What is the probability that exactly 20 points will be scored in tomorrow's game?

import numpy as np
from scipy.special import factorial

# lambda**k * exp(-lambda) / k! with lambda = 15, k = 20
(15**20)*(np.exp(-15))/factorial(20, exact=True)
0.041810305001064585

As we should expect, the probability is much lower than for values near the mean, since $20$ is well above $\lambda = 15$.
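This intuition can be checked with `scipy.stats.poisson` by comparing the pmf at 20 against the pmf at the mean:

```python
from scipy.stats import poisson

lam = 15
prob_20 = poisson.pmf(20, lam)  # P(X = 20), well above the mean
prob_15 = poisson.pmf(15, lam)  # pmf at the mean, for comparison
```

The pmf at the mean is more than double the probability of seeing 20 points.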


5. Negative Binomial Distribution

The negative binomial distribution is an extension of the geometric distribution: instead of the number of trials until the first success, it models the number of trials needed to observe $r$ successes.

We represent the negative binomial distribution as:

$$X \sim NegBin(r, p)$$

where:
$r$: number of successes required
$p$: probability of success on each trial


Probability Mass Function

The probability mass function of the negative binomial distribution is computed by the formula:

$$ P(X=x|\ r,\ p) = \binom{x - 1}{r - 1}(p)^r(1-p)^{x-r} $$

where $x$ is the trial on which the $r$-th success occurs, $x = r, r+1, r+2, \ldots$

The negative binomial distribution assumes the following:

  1. Binary outcome for every trial
  2. All trials are independent
  3. The probability of success $p$ is constant across trials

Example:
The probability of scoring a goal on any given attempt in a Premier League match is $.25$. What is the probability that the 2nd goal is scored on the 5th attempt?

$$ P(X=2) = \binom{5 - 1}{2 - 1}(.25)^2(1-.25)^{5-2} $$

from scipy.special import comb
# C(x-1, r-1) * p**r * (1 - p)**(x - r) with r = 2, x = 5, p = .25
comb(4, 1)*(.25**2)*(.75**(5-2))
0.10546875

About a 10% probability. That sounds about right for the $2$nd goal arriving on the $5$th attempt.
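`scipy.stats` also ships an `nbinom` object, but note that its parameterization differs from the pmf above: it counts the number of failures before the $r$-th success, so to match our trial count $x$ we evaluate it at $x - r$ (a sketch under that assumption):

```python
from scipy.stats import nbinom

r, p, x = 2, 0.25, 5
# scipy's nbinom counts failures before the r-th success,
# so "2nd goal on the 5th attempt" corresponds to k = x - r = 3 failures
prob = nbinom.pmf(x - r, r, p)
```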