Geometric Distribution
The geometric distribution models the number of trials (experiments) needed for a specific event, usually called a success, to occur for the first time. For example, how many times do we need to roll a die until a 6 comes up? The distribution takes a single parameter, the probability of success on each trial, and models the number of trials needed to obtain the first success.
Mathematical representation:
$$ X \sim \text{Geometric}(p) $$
where:
$p$: is the probability of success.
A few conditions must be true for the geometric distribution to hold:
- Each trial has binary outcomes
- All trials have the same probability of success
- Each trial is independent of the previous trial
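To make these conditions concrete, here is a minimal sketch that builds a geometric draw directly from repeated independent yes/no trials, using the die example. The helper name `trials_until_first_success`, the seed, and the sample size are illustrative choices, not part of any library.

```python
import numpy as np

def trials_until_first_success(p, rng):
    """Count independent trials up to and including the first success."""
    trials = 1
    # rng.random() is uniform on [0, 1), so a trial fails with probability 1 - p
    while rng.random() >= p:
        trials += 1
    return trials

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility
p_six = 1 / 6                        # probability of rolling a 6 with a fair die

samples = np.array([trials_until_first_success(p_six, rng) for _ in range(20_000)])

# The first roll is a 6 with probability p_six, so this fraction should be near 1/6
frac_first_roll = np.mean(samples == 1)
print(samples[:10], frac_first_roll)
```

Every simulated value is at least 1, because the successful roll itself is counted as a trial.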
Geometric Random Variable
To generate geometrically distributed random values, we call the rvs method of geom and pass the probability of success p together with the number of samples (size). Notice that the output is an array of integers, each one the number of trials needed to obtain the first success given that probability.
from scipy.stats import geom
geom_dist = geom.rvs(p=.3, size=100)
geom_dist
Visualizing the Distribution
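We can visualize the random variable we generated above. Notice that the frequency decreases as the number of trials increases: the probability of success on each trial stays fixed, but needing many trials requires a long run of failures first, which becomes increasingly unlikely.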
import matplotlib.pyplot as plt
geom_dist = geom.rvs(p=.3, size=1000)
_ = plt.hist(geom_dist, bins=20, ec='black')
plt.xlabel('Geometric Trials')
plt.ylabel('Frequency')
plt.title('Geometric Distribution')
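As a numeric companion to the histogram, the sketch below compares the observed share of each trial count with the theoretical probability for that count. The seed, the sample size, and the range of counts shown (1 to 8) are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import geom

p = 0.3
samples = geom.rvs(p=p, size=100_000, random_state=0)  # seed is an arbitrary choice

ks = np.arange(1, 9)                                    # trial counts to inspect
counts = np.bincount(samples, minlength=ks.max() + 1)   # counts[k] = number of samples equal to k
empirical = counts[ks] / samples.size                   # observed relative frequencies
theoretical = geom.pmf(ks, p)                           # probability of first success on trial k

for k, emp, theo in zip(ks, empirical, theoretical):
    print(f"k={k}: empirical={emp:.4f}, theoretical={theo:.4f}")
```

Counting per integer value avoids the situation where a fixed number of histogram bins merges or splits neighbouring trial counts.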

Probability Mass Function
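The probability mass function of the geometric distribution gives the probability that the first success occurs on trial $x$, that is, $x-1$ failures followed by one success. It is closely related to the binomial distribution, since both are built from independent trials with the same probability of success.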
$$f_X(x) = p(1-p)^{x-1}$$
where:
$x$: number of trials until the first success
Example:
Suppose the probability of getting a correct answer from a set of random questions is .3.
What is the probability that the first correct answer comes on the 5th question?
geom_dist = geom(p=.3)
geom_dist.pmf(5), .3*(1-.3)**(5-1)
Cumulative Distribution Function
The cumulative distribution function returns the probability that the first success occurs within a given number of trials. For example, to compute the probability that the first success occurs within the first 5 trials, we can call the cdf method.
geom_dist.cdf(5)
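The cumulative probability can be cross-checked in two ways: as one minus the probability that the first $k$ trials all fail, and as the sum of the probability mass function over trials 1 to $k$. The sketch below uses the same $p$ = .3 and $k$ = 5 as the example; the variable names are illustrative.

```python
from scipy.stats import geom

p = 0.3
k = 5
geom_dist = geom(p=p)

cdf_scipy = geom_dist.cdf(k)                                    # P(X <= k) from scipy
cdf_closed_form = 1 - (1 - p) ** k                              # 1 - P(first k trials all fail)
cdf_from_pmf = sum(geom_dist.pmf(x) for x in range(1, k + 1))   # sum of pmf over 1..k
prob_more_than_k = geom_dist.sf(k)                              # survival function, P(X > k)

print(cdf_scipy, cdf_closed_form, cdf_from_pmf, prob_more_than_k)
```

All three cumulative values agree at roughly 0.832, and the survival function gives the complementary probability of roughly 0.168 that more than 5 trials are needed.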
Expected Value
The expected value of the geometric distribution is given by:
$$ E(X) = \frac {1}{p} $$
where:
$p$: is the probability of success
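For readers who want to see where this comes from, the result follows from the probability mass function $p(1-p)^{x-1}$ and the standard series $\sum_{x \ge 1} x q^{x-1} = \frac{1}{(1-q)^2}$, valid for $|q| < 1$. The symbol $q = 1 - p$ is introduced here only for convenience.

$$ E(X) = \sum_{x=1}^{\infty} x \, p (1-p)^{x-1} = p \sum_{x=1}^{\infty} x \, q^{x-1} = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p} $$

With $p$ = .3, this gives about 3.33 trials on average before the first success is observed.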
In Python, we can compute the expected value of the geometric distribution with either the formula or the mean method.
geom_dist = geom( p=.3 )
geom_dist.mean(), 1/.3
Variance
The variance of the geometric distribution can be calculated with the following formula:
$$ Var(X) = \frac {1-p}{p^2}$$
The distribution object has a var method, but we can also compute the variance with the formula.
geom_dist.var(), (1-.3)/(.3**2)
Standard Deviation
The standard deviation is simply the square root of variance. For the geometric distribution, the standard deviation is computed as:
$$ \sigma = \frac { \sqrt {1 - p} } {p} $$
Computing this in Python, assuming that $p$ = .3:
import numpy as np
geom_dist.std(), np.sqrt(1-.3)/.3
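As a sanity check on the three formulas, the sketch below compares the theoretical mean, variance, and standard deviation with the same statistics estimated from a large random sample. The seed and sample size are arbitrary, so the estimates will only match approximately.

```python
import numpy as np
from scipy.stats import geom

p = 0.3
geom_dist = geom(p=p)

# Large sample so the estimates settle near the theoretical values
samples = geom_dist.rvs(size=200_000, random_state=0)

theory = {
    "mean": 1 / p,
    "var": (1 - p) / p**2,
    "std": np.sqrt(1 - p) / p,
}
estimate = {
    "mean": samples.mean(),
    "var": samples.var(),
    "std": samples.std(),
}

for name in theory:
    print(f"{name}: formula={theory[name]:.4f}, sample={estimate[name]:.4f}")
```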