Beta Distribution

The beta distribution is one of the most important distributions in Bayesian statistics, largely because it is a natural prior for random variables whose domain lies between $0$ and $1$. Quantities such as a click-through rate or a conversion rate fit this description: the underlying success counts are binomially distributed, while the rate itself is a continuous proportion.

The beta distribution takes two parameters, $\alpha$ and $\beta$. These can often be interpreted as success and failure counts, which makes the distribution useful as a prior for binomial and Bernoulli experiments in Bayesian statistics.

We can mathematically represent the beta distribution as:

$$ X \sim Beta( \alpha, \beta ) $$

where:
$X$: random variable taking values $0 < x < 1$
$\alpha$: positive real number
$\beta$: positive real number
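
To make the "success and failure counts" reading concrete, here is a minimal sketch of the standard conjugate update for a binomial experiment: the posterior is again a beta distribution with the observed successes added to $\alpha$ and the observed failures added to $\beta$. The prior values and the visit counts below are hypothetical numbers chosen only for illustration.

```python
from scipy.stats import beta

# Hypothetical prior: behaves like 2 earlier successes and 8 earlier failures
prior_alpha, prior_beta = 2, 8

# Hypothetical data: 30 conversions out of 200 visits
successes, trials = 30, 200
failures = trials - successes

# Conjugate update: add the observed counts to the prior parameters
post_alpha = prior_alpha + successes
post_beta = prior_beta + failures

# Frozen distribution object for the posterior
posterior = beta(a=post_alpha, b=post_beta)
print(post_alpha, post_beta)
print(posterior.mean())
```

The posterior mean sits between the prior mean and the observed rate, and moves toward the observed rate as more data arrive.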

Beta Random Variable

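The beta object in scipy.stats can be used to draw random values by specifying the shape parameters $\alpha$ and $\beta$, together with a location and scale that place the distribution on an interval. With $loc = 0$ and $scale = 1$ the values fall between $0$ and $1$.

The function arguments are:
$a$ = $\alpha$ parameter
$b$ = $\beta$ parameter
$loc$ = lower bound of $x$
$scale$ = width of the interval, so the upper bound of $x$ is $loc + scale$
$size$ = number of random values to draw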

from scipy.stats import beta
import matplotlib.pyplot as plt

sim_beta = beta.rvs( a=1, b=1, loc=0, scale=1 ,size=30 )
sim_beta
array([0.52349944, 0.51604929, 0.5459929 , 0.717023 , 0.22830017, 0.67463911, 0.09231155, 0.47363061, 0.54841061, 0.76160665, 0.87277837, 0.40836231, 0.9854341 , 0.82074996, 0.97782493, 0.96712907, 0.51199631, 0.14425613, 0.2182889 , 0.70578307, 0.26981778, 0.486079 , 0.75093507, 0.8162237 , 0.50725202, 0.705163 , 0.71232759, 0.58926251, 0.89437031, 0.42477053])

Notice that the random observations all fall within the interval from $loc$ to $loc + scale$, which here is $0$ to $1$.
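
As a quick check of how $loc$ and $scale$ position the distribution, the sketch below draws from a beta distribution shifted to start at 10 with a width of 5. In scipy.stats the support is the interval from `loc` to `loc + scale`. The shape parameters, the interval, and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)  # seeded generator so the draw is repeatable

low, width = 10, 5  # arbitrary illustrative interval: [10, 15]
samples = beta.rvs(a=2, b=5, loc=low, scale=width, size=1000, random_state=rng)

# Every draw should lie between loc and loc + scale
print(samples.min() >= low, samples.max() <= low + width)
```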

Visualizing Beta Random Variables

Below, I provide histograms of beta random variables with varying $\alpha$ and $\beta$ parameters. Notice how the shape of the distribution changes with the parameters, and in particular with their relative magnitudes. That is:

  • if $\alpha > \beta$, the distribution is dense to the right (values near $1$ are more likely)
  • if $\beta > \alpha$, the distribution is dense to the left (values near $0$ are more likely)
  • if $\alpha = \beta$, the distribution is symmetric around $0.5$

from scipy.stats import beta
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

fig = plt.figure(figsize=(15,10))

# histplot is used here in place of the deprecated distplot( kde=False )

# Equal parameters: symmetric around 0.5
fig.add_subplot(221)
sim_beta = beta.rvs( a=10, b=10, loc=0, scale=1 ,size=10000 )
sns.histplot( sim_beta, bins=20, edgecolor="k", linewidth=2 )
plt.title('Beta Distribution: α = 10, β = 10')

# α much larger than β: mass near 1
fig.add_subplot(222)
sim_beta = beta.rvs( a=100, b=8, loc=0, scale=1 ,size=10000 )
sns.histplot( sim_beta, bins=20, edgecolor="k", linewidth=2 )
plt.title('Beta Distribution: α = 100, β = 8')

# β much larger than α: mass near 0
fig.add_subplot(223)
sim_beta = beta.rvs( a=5, b=500, loc=0, scale=1 ,size=10000 )
sns.histplot( sim_beta, bins=20, edgecolor="k", linewidth=2 )
plt.title('Beta Distribution: α = 5, β = 500')

# Both parameters below 1
fig.add_subplot(224)
sim_beta = beta.rvs( a=.5, b=.9, loc=0, scale=1 ,size=10000 )
sns.histplot( sim_beta, bins=20, edgecolor="k", linewidth=2 )
plt.title('Beta Distribution: α = .5, β = .9')
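
As a rough numerical companion to the plots, the sketch below computes the theoretical mean for the same four parameter pairs and reports which side of $0.5$ it falls on. The `params` list and the variable names are my own; this is only a sanity check on where the mass sits, not a full description of the shape.

```python
import numpy as np
from scipy.stats import beta

# The four (alpha, beta) pairs used in the histograms
params = [(10, 10), (100, 8), (5, 500), (0.5, 0.9)]

means = {}
for a, b in params:
    means[(a, b)] = beta.mean(a, b)  # theoretical mean alpha / (alpha + beta)
    if np.isclose(means[(a, b)], 0.5):
        side = "centered at 0.5"
    elif means[(a, b)] > 0.5:
        side = "right of 0.5"
    else:
        side = "left of 0.5"
    print(f"alpha={a}, beta={b}: mean={means[(a, b)]:.3f} ({side})")
```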
    
Beta Distribution

Probability Density Function

The probability density function of the beta distribution is defined as follows:

$$ p(x) = \begin{cases} \frac { x^{\alpha - 1}(1 - x)^{\beta - 1}} {B(\alpha, \beta)} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} $$

$$ B( \alpha, \beta ) = \frac { \Gamma (\alpha) \Gamma (\beta)} {\Gamma (\alpha + \beta) } $$

When $\alpha$ and $\beta$ are positive integers, the beta function reduces to:

$$ B( \alpha, \beta ) = \frac { (\alpha - 1)! (\beta - 1)!} { (\alpha + \beta - 1)!} $$

where:
$\alpha:$ positive real number parameter
$\beta:$ positive real number parameter
$B(\alpha, \beta):$ the beta function with parameters $\alpha$ and $\beta$
$\Gamma:$ the gamma function
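
To connect the formula with the library, here is a small sketch that evaluates the density directly from the definition and compares it with scipy's `beta.pdf`. The helper name `beta_pdf` and the test parameters are my own. Note that scipy.special also exports a function called `beta` (the beta function $B$), so it is imported under the alias `beta_fn` to avoid a clash with the distribution object.

```python
from math import factorial

import numpy as np
from scipy.special import beta as beta_fn  # the beta function B(alpha, beta)
from scipy.stats import beta               # the beta distribution


def beta_pdf(x, a, b):
    """Density of Beta(a, b) for 0 < x < 1, written directly from the formula."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


x = np.linspace(0.05, 0.95, 19)       # grid strictly inside (0, 1)
manual = beta_pdf(x, 2.5, 4.0)        # non-integer alpha is allowed
library = beta.pdf(x, a=2.5, b=4.0)
print(np.allclose(manual, library))

# For integer parameters the factorial form gives the same beta function value
factorial_form = factorial(3 - 1) * factorial(4 - 1) / factorial(3 + 4 - 1)
print(np.isclose(beta_fn(3, 4), factorial_form))
```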


Example:
Suppose that the conversion rate of a landing page follows a beta distribution with $\alpha = 46$ and $\beta = 21$. What is the probability that the conversion rate over a random sample is at least 20 percentage points higher than the mean?

For this question, we first need to compute the mean of the distribution and then determine the value that lies 20 percentage points above it.

# Mean of Beta(46, 21)
mean = 46/(46 + 21)
# P(X >= mean + 0.2) = 1 - CDF(mean + 0.2)
1 - beta.cdf(x= mean+.2, a=46, b=21, loc=0, scale=1 )
7.495178271255121e-06
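
The same tail probability can also be obtained from the survival function, `beta.sf`, which scipy defines as `1 - cdf`. The sketch below computes it both ways for the example above. For probabilities this small, the survival function is often the numerically safer choice, since subtracting a value close to 1 from 1 can lose precision.

```python
from scipy.stats import beta

a, b = 46, 21
mean = a / (a + b)
threshold = mean + 0.20  # 20 percentage points above the mean

tail_cdf = 1 - beta.cdf(threshold, a=a, b=b)  # complement of the CDF
tail_sf = beta.sf(threshold, a=a, b=b)        # survival function

print(threshold)
print(tail_cdf, tail_sf)
```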

Expected Value

The expected value of the beta distribution is given by:

$$ E(X) = \frac {\alpha}{\alpha + \beta} $$

Example:
Compute the mean of a beta distribution with $\alpha = 20$ and $\beta = 21$. Notice that the sample mean approaches the theoretical value as the sample size grows.

random_v = beta.rvs( a=20, b=21, loc=0, scale=1, size=100000 )
random_v.mean(), 20/(20 + 21)
(0.4880173195320306, 0.4878048780487805)
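
To see the effect of sample size more directly, the sketch below repeats the comparison for several sample sizes and records the absolute gap between the sample mean and $\alpha / (\alpha + \beta)$. The sample sizes and the seed are arbitrary choices. The gap tends to shrink as the sample grows, although for any single run it need not shrink at every step.

```python
import numpy as np
from scipy.stats import beta

a, b = 20, 21
theoretical = a / (a + b)

rng = np.random.default_rng(42)  # seeded generator so the run is repeatable
errors = {}
for n in (100, 10_000, 1_000_000):
    sample = beta.rvs(a=a, b=b, size=n, random_state=rng)
    errors[n] = abs(sample.mean() - theoretical)  # gap to the exact mean
    print(f"n={n:>9,}: sample mean={sample.mean():.5f}, abs error={errors[n]:.5f}")
```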

Variance

The variance of the beta distribution is given by:

$$ Var(X) = \frac {\alpha \beta}{(\alpha + \beta + 1)( \alpha + \beta )^2 } $$

Example:
Compute the variance of a beta distribution with $\alpha = 20$ and $\beta = 21$. Notice that the sample variance approaches the theoretical value as the sample size grows.

random_v = beta.rvs( a=20, b=21, loc=0, scale=1, size=100000 )
random_v.var(), (20 * 21 )/((20 + 21 + 1)*( 20 + 21)**2)
(0.005961096063475813, 0.00594883997620464)
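
The closed-form expressions can also be checked against scipy without any sampling. The sketch below asks `beta.stats` for the mean and variance (`moments="mv"`) and compares them with the formulas; the variable names are my own.

```python
from scipy.stats import beta

a, b = 20, 21

# Closed-form expressions
formula_mean = a / (a + b)
formula_var = (a * b) / ((a + b + 1) * (a + b) ** 2)

# The same quantities computed by scipy ("m" = mean, "v" = variance)
mean, var = beta.stats(a, b, moments="mv")

print(float(mean), formula_mean)
print(float(var), formula_var)
```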

Standard Deviation

The standard deviation is the square root of the variance:

$$ std(X) = \sqrt{\frac {\alpha \beta}{(\alpha + \beta + 1)( \alpha + \beta )^2 } } $$

Example:
Compute the standard deviation of a beta distribution with $\alpha = 20$ and $\beta = 21$. Notice that the sample standard deviation approaches the theoretical value as the sample size grows.

import numpy as np
random_v = beta.rvs( a=20, b=21, loc=0, scale=1, size=100000 )
random_v.std(), np.sqrt( (20 * 21 )/((20 + 21 + 1)*( 20 + 21)**2) )
(0.07690922357155076, 0.07712872341874095)
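
One consequence of the formula is that, for a fixed ratio of $\alpha$ to $\beta$ (and therefore a fixed mean), the standard deviation shrinks as $\alpha + \beta$ grows. The sketch below illustrates this by scaling both parameters together. The scale factors are arbitrary, and `beta.std` is scipy's built-in standard deviation for the distribution.

```python
from scipy.stats import beta

stds = []
for factor in (1, 10, 100):
    a, b = 2 * factor, 2.1 * factor  # same ratio as alpha=20, beta=21
    stds.append(beta.std(a, b))      # theoretical standard deviation
    print(f"alpha={a}, beta={b}: mean={beta.mean(a, b):.4f}, std={stds[-1]:.4f}")
```

In the success and failure reading of the parameters, larger counts correspond to more evidence and therefore a tighter distribution around the same mean.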