Box Plot Distributions

Box-and-whisker plots are typically used to compare the distribution of a numeric variable across the levels of a categorical dimension. Pygal offers box-and-whisker plots with several modes that control how the whiskers are computed. In this example, I use the iris dataset to demonstrate box-and-whisker plots in Pygal.

import pygal 
import pandas as pd
import numpy as np
from sklearn import datasets 


# loading the iris dataset
iris = datasets.load_iris()
iris_data = pd.DataFrame( iris.data, columns=iris.feature_names )

# map the integer class labels to species names
decode_species = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}
iris_data['species'] = [decode_species[label] for label in iris.target]
iris_data.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) species
0                5.1               3.5                1.4               0.2  Setosa
1                4.9               3.0                1.4               0.2  Setosa
2                4.7               3.2                1.3               0.2  Setosa
3                4.6               3.1                1.5               0.2  Setosa
4                5.0               3.6                1.4               0.2  Setosa

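The implementation below configures a box-and-whisker plot to use the Tukey method (box_mode="tukey") and hides the horizontal guide lines (show_y_guides=False). With the Tukey method, the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range of the quartiles, and points beyond that range are treated as outliers. This example plots the petal length (cm) column for each species.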

metric = 'petal length (cm)'

box_plot = pygal.Box(box_mode="tukey", show_y_guides=False)
box_plot.title = 'Box Plot Distribution for Petal Length (cm)'
box_plot.y_title = metric

# add one box per species, using that species' petal lengths
for species in decode_species.values():
    box_plot.add(species, iris_data.loc[iris_data['species'] == species, metric].tolist())

box_plot.render_in_browser()
[Figure: "Box Plot Distribution for Petal Length (cm)". Y-axis: petal length (cm). Series: Setosa, Versicolor, Virginica.]

Box statistics shown in the chart tooltips:

Species     Min  Lower Whisker  Q1   Q2    Q3   Upper Whisker  Max
Setosa      1    1.1            1.4  1.5   1.6  1.9            1.9
Versicolor  3    3.3            4    4.35  4.6  5.1            5.1
Virginica   4.5  4.5            5.1  5.55  5.9  6.9            6.9
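
To make the Tukey rule concrete, here is a minimal sketch in plain Python that computes the quartiles, the whiskers, and the outliers for a list of numbers. The helper name tukey_summary and the sample data are hypothetical, and the quartile convention (the median of each half of the sorted data) is an assumption: Pygal and other libraries may compute quartiles differently, so results can differ slightly from what a chart reports.

```python
from bisect import bisect_left, bisect_right
from statistics import median


def tukey_summary(values):
    """Return quartiles, Tukey whiskers, and outliers for a list of numbers.

    Hypothetical helper for illustration; not part of Pygal.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")

    half = n // 2
    q2 = median(s)
    # Assumed convention: quartiles are the medians of the lower and upper
    # halves; for an odd count the middle value is left out of both halves.
    q1 = median(s[:half])
    q3 = median(s[n - half:])

    # Fences sit 1.5 interquartile ranges beyond the quartiles.
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers are the most extreme data points still inside the fences.
    lo = bisect_left(s, lower_fence)
    hi = bisect_right(s, upper_fence)
    return {
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "lower_whisker": s[lo],
        "upper_whisker": s[hi - 1],
        "outliers": s[:lo] + s[hi:],
    }


# made-up sample with one obviously extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 30]
summary = tukey_summary(sample)
print(summary)
```

For this sample the upper fence is 12.5, so the value 30 is reported as an outlier and the upper whisker stops at 7. The same idea explains why a whisker in a Tukey box plot can sit inside the minimum or maximum of the data.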