Box Plot Distributions
Box-and-whisker plots summarize how a numeric variable is distributed within each level of a categorical dimension. Pygal offers box-and-whisker plots with alternative modes that control how the box and whiskers are computed. In this example, I use the iris dataset to demonstrate box-and-whisker plots in Pygal.
import pygal
import pandas as pd
import numpy as np
from sklearn import datasets
# loading the iris dataset
iris = datasets.load_iris()
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
# map the integer targets (0, 1, 2) to readable species names
decode_species = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}
iris_data['species'] = [decode_species.get(target) for target in iris.target]
iris_data.head()
|   | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
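Each box in a box plot summarizes one category, so the measurements have to be split by species before they are plotted. Here is a minimal pure-Python sketch of that grouping step. The `rows` list and the `group_by_category` helper are hypothetical names introduced for illustration, and the sample values are hand-written placeholders rather than the full dataset.

```python
from collections import defaultdict

# (species, petal length in cm) pairs; hand-written sample rows for illustration only
rows = [
    ("Setosa", 1.4),
    ("Setosa", 1.3),
    ("Versicolor", 4.7),
    ("Versicolor", 4.5),
    ("Virginica", 6.0),
]


def group_by_category(pairs):
    """Collect the values of each category into its own list, keeping input order."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return dict(groups)


grouped = group_by_category(rows)
for species, lengths in grouped.items():
    print(species, lengths)
```

Each list in `grouped` corresponds to one series, that is, one box in the chart.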
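The implementation below creates a box-and-whisker plot that uses the Tukey method, in which the whiskers extend to the most extreme data points lying within 1.5 times the interquartile range of the quartiles and any points beyond them are shown as outliers. It also removes the horizontal guide lines from the plot. For this example, I use the petal length column.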
metric = 'petal length (cm)'
box_plot = pygal.Box(box_mode="tukey", show_y_guides=False)
box_plot.title = 'Box Plot Distribution for Petal Length (cm)'
box_plot.y_title = metric
# add one series (one box) per species
for species in decode_species.values():
    box_plot.add(species, iris_data[iris_data['species'] == species][metric])
box_plot.render_in_browser()
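To make the Tukey method more concrete, the sketch below computes the quartiles, whisker ends, and outliers for a small list of numbers using only the standard library. It is an illustration of the 1.5 × IQR rule, not Pygal's own code: the `tukey_summary` helper and the sample data are hypothetical, and the quartile convention chosen here (`method="inclusive"`) is an assumption, so Pygal may place the quartiles slightly differently.

```python
from statistics import quantiles


def tukey_summary(values):
    """Summarize values with Tukey's 1.5 * IQR rule (needs at least two values)."""
    data = sorted(values)
    # Quartile convention is an assumption; other conventions shift the cut points slightly.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that are still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# made-up sample with one obvious outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = tukey_summary(sample)
print(summary)
```

In this sample the value 30 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 9.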