Pandas Group BY with Multiple Aggregation Functions

Like SQL, pandas provides a useful aggregation method in the form of GROUP BY. In this example, I demonstrate the use of pandas groupby with multiple aggregation functions.

import pandas as pd
from sklearn import datasets
            

Loading the iris dataset from the sklearn datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

decode_species = {0: 'Setosa', 1:'Versicolor', 2:'Virginica' }
iris_df['Species'] = [decode_species.get(specie) for specie in iris.target]
iris_df.head()

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) Species
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa
3 4.6 3.1 1.5 0.2 Setosa
4 5.0 3.6 1.4 0.2 Setosa

To perform multiple aggregations, we pass a dictionary of the column we wish to aggregate and a list of aggregation types, as many as needed - both standard functions like below and custom functions.

iris_df.groupby(['Species'], as_index=False).agg( { 'sepal length (cm)': ['count','sum'],
        'sepal width (cm)': 'max',
        'petal length (cm)': ['min', 'max'] })

Species sepal length (cm) sepal width (cm) petal length (cm)
count sum max min max
0 Setosa 50 250.3 4.4 1.0 1.9
1 Versicolor 50 296.8 3.4 3.0 5.1
2 Virginica 50 329.4 3.8 4.5 6.9