Pandas Groupby with Different Aggregation Functions
Like SQL, pandas provides a useful aggregation method in the form of GROUP BY. In this example, I demonstrate the use of pandas groupby with multiple aggregation functions.
import pandas as pd
from sklearn import datasets
Loading the iris dataset from the sklearn datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
decode_species = {0: 'Setosa', 1:'Versicolor', 2:'Virginica' }
iris_df['Species'] = [decode_species.get(specie) for specie in iris.target]
iris_df.head()
| sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Species | |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |
To perform multiple aggregations, we pass a dictionary of the column we wish to aggregate and a list of aggregation types, as many as needed - both standard functions like below and custom functions.
iris_df.groupby(['Species'], as_index=False).agg( { 'sepal length (cm)': ['count','sum'],
'sepal width (cm)': 'max',
'petal length (cm)': ['min', 'max'] })
| Species | sepal length (cm) | sepal width (cm) | petal length (cm) | |||
|---|---|---|---|---|---|---|
| count | sum | max | min | max | ||
| 0 | Setosa | 50 | 250.3 | 4.4 | 1.0 | 1.9 |
| 1 | Versicolor | 50 | 296.8 | 3.4 | 3.0 | 5.1 |
| 2 | Virginica | 50 | 329.4 | 3.8 | 4.5 | 6.9 |