Descriptive Statistics
This notebook covers the descriptive statistics and summary methods that are available in pandas.
The Titanic dataset is used for demonstration.
import pandas as pd
titanic = pd.read_csv('./titanic.csv')
titanic.head()
1. describe
The describe method provides summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for
all of the numerical columns in the dataframe object. Non-numeric columns can be included as well, but the combined
output is not meaningful for many applications. To include them, pass the keyword argument _include="all"_.
titanic.describe(include='all')
describe - include objects
To describe only the non-numeric data, we can pass include='object' to the describe method. This reports the count, the number of unique values, the most frequent value (top) and its frequency (freq) for all object columns.
titanic.describe(include='object')
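As a rough illustration of how the two forms of describe differ, the sketch below uses a small made-up frame rather than the Titanic file. The column values and the names `df`, `numeric_summary` and `object_summary` are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    # dtype=object is set explicitly so include="object" selects this column.
    "Sex": pd.Series(["male", "female", "female", "female"], dtype=object),
})

numeric_summary = df.describe()                  # numeric columns only
object_summary = df.describe(include="object")   # object columns only

print(list(numeric_summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(object_summary.index))
# ['count', 'unique', 'top', 'freq']
print(object_summary.loc["top", "Sex"], object_summary.loc["freq", "Sex"])
# female 3
```

The row labels of each summary show which statistics pandas computes for each kind of column.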
General Statistical Methods
Beyond the describe method, there are many statistical functions that can be applied directly to a pandas
object. Below are a few examples.
var() - variance
The variance method returns the sample variance $ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 $ for all the numeric columns in the dataframe. By default pandas divides by $ n - 1 $ (ddof=1).
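# numeric_only=True restricts the calculation to numeric columns; pandas 2.0+ raises a TypeError on string columns otherwise
titanic.var(numeric_only=True)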
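To make the variance calculation concrete, here is a minimal sketch that computes it by hand and compares it with var(). The series `fares` and its values are made up for the example. The ddof argument controls the divisor: ddof=1 (the default) divides by n - 1, while ddof=0 divides by n.

```python
import math
import pandas as pd

# Hypothetical values, not taken from the Titanic data.
fares = pd.Series([2.0, 4.0, 4.0, 6.0])

n = len(fares)
mean = fares.sum() / n                       # 4.0
squared_dev = ((fares - mean) ** 2).sum()    # 4 + 0 + 0 + 4 = 8.0

manual_sample_var = squared_dev / (n - 1)    # divide by n - 1
manual_population_var = squared_dev / n      # divide by n

# pandas uses the sample variance unless ddof=0 is passed.
print(math.isclose(fares.var(), manual_sample_var))            # True
print(math.isclose(fares.var(ddof=0), manual_population_var))  # True
```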
std() - standard deviation
Similar to the variance method, the std() method returns the sample standard deviation for all the numeric columns in the dataframe. Mathematically, $ std = \sqrt{var} $.
titanic.std(numeric_only=True)
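The relationship between std() and var() can be checked directly. The sketch below uses a made-up frame (the names `df` and `ages` and all values are assumptions) and also shows that missing values are skipped and that numeric_only=True leaves string columns out of the result.

```python
import math
import pandas as pd

# Hypothetical toy frame; None becomes NaN and is skipped by default.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, None],
    "Name": ["a", "b", "c", "d", "e"],
})
ages = df["Age"]

# The standard deviation is the square root of the variance.
print(math.isclose(ages.std(), math.sqrt(ages.var())))  # True

# On a DataFrame, numeric_only=True keeps only the numeric columns.
std_by_column = df.std(numeric_only=True)
print(list(std_by_column.index))  # ['Age']
```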
median() - median
The median method returns the middle observation for all of the numeric columns.
titanic.median(numeric_only=True)
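"Middle observation" needs a small qualification when the number of observations is even, or when values are missing. A short sketch with made-up series (`even_count` and `with_missing` are hypothetical names) shows both cases.

```python
import pandas as pd

# Even number of observations: the median is the average of the two middle values.
even_count = pd.Series([1.0, 3.0, 5.0, 7.0])
print(even_count.median())  # 4.0

# Missing values are skipped by default, leaving 7.25, 8.05 and 71.28.
with_missing = pd.Series([7.25, 71.28, None, 8.05])
print(with_missing.median())  # 8.05
```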
sum() - sum
The sum method returns the sum of all the numeric columns in the dataframe.
titanic.sum(numeric_only=True)
mean() - mean
Similar to the sum method, the mean method returns the average for all of the numerical columns in the dataframe.
titanic.mean(numeric_only=True)
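sum() and mean() both skip missing values by default, so the mean is the sum divided by the number of non-missing observations. The sketch below illustrates this on a made-up series (the name `fares` and its values are assumptions).

```python
import math
import pandas as pd

# Hypothetical values with one missing entry.
fares = pd.Series([10.0, None, 30.0])

total = fares.sum()      # NaN is skipped: 10 + 30 = 40.0
count = fares.count()    # number of non-missing values: 2
average = fares.mean()   # 40 / 2 = 20.0

print(total, count, average)

# With skipna=False a single NaN makes the whole result NaN.
print(math.isnan(fares.mean(skipna=False)))  # True
```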
cumsum() - cumulative sum
The cumsum method computes the cumulative sum of the specified column. Notice that it returns an object of the same length as the input, holding the cumulative sum at each observation. Below, I print out the first 5 cumulative sums.
titanic[['Fare']].cumsum(skipna=True, axis=0).head()
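The effect of skipna on cumsum is easiest to see on a tiny example. In this sketch (the series `values` is made up), a missing entry stays NaN in the output, and skipna decides whether the running total continues past it.

```python
import pandas as pd

# Hypothetical values with a missing entry in the third position.
values = pd.Series([1.0, 2.0, None, 4.0])

# skipna=True (the default): the NaN stays NaN, the running total carries on.
running = values.cumsum(skipna=True)
print(running.tolist())  # [1.0, 3.0, nan, 7.0]

# skipna=False: everything from the first NaN onward becomes NaN.
strict = values.cumsum(skipna=False)
print(strict.tolist())   # [1.0, 3.0, nan, nan]
```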
Other Statistical/Mathematical Functions
There are a few more statistical functions that we have not looked at, but they function in much the same way as what we have seen above. Overall, the available functions are: