Filter Dataframe by Column Frequency with value counts
Sometimes, I need to filter a dataset based on the specific frequency threshold from one column. For example, how many users visited a website more than 5 times a day?
import pandas as pd
# reading sample dataframe
df = pd.DataFrame( {'letters': ['E', 'E', 'B', 'C', 'B', 'D', 'E', 'B', 'E', 'A', 'E', 'B', 'C',
'D', 'E', 'E', 'A', 'E', 'B', 'E', 'B']} )
df.value_counts()
Filtering for letters with more than 5 appearances. It works by retrieving the index from the series object returned by value_counts().
df[ df.letters.isin( df.letters.value_counts()[df.letters.value_counts() > 5 ].index )]
letters | |
---|---|
0 | E |
1 | E |
2 | B |
4 | B |
6 | E |
7 | B |
8 | E |
10 | E |
11 | B |
14 | E |
15 | E |
17 | E |
18 | B |
19 | E |
20 | B |