Filter Dataframe by Column Frequency with value counts
Sometimes, I need to filter a dataset based on the specific frequency threshold from one column. For example, how many users visited a website more than 5 times a day?
import pandas as pd
# reading sample dataframe
df = pd.DataFrame( {'letters': ['E', 'E', 'B', 'C', 'B', 'D', 'E', 'B', 'E', 'A', 'E', 'B', 'C',
'D', 'E', 'E', 'A', 'E', 'B', 'E', 'B']} )
df.value_counts()
Filtering for letters with more than 5 appearances. It works by retrieving the index from the series object returned by value_counts().
df[ df.letters.isin( df.letters.value_counts()[df.letters.value_counts() > 5 ].index )]
| letters | |
|---|---|
| 0 | E |
| 1 | E |
| 2 | B |
| 4 | B |
| 6 | E |
| 7 | B |
| 8 | E |
| 10 | E |
| 11 | B |
| 14 | E |
| 15 | E |
| 17 | E |
| 18 | B |
| 19 | E |
| 20 | B |