Pandas Groupby Text Group Concat
Groupby functions are often used for mathematical operations, but they can also be used for text aggregation. Below is an example of groupby concatenation for text.
import pandas as pd
data = pd.DataFrame({'UserID': ['A', 'A', 'A', 'B', 'B'],
                     'Event': ['Click', 'Click', 'Conversion', 'Click', 'Click']})
Visualizing the data
data
| | UserID | Event |
|---|---|---|
| 0 | A | Click |
| 1 | A | Click |
| 2 | A | Conversion |
| 3 | B | Click |
| 4 | B | Click |
Group Concat
We want to collapse each user's sequence of events into a single line, with the events joined in the order they appear in the data.
paths = data.groupby(['UserID'])['Event'].apply(lambda x: ' -> '.join(x))
paths.reset_index()
| | UserID | Event |
|---|---|---|
| 0 | A | Click -> Click -> Conversion |
| 1 | B | Click -> Click |
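
The same aggregation can also be written with a named function instead of a lambda. The version below is a minimal sketch, not the only way to do it: the helper name `concat_events` and its `sep` parameter are made up for illustration, and dropping missing values and casting to `str` is an assumption about how you would want incomplete or non-text events handled.

```python
import pandas as pd

data = pd.DataFrame({
    "UserID": ["A", "A", "A", "B", "B"],
    "Event": ["Click", "Click", "Conversion", "Click", "Click"],
})

def concat_events(events: pd.Series, sep: str = " -> ") -> str:
    # Hypothetical helper: drop missing values and cast to str so that
    # str.join() does not raise on NaN or numeric entries.
    return sep.join(events.dropna().astype(str))

# Aggregate each user's events into one string, then turn the
# UserID index back into a regular column.
paths = data.groupby("UserID")["Event"].agg(concat_events).reset_index()
print(paths)
```

Rows within each group keep the order they have in the original DataFrame, so if the events carry a timestamp, sort by it before grouping.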