POS Tagging

POS tagging, or part-of-speech tagging, labels each token in a corpus with its grammatical category, such as noun, verb, or adjective. Below we use spaCy to demonstrate POS tagging:

import spacy
import en_core_web_sm

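# load the small English pipeline (the en_core_web_sm package must be installed)
nlp = en_core_web_sm.load()

# sample corpus
sample_text = "sofia is an amazing data scientist"

# pair each token with its coarse-grained POS tag
[" ==> ".join([str(token), token.pos_]) for token in nlp(sample_text)]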

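The output is shown below. Exact tags can vary with the spaCy and model version (for example, newer models may tag "is" as AUX rather than VERB):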

['sofia ==> NOUN',
 'is ==> VERB',
 'an ==> DET',
 'amazing ==> ADJ',
 'data ==> NOUN',
 'scientist ==> NOUN']
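
Once tokens are tagged, the tags can be used to categorize them. The following is a minimal sketch, not part of the original example: it hard-codes the output list shown above so that it runs without spaCy installed, and the helper name group_by_pos is a hypothetical one introduced only for illustration.

```python
from collections import defaultdict

# Tagged output copied from the example above
# (hard-coded here as an assumption so the sketch runs without spaCy).
tagged = [
    "sofia ==> NOUN",
    "is ==> VERB",
    "an ==> DET",
    "amazing ==> ADJ",
    "data ==> NOUN",
    "scientist ==> NOUN",
]


def group_by_pos(lines, sep=" ==> "):
    """Group tokens by POS tag, keeping tokens in their original order.

    Hypothetical helper: it parses the 'token ==> TAG' strings produced above.
    """
    groups = defaultdict(list)
    for line in lines:
        # Split on the last separator so the tag is always the final field.
        token, pos = line.rsplit(sep, 1)
        groups[pos].append(token)
    return dict(groups)


pos_groups = group_by_pos(tagged)
print(pos_groups["NOUN"])  # ['sofia', 'data', 'scientist']
```

In practice the same grouping could be built directly from the tokens returned by nlp(sample_text), using str(token) and token.pos_ instead of parsing strings.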