# Tokenizing a corpus with spaCy

Tokenization is the process of dividing a corpus into its basic meaningful units. These units are often words and punctuation marks, but they are not limited to words. In the example below we build a simple tokenizer with spaCy.

```python
import spacy
import en_core_web_sm

# Initialize the English tokenizer from the small English pipeline
# (requires: python -m spacy download en_core_web_sm)
tokenizer = en_core_web_sm.load().tokenizer

sample_text = "Jenna is an excellent programmer"
[str(token) for token in tokenizer(sample_text)]
# ['Jenna', 'is', 'an', 'excellent', 'programmer']
```
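Because the sample sentence contains no punctuation, the result looks like a plain whitespace split. A sketch of a case where the tokenizer does more, using `spacy.blank("en")` (an alternative way to get just the rule-based English tokenizer without downloading a trained pipeline), with a made-up example sentence:

```python
import spacy

# A blank English pipeline provides only the rule-based tokenizer;
# no trained model download is required.
nlp = spacy.blank("en")

# Contractions and punctuation are split into separate tokens.
doc = nlp("Jenna doesn't write bugs, does she?")
tokens = [token.text for token in doc]
print(tokens)
```

Here `doesn't` is split into `does` and `n't`, and the comma and question mark become their own tokens, which simple whitespace splitting would not produce.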