Basic Expressions
This notebook looks at basic Polars expressions for data exploration and simple analysis. Many of these
functions have counterparts in other tools such as pandas and should feel straightforward. This short note focuses on
how the ideas are implemented in Polars.
Reading a CSV File
Let's begin by reading a CSV file.
import numpy as np
import polars as pl
data = pl.read_csv('data/employees.csv')
head()
The head() method returns the first rows of a dataframe. You can pass an n
argument to specify how many rows to return (the default is 5). In the example below, I set $n=10$.
data.head(n = 10)
tail()
Much like the head() method, the tail() method returns the last $n$ rows of a dataframe.
data.tail(n = 10)
shape
The shape attribute returns a tuple containing the number of rows and columns in the dataset. The same shape is also
printed along with the output of head() and tail(). The two values can also
be retrieved individually through the height and width attributes.
data.shape
data.height, data.width
schema
schema is an attribute that returns the data types assigned to each column.
A schema can be specified when the data is created or read, for type safety, or inferred from the
dataset itself. In fact, Polars lets us choose how many rows are scanned when inferring these
types.
data.schema
glimpse()
If you have used R and its suite of tidyverse tools, you may know glimpse(), which prints a compact preview of the dataframe: one line per column showing the column name, data type and the first few observations.
data.glimpse()
null_count()
To get a sense of how many values are missing, the null_count() method returns the number of null values in each column.
data.null_count()
n_unique()
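Similarly, the n_unique() method counts unique values. Called on a dataframe, it returns the number of unique rows; to count the unique values in a single column, call it on that column instead.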
data.n_unique()
columns
The columns attribute returns the names of the columns in the data as a list.
data.columns