Filtering Data
This section looks at filtering data. Filtering is a row operation: for a given column, each row's value is compared against a condition (for example, equality with a specific value), and only the rows where the condition is true are kept.
import numpy as np
import polars as pl
data = pl.read_csv('data/employees.csv')
print(data.head())
Single Column Filter
In the following example, we filter for the Engineering department. We build a pl.col expression and compare it to a value with a basic comparison operator (==), as demonstrated below.
engineers = data.filter( pl.col("department") == "Engineering" )
print(engineers)
Multiple Column Filter
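We can also filter on more than one column. We combine several boolean expressions with the & (and) operator inside the filter function, as demonstrated below. Each comparison is wrapped in parentheses because Python's & binds more tightly than == and >.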
senior_engineers = data.filter(
    (pl.col("department") == "Engineering")
    & (pl.col("age") > 30)
    & (pl.col("salary") > 100000)
)
print(senior_engineers)
Negative Selection
Another way to implement selection is negative selection: keeping every row whose value is not equal to a given value. Below is an example that excludes the HR department.
# NOT condition
not_hr = data.filter(pl.col("department") != "HR")
print(not_hr['department'].unique())
is_in() filtering
Another available method for filtering is is_in(), which keeps the rows whose column value appears in a given list of items.
tech_depts = data.filter(pl.col("department").is_in(["IT", "Engineering"]))
print(tech_depts.head())