Column Selection
Singling out a column or a set of columns from a table lets us explore and process a specific part of the
dataset independently of the rest. This is useful in analysis and visualization alike. Let's begin by reading
our original employees dataset.
import polars as pl
data = pl.read_csv('data/employees.csv')
data.head()
select()
The select() method is the interface for working with specific columns. In the example below, I select the
age column and store the result in its own variable.
age = data.select("age")
print(age.head())
Selecting multiple columns
The select() method can also be used to select more than one column. Simply pass the names of the columns of
interest directly to the select() method.
age_salary = data.select("age", "salary")
print(age_salary.head())
pl.col()
Within the select() method, we can use the pl.col interface, which allows us to refer to a column by name. This
is part of the broader concept of expressions. For example, pl.col("age")
creates a lazy expression that is only evaluated when needed.
type(pl.col("age"))
age_salary = data.select(pl.col("age"), pl.col("salary"))
print(age_salary.head())