The tidyverse
The tidyverse is an opinionated collection of R packages designed for data science. Its packages share a common design philosophy, grammar, and data structures, so they work well together and encourage consistent practices. The tidyverse simplifies many common data tasks, promotes clear and readable code, and makes it easier to transform, visualize, and model data.
Let's begin by installing and loading up a few packages.
# if you don't have it installed
#install.packages(c('tidyverse', 'gt'))
# silence the startup message (this option must be set before loading)
options(tidyverse.quiet = TRUE)
# loading tidyverse
library(tidyverse)
library(gt)
Loading Data
In most cases, the first step in analyzing data is reading or loading a dataset. For this example, we will use the mtcars dataset, which ships with R.
data(mtcars)
From here on, mtcars can be used as a variable that contains the data. We can confirm that it is a data frame:
is.data.frame(mtcars)
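When the data lives in a file rather than in a package, reading it is the first step instead. The following is a minimal sketch, assuming the readr package (part of the tidyverse) and a CSV file; csv_path and cars_from_csv are hypothetical names, and the file is created here only so the example has something to read.

```r
library(readr)

# Write the built-in dataset to a temporary CSV so there is a file to read
# (row names are not written by write_csv())
csv_path <- tempfile(fileext = ".csv")
write_csv(datasets::mtcars, csv_path)

# read_csv() returns a tibble; suppressMessages() hides the column-type report
cars_from_csv <- suppressMessages(read_csv(csv_path))

dim(cars_from_csv)
```

The result has the same 32 rows and 11 columns as the built-in dataset.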
Dataframe vs. Tibble
There are some important differences to know about between data frames and tibbles:
1. Printing: Tibbles have a more user-friendly printing method that shows only the first 10 rows and the columns that fit on the screen, avoiding overwhelming output.
2. Column Types: Tibbles are stricter about column types and never convert strings to factors. Base data frames did so by default before R 4.0.0; since then, stringsAsFactors defaults to FALSE.
3. Subsetting: Tibbles do not allow partial matching of column names, reducing potential errors.
4. Design: Tibbles are a modern reimagining of the data frame. They are not inherently faster than traditional data frames, but they are easier to work with on large datasets because printing shows only a preview.
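The subsetting differences listed above can be seen on a small example. This is only a sketch: df and tb are hypothetical objects built from made-up data, not part of the tutorial's dataset.

```r
library(tibble)

# A small illustrative data frame (made-up data)
df <- data.frame(name = c("a", "b"), value = c(1, 2))
tb <- as_tibble(df)

# Partial matching: `$` on a data frame silently matches "val" to "value"
df$val

# On a tibble the same lookup returns NULL and raises a warning
tb$val

# Selecting one column with [ , ]: a data frame drops to a vector,
# while a tibble stays a tibble
class(df[, "value"])
class(tb[, "value"])
```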
With that in mind, we are going to convert mtcars to a tibble using the as_tibble() function.
mtcars <- as_tibble(mtcars)
# printing tibble object
print(mtcars)
mpg <dbl> | cyl <dbl> | disp <dbl> | hp <dbl> | drat <dbl> | wt <dbl> | qsec <dbl> | vs <dbl> | am <dbl> | gear <dbl> | carb <dbl> |
---|---|---|---|---|---|---|---|---|---|---|
21 | 6 | 160 | 110 | 3.9 | 2.62 | 16.5 | 0 | 1 | 4 | 4 |
21 | 6 | 160 | 110 | 3.9 | 2.88 | 17.0 | 0 | 1 | 4 | 4 |
22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.6 | 1 | 1 | 4 | 1 |
21.4 | 6 | 258 | 110 | 3.08 | 3.22 | 19.4 | 1 | 0 | 3 | 1 |
18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.0 | 0 | 0 | 3 | 2 |
18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.2 | 1 | 0 | 3 | 1 |
14.3 | 8 | 360 | 245 | 3.21 | 3.57 | 15.8 | 0 | 0 | 3 | 4 |
24.4 | 4 | 147 | 62 | 3.69 | 3.19 | 20 | 1 | 0 | 4 | 2 |
22.8 | 4 | 141 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
19.2 | 6 | 168 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |
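The object mtcars is now a tibble, so printing it shows only the first 10 rows along with the type of each column.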
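One thing to be aware of is that as_tibble() drops row names by default, and in mtcars the row names hold the car models. A minimal sketch of keeping them, assuming the rownames argument of as_tibble(); cars_named and the column name "model" are hypothetical choices for illustration.

```r
library(tibble)

# datasets::mtcars is the original data frame, row names included.
# "model" is a hypothetical name for the new column.
cars_named <- as_tibble(datasets::mtcars, rownames = "model")

# The car models are now stored in the first column
names(cars_named)[1]
cars_named$model[1:3]
```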
1.1. slice_head()
The slice_head() function returns the first n rows of a dataset.
slice_head(mtcars, n = 5)
mpg <dbl> | cyl <dbl> | disp <dbl> | hp <dbl> | drat <dbl> | wt <dbl> | qsec <dbl> | vs <dbl> | am <dbl> | gear <dbl> | carb <dbl> |
---|---|---|---|---|---|---|---|---|---|---|
21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
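Besides an absolute row count, slice_head() can also take a fraction of the rows through its prop argument. A short sketch; cars, first_five and first_tenth are hypothetical names used so the example does not depend on earlier steps.

```r
library(dplyr)

cars <- tibble::as_tibble(datasets::mtcars)

# n gives an absolute number of rows
first_five <- slice_head(cars, n = 5)

# prop gives a fraction of the rows; 10% of 32 rows is rounded down to 3
first_tenth <- slice_head(cars, prop = 0.1)

nrow(first_five)
nrow(first_tenth)
```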
1.2. slice_tail()
Similarly, the slice_tail() function returns the last n rows of a dataset.
slice_tail(mtcars, n = 4)
mpg <dbl> | cyl <dbl> | disp <dbl> | hp <dbl> | drat <dbl> | wt <dbl> | qsec <dbl> | vs <dbl> | am <dbl> | gear <dbl> | carb <dbl> |
---|---|---|---|---|---|---|---|---|---|---|
15.8 | 8 | 351 | 264 | 4.22 | 3.17 | 14.50 | 0 | 1 | 5 | 4 |
19.7 | 6 | 145 | 175 | 3.62 | 2.77 | 15.50 | 0 | 1 | 5 | 6 |
15.0 | 8 | 301 | 335 | 3.54 | 3.57 | 14.60 | 0 | 1 | 5 | 8 |
21.4 | 4 | 121 | 109 | 4.11 | 2.78 | 18.60 | 1 | 1 | 4 | 2 |
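The slice functions also respect grouping, which is useful for picking rows within each group. The sketch below takes the last row for each number of cylinders; cars and last_per_cyl are hypothetical names.

```r
library(dplyr)

cars <- tibble::as_tibble(datasets::mtcars)

# Last row within each cylinder group (4, 6 and 8 cylinders)
last_per_cyl <- cars %>%
  group_by(cyl) %>%
  slice_tail(n = 1) %>%
  ungroup()

last_per_cyl
```

The result has one row per group, ordered by the grouping variable.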
1.3. slice_sample()
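The slice_sample() function returns a random sample of rows. The output below shows a sample of five rows, as produced by a call such as slice_sample(mtcars, n = 5).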
mpg <dbl> | cyl <dbl> | disp <dbl> | hp <dbl> | drat <dbl> | wt <dbl> | qsec <dbl> | vs <dbl> | am <dbl> | gear <dbl> | carb <dbl> |
---|---|---|---|---|---|---|---|---|---|---|
15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
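Because the rows are drawn at random, each call returns a different sample unless the random seed is fixed first. A minimal sketch of a reproducible sample; the seed value 42 is arbitrary, and cars, sample_a and sample_b are hypothetical names.

```r
library(dplyr)

cars <- tibble::as_tibble(datasets::mtcars)

set.seed(42)  # arbitrary seed, chosen only for illustration
sample_a <- slice_sample(cars, n = 5)

set.seed(42)  # resetting the seed reproduces the same draw
sample_b <- slice_sample(cars, n = 5)

identical(sample_a, sample_b)
```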
Data Dimensions
Another common task is to understand the nature and dimensions of your dataset. This includes the number of rows and columns, the data types, and the column headers. The following functions are useful for this.
dim()
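The dim() function returns the dimensions of a dataset: the number of rows followed by the number of columns.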
dim(mtcars)
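When only one of the two dimensions is needed, the base R helpers nrow() and ncol() return it directly. A short sketch, using the hypothetical names cars and dims:

```r
cars <- tibble::as_tibble(datasets::mtcars)

# dim() returns an integer vector: rows first, then columns
dims <- dim(cars)
dims

# nrow() and ncol() return each dimension on its own
nrow(cars)
ncol(cars)
```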
str()
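The str() function compactly displays the structure of an object: its class, its dimensions, and the name, type and first few values of each column.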
str(mtcars)
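The tidyverse offers a similar function, glimpse(), which prints one line per column and fits as many values as the console width allows. A brief sketch comparing the two; cars is a hypothetical name.

```r
library(dplyr)

cars <- tibble::as_tibble(datasets::mtcars)

# Base R: compact structure of the object
str(cars)

# Tidyverse alternative: one line per column, with its type and first values
glimpse(cars)
```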
summary()
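The summary() function, called as summary(mtcars), reports summary statistics for each column. For numeric columns these are the minimum, first quartile, median, mean, third quartile and maximum.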
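As a sketch of how this looks in practice, summary() can be applied to the whole dataset or to a single column; cars and mpg_summary are hypothetical names.

```r
cars <- tibble::as_tibble(datasets::mtcars)

# Summary statistics for every column
summary(cars)

# Summary statistics for a single column
mpg_summary <- summary(cars$mpg)
mpg_summary
```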
names()
The names() function returns the column names of a dataset.
names(mtcars)
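The result is an ordinary character vector, so it can be stored and used, for example, to check whether a column exists before working with it. A small sketch, with cars and col_names as hypothetical names:

```r
cars <- tibble::as_tibble(datasets::mtcars)

# Column names as a character vector
col_names <- names(cars)
col_names

# Check for a column before using it
"mpg" %in% col_names
```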