Train Test Split with Base R
There are several ways to split data into training and testing sets for model development. This note demonstrates one of them using only base R.
air_quality <- datasets::airquality
head(air_quality)
Ozone | Solar.R | Wind | Temp | Month | Day |
---|---|---|---|---|---|
41 | 190 | 7.4 | 67 | 5 | 1 |
36 | 118 | 8.0 | 72 | 5 | 2 |
12 | 149 | 12.6 | 74 | 5 | 3 |
18 | 313 | 11.5 | 62 | 5 | 4 |
NA | NA | 14.3 | 56 | 5 | 5 |
28 | NA | 14.9 | 66 | 5 | 6 |
Implementing the train-test split
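# Fix the random seed so the split is reproducible (the seed value is arbitrary)
set.seed(123)
# Sample 80% of the row indices from air_quality, rounded down to a whole number of rows
train_indices <- sample(seq_len(nrow(air_quality)), size = floor(0.8 * nrow(air_quality)))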
# Split the data: sampled rows form the training set, all remaining rows form the test set
train_data <- air_quality[train_indices, ]
test_data <- air_quality[-train_indices, ]
# Number of observations in each set
n_train <- nrow(train_data)
n_test <- nrow(test_data)
n_train # Number of observations in the training set
n_test # Number of observations in the test set
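The steps above can be wrapped in a small reusable function. The following is a minimal sketch rather than a definitive implementation: `train_test_split`, `train_frac`, and `seed` are hypothetical names chosen for illustration and are not part of base R. It uses `setdiff()` for the test indices so that the test set is well defined even if the training sample is empty.

```r
# Hypothetical helper (not a base R function): split a data frame into
# training and test sets by sampling row indices without replacement.
train_test_split <- function(data, train_frac = 0.8, seed = NULL) {
  stopifnot(is.data.frame(data), train_frac > 0, train_frac < 1)
  if (!is.null(seed)) set.seed(seed)  # fix the RNG state for a reproducible split

  n <- nrow(data)
  all_indices <- seq_len(n)
  train_indices <- sample(all_indices, size = floor(train_frac * n))
  test_indices <- setdiff(all_indices, train_indices)  # every row not sampled

  list(train = data[train_indices, ], test = data[test_indices, ])
}

# Assumed usage: an 80/20 split of airquality with an arbitrary seed
parts <- train_test_split(datasets::airquality, train_frac = 0.8, seed = 42)

# Sanity checks: the two sets are disjoint and together cover every row
stopifnot(nrow(parts$train) + nrow(parts$test) == nrow(datasets::airquality))
stopifnot(length(intersect(rownames(parts$train), rownames(parts$test))) == 0)
```

Because airquality has 153 rows, `nrow(parts$train)` is 122 and `nrow(parts$test)` is 31 for `train_frac = 0.8`. Passing a different `seed` changes which rows land in each set but not the set sizes.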