Train Test Split with Base R

There are several ways to split data into training and testing sets for model development. This note demonstrates one of them using only base R.

air_quality <- datasets::airquality
head(air_quality)
A data.frame: 6 × 6
Ozone  Solar.R  Wind  Temp  Month  Day
   41      190   7.4    67      5    1
   36      118   8.0    72      5    2
   12      149  12.6    74      5    3
   18      313  11.5    62      5    4
   NA       NA  14.3    56      5    5
   28       NA  14.9    66      5    6

Implementing train and test split

# Sample 80% of the row indices (without replacement); set a seed so the split is reproducible
set.seed(42)  # arbitrary seed value
train_indices <- sample(seq_len(nrow(air_quality)), size = floor(0.8 * nrow(air_quality)))

# Split the data
train_data <- air_quality[train_indices, ]
test_data <- air_quality[-train_indices, ]

# Number of observations in each set
n_train <- nrow(train_data)
n_test <- nrow(test_data)

n_train  # Number of observations in the training set
n_test   # Number of observations in the test set
[1] 122
[1] 31
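
The counts follow from the arithmetic: airquality has 153 rows, 0.8 × 153 = 122.4, and the sample size is truncated to 122, which leaves 31 rows for testing. As a minimal sketch, the same steps could be wrapped in a reusable function. The name split_train_test and its arguments train_frac and seed are hypothetical choices made for this illustration, not part of base R.

```r
# Hypothetical helper (not a base R function): split a data frame into
# training and test sets by sampling row indices without replacement.
split_train_test <- function(data, train_frac = 0.8, seed = NULL) {
  stopifnot(is.data.frame(data), train_frac > 0, train_frac < 1)
  if (!is.null(seed)) set.seed(seed)  # optional: make the split reproducible

  n <- nrow(data)
  n_train <- floor(train_frac * n)    # make the truncation explicit
  train_indices <- sample(seq_len(n), size = n_train)

  # setdiff() stays correct even if train_indices is empty,
  # whereas data[-integer(0), ] would return zero rows.
  test_indices <- setdiff(seq_len(n), train_indices)

  list(
    train = data[train_indices, , drop = FALSE],
    test  = data[test_indices, , drop = FALSE]
  )
}

splits <- split_train_test(datasets::airquality, train_frac = 0.8, seed = 123)

# Sanity checks: the two sets together cover every row exactly once
nrow(splits$train)  # 122
nrow(splits$test)   # 31
length(intersect(rownames(splits$train), rownames(splits$test)))  # 0
```

Returning both pieces in a named list keeps the split in one object, and the overlap check confirms that no row appears in both sets.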