Introduction
In 2026, tidymodels is the go-to framework for machine learning in R, unifying preprocessing, modeling, and evaluation in a consistent tidyverse-inspired ecosystem. Compared with older, more fragmented approaches such as caret, tidymodels encourages reproducible and scalable workflows. This advanced tutorial walks step by step through a full classification pipeline on the iris dataset: preprocessing with recipes, random forests via parsnip, hyperparameter tuning with tune, and robust evaluation. You will end up with a tuned model ready for deployment, along with honest metrics such as ROC AUC. Why does it matter? Tidymodels pipelines help prevent data leakage, speed up iteration, and integrate with tools like vetiver for model serving. By the end, you'll have a set of techniques you can reuse on real projects.
Prerequisites
- R version 4.3+ installed
- Advanced knowledge of tidyverse and statistics
- Iris dataset (built-in)
- RStudio or VS Code with R extension
- 30 minutes to run all the code
Installing tidymodels Packages
if (!require("tidymodels", quietly = TRUE)) {
install.packages("tidymodels")
}
library(tidymodels)
library(tidyverse)
# Check installation
packageVersion("tidymodels")
This code installs and loads tidymodels, which bundles parsnip, recipes, tune, and dials into one ecosystem. Wrapping install.packages() in a require() check avoids reinstalling a package that is already available. Tip: quietly = TRUE suppresses startup noise in reproducible tutorials.
Data Preparation
We'll use the iris dataset for multi-class classification (setosa, versicolor, virginica). Tidymodels shines with up-front train/test splits that avoid evaluation bias. Think of it like a conveyor belt: raw data goes in, clean features come out.
Loading and Splitting the Data
data(iris)
set.seed(123)
split <- initial_split(iris, prop = 0.8, strata = Species)
train_data <- training(split)
test_data <- testing(split)
# Check stratification
cat("Train:", nrow(train_data), "| Test:", nrow(test_data), "\n")
table(train_data$Species)
This block loads iris and performs a stratified 80/20 split with initial_split() to preserve class proportions; set.seed(123) makes it reproducible. Avoid unseeded random splits, which make comparative benchmarks impossible to reproduce.
Creating the Preprocessing Recipe
iris_recipe <- recipe(Species ~ ., data = train_data) %>%
  step_normalize(all_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>% # no nominal predictors in iris; kept for illustration
  step_zv(all_predictors()) %>%
  step_corr(all_predictors(), threshold = 0.9)
iris_recipe_prepped <- prep(iris_recipe, training = train_data)
# Preview baked data
baked_train <- bake(iris_recipe_prepped, new_data = train_data)
baked_test <- bake(iris_recipe_prepped, new_data = test_data)
head(baked_train)
The recipe z-score normalizes the predictors, drops zero-variance columns, and removes predictors with pairwise correlation above 0.9, producing clean features. prep() estimates these steps on the training data only; bake() then applies them to any dataset. Major pitfall: never prep on test data, or you leak information. The unprepped iris_recipe is kept in its own object because a workflow expects an unprepped recipe and handles prep/bake itself.
Model Specification
Parsnip abstracts over engines (randomForest, ranger, xgboost, etc.). We'll tune a random forest to capture nonlinear interactions: an ensemble in which each tree casts a vote.
Defining the Model and Workflow
rf_model <- rand_forest(trees = tune(), mtry = tune()) %>%
  set_engine("ranger", importance = "permutation") %>%
  set_mode("classification")
wf <- workflow() %>%
  add_recipe(iris_recipe) %>%
  add_model(rf_model)
wf
The model marks trees and mtry with tune() as hyperparameters to be optimized. The ranger engine is fast and supports permutation importance. The workflow chains recipe + model into a modular, testable pipeline.
Hyperparameter Tuning with Grid Search
folds <- vfold_cv(train_data, v = 5, strata = Species)
param_grid <- grid_regular(
  trees(range = c(100, 500)),
  mtry(range = c(1, 4)),
  levels = 3
)
tune_res <- tune_grid(
  wf,
  resamples = folds,
  grid = param_grid,
  metrics = metric_set(roc_auc, accuracy)
)
autoplot(tune_res)
Five-fold stratified cross-validation on the training data over a regular grid of 9 hyperparameter combinations built with dials, evaluating ROC AUC and accuracy. autoplot() visualizes the results; pick the winner with select_best(tune_res, metric = "roc_auc").
Finalization and Evaluation
After tuning, finalize the workflow and predict on test data for realistic metrics. Tidymodels handles everything seamlessly.
Final Fit and Predictions
best_params <- select_best(tune_res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best_params)
final_fit <- fit(final_wf, data = train_data)
predictions <- predict(final_fit, test_data) %>%
  bind_cols(predict(final_fit, test_data, type = "prob")) %>%
  bind_cols(test_data %>% select(Species))
# Final metrics
predictions %>% conf_mat(truth = Species, estimate = .pred_class)
predictions %>% roc_auc(truth = Species, .pred_setosa:.pred_virginica)
This finalizes the workflow with the best parameters, refits on the full training set, and predicts on the test set. Class predictions feed the confusion matrix, while class probabilities (type = "prob") are required for roc_auc(); with three classes, yardstick defaults to the multiclass Hand-Till estimator. Tip: always evaluate on held-out test data for an honest generalization estimate.
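An equivalent, often safer shortcut is last_fit(), which refits the finalized workflow on the training portion and evaluates on the test portion in a single call. A minimal sketch, assuming the final_wf and split objects created above:

```r
# last_fit() trains on training(split) and scores on testing(split)
last_res <- last_fit(final_wf, split, metrics = metric_set(roc_auc, accuracy))
collect_metrics(last_res)      # test-set ROC AUC and accuracy
collect_predictions(last_res)  # per-row test predictions
```

This avoids juggling predict() calls by hand and guarantees the test set is touched exactly once.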
Feature Importance and Export
# Feature importance
final_fit %>%
extract_fit_parsnip() %>%
vip::vip(num_features = 4)
# Save the model
saveRDS(final_fit, "iris_rf_model.rds")
# Future loading
# loaded_model <- readRDS("iris_rf_model.rds")
# predict(loaded_model, new_data = newdata)
Extracts permutation importance with vip (install if needed: install.packages("vip")). saveRDS() persists the fitted workflow for deployment; load it later with readRDS() and predict without refitting. Ideal for CI/CD pipelines.
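For serving rather than saving to disk, the fitted workflow can be exposed as a REST API. A hedged sketch using vetiver and plumber (the model name "iris_rf" and port are illustrative, and both packages must be installed):

```r
library(vetiver)  # model versioning and serving for tidymodels
library(plumber)  # REST API framework for R

# Wrap the fitted workflow in a deployable vetiver model object
v <- vetiver_model(final_fit, "iris_rf")

# Mount a /predict endpoint and start the API locally
pr() %>%
  vetiver_api(v) %>%
  pr_run(port = 8080)
```

Once running, POST new observations as JSON to the /predict endpoint to get predictions back.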
Best Practices
- Always stratify splits and folds for imbalanced classes.
- Use metric_set() to evaluate multiple metrics at once.
- Integrate themis for upsampling when classes are imbalanced.
- Version workflows with renv or packrat for reproducibility.
- Deploy via vetiver for scalable REST APIs.
Common Errors to Avoid
- Preprocessing the full dataset before splitting (fatal data leakage).
- Skipping tuning: defaults frequently leave performance on the table.
- Forgetting set.seed(): irreproducible results.
- Evaluating only on training data (optimistic bias).
Next Steps
Dive into Learni trainings on R and ML. Check the tidymodels docs at tidymodels.org. Try xgboost or neural networks for regression tasks. Integrate with Plumber for APIs.