
How to Implement ML Pipelines with tidymodels in R in 2026

Introduction

In 2026, tidymodels is the go-to framework for machine learning in R, unifying preprocessing, modeling, and evaluation in a consistent tidyverse-inspired ecosystem. Compared with older, more fragmented approaches like caret, tidymodels delivers reproducible, scalable workflows. This advanced tutorial guides you step-by-step through building a full classification pipeline on the iris dataset: preprocessing with recipes, random forests via parsnip, hyperparameter tuning with tune, and robust evaluation. You'll end up with an optimized model ready for deployment, complete with precise metrics like AUC-ROC. Why does it matter? Tidymodels pipelines prevent data leakage, speed up iteration, and integrate seamlessly with tools like vetiver for serving. By the end, you'll have a template you can reuse to tune and evaluate models rigorously on your own data.

Prerequisites

  • R version 4.3+ installed
  • Advanced knowledge of tidyverse and statistics
  • Iris dataset (built-in)
  • RStudio or VS Code with R extension
  • 30 minutes to run all the code

Installing tidymodels Packages

01_install_tidymodels_packages.R
if (!require("tidymodels", quietly = TRUE)) {
  install.packages("tidymodels")
}
library(tidymodels)
library(tidyverse)

# Check installation
packageVersion("tidymodels")

This code installs and loads tidymodels, which bundles parsnip, recipes, tune, and dials into a unified ecosystem. The require() guard avoids reinstalling the package on every run. Tip: quietly = TRUE suppresses the load-failure message, keeping tutorial output clean.

Data Preparation

We'll use the iris dataset for multi-class classification (setosa, versicolor, virginica). Tidymodels makes it easy to split data up front — here a stratified train/test split, with cross-validation on the training set standing in for a separate validation set — to avoid evaluation bias. Think of it like a conveyor belt: raw data goes in, clean features come out.
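If you do want an explicit three-way split, rsample (1.2+) provides initial_validation_split(). A minimal sketch, assuming a recent rsample — this tutorial itself sticks with a two-way split plus cross-validation:

```r
library(rsample)

# prop gives the train and validation fractions; the remainder is test
set.seed(123)
split3 <- initial_validation_split(iris, prop = c(0.6, 0.2), strata = Species)
train3 <- training(split3)    # 60% of rows
val3   <- validation(split3)  # 20% of rows
test3  <- testing(split3)     # remaining 20%
```

The validation set is useful when tuning is too expensive for full cross-validation.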

Loading and Splitting the Data

02_load_split_data.R
data(iris)
set.seed(123)
split <- initial_split(iris, prop = 0.8, strata = Species)
train_data <- training(split)
test_data  <- testing(split)

# Check stratification
cat("Train:", nrow(train_data), "| Test:", nrow(test_data), "\n")
table(train_data$Species)

This block loads iris and performs a stratified 80/20 split with initial_split to preserve class proportions. set.seed(123) ensures reproducibility. Avoid unseeded random splits that ruin comparative benchmarks.

Creating the Preprocessing Recipe

03_create_recipe.R
iris_recipe <- recipe(Species ~ ., data = train_data) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%  # No-op here (all predictors numeric), shown for completeness
  step_zv(all_predictors()) %>%
  step_corr(all_numeric_predictors(), threshold = 0.9)

# prep() in a separate object: keep iris_recipe un-prepped,
# because add_recipe() in a workflow requires an untrained recipe
iris_prep <- prep(iris_recipe, training = train_data)

# Preview baked data
baked_train <- bake(iris_prep, new_data = train_data)
baked_test  <- bake(iris_prep, new_data = test_data)
head(baked_train)

The recipe normalizes (z-score) and removes zero-variance predictors and pairwise correlations above 0.9, producing clean features. prep() estimates the steps on the training set only, and bake() applies them. Major pitfall: never prep on test data, or you leak information. Also note that add_recipe() expects an untrained recipe — the workflow preps it internally during fitting.

Model Specification

Parsnip abstracts engines (randomForest, xgboost, etc.). We'll tune a random forest to capture nonlinear interactions, like an orchestra where each tree votes intelligently.
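To see the engine abstraction in action, here is a sketch of the same rand_forest() spec pointed at two different backends; only set_engine() changes, and both fit with the identical interface (the randomForest package is assumed to be installed for the second spec):

```r
library(parsnip)

# Same model type, two engines
rf_ranger <- rand_forest(trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("classification")

rf_classic <- rand_forest(trees = 500) %>%
  set_engine("randomForest") %>%
  set_mode("classification")

# Both use the same fitting interface:
# fit(rf_ranger,  Species ~ ., data = train_data)
# fit(rf_classic, Species ~ ., data = train_data)
```

Swapping engines this way lets you benchmark backends without rewriting preprocessing or evaluation code.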

Defining the Model and Workflow

04_model_workflow.R
rf_model <- rand_forest(trees = tune(), mtry = tune()) %>%
  set_engine("ranger", importance = "permutation") %>%
  set_mode("classification")

wf <- workflow() %>%
  add_recipe(iris_recipe) %>%
  add_model(rf_model)

wf

The random forest spec marks its hyperparameters (trees, mtry) with tune() as placeholders to be filled in later. The ranger engine is fast and supports permutation-based variable importance. The workflow chains recipe + model into a modular, testable pipeline.

Hyperparameter Tuning with Grid Search

05_tune_grid.R
folds <- vfold_cv(train_data, v = 5, strata = Species)

param_grid <- grid_regular(
  trees(range = c(100, 500)),
  mtry(range = c(1, 4)),
  levels = 3
)

tune_res <- tune_grid(
  wf,
  resamples = folds,
  grid = param_grid,
  metrics = metric_set(roc_auc, accuracy)
)

autoplot(tune_res)

Stratified 5-fold cross-validation on the training data over a grid of 9 hyperparameter combinations (3 levels × 2 parameters) built with dials. Each candidate is scored on AUC-ROC and accuracy. autoplot() visualizes the results; pick the winner with select_best(tune_res, metric = "roc_auc") to avoid overfitting to a single resample.

Finalization and Evaluation

After tuning, finalize the workflow and predict on test data for realistic metrics. Tidymodels handles everything seamlessly.

Final Fit and Predictions

06_final_fit_predict.R
best_params <- select_best(tune_res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best_params)

final_fit <- fit(final_wf, data = train_data)

predictions <- predict(final_fit, test_data, type = "prob") %>%
  bind_cols(predict(final_fit, test_data)) %>%
  bind_cols(test_data %>% select(Species))

# Final metrics: confusion matrix and multi-class AUC
predictions %>% conf_mat(truth = Species, estimate = .pred_class)
predictions %>% roc_auc(truth = Species,
                        .pred_setosa, .pred_versicolor, .pred_virginica)

Finalizes the workflow with the best parameters, fits on the full training set, and predicts on the test set. conf_mat() builds the confusion matrix from hard class predictions, while roc_auc() needs the class probabilities (type = "prob") and uses the Hand-Till generalization for multi-class AUC. Tip: always evaluate on held-out test data for an honest generalization estimate.
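The finalize/fit/evaluate sequence above can also be collapsed into a single call with last_fit(), which fits the finalized workflow on the training portion of the original split and scores it on the test portion automatically (final_wf and split refer to the objects created earlier):

```r
library(tune)

# One-step final fit and test-set evaluation
final_res <- last_fit(final_wf, split)

collect_metrics(final_res)      # accuracy and roc_auc on the test set
collect_predictions(final_res)  # per-row test predictions
```

Using last_fit() reduces the chance of accidentally fitting or scoring on the wrong partition.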

Feature Importance and Export

07_feature_importance_export.R
# Feature importance
final_fit %>%
  extract_fit_parsnip() %>%
  vip::vip(num_features = 4)

# Save the model
saveRDS(final_fit, "iris_rf_model.rds")

# Future loading
# loaded_model <- readRDS("iris_rf_model.rds")
# predict(loaded_model, new_data = newdata)

Extracts importance via vip (install if needed: install.packages('vip')). Saves with saveRDS for deployment. Ideal for CI/CD pipelines; load and predict without refitting.

Best Practices

  • Always stratify splits and folds for imbalanced classes.
  • Use metric_set() to evaluate multiple metrics at once.
  • Integrate themis for upsampling in imbalance cases.
  • Version workflows with renv or packrat for reproducibility.
  • Deploy via vetiver for scalable REST APIs.
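As a sketch of the themis suggestion above, an upsampling step can be added directly to a recipe; the class counts here are hypothetical since iris is perfectly balanced, but the pattern carries over to real imbalanced data (install.packages("themis") if needed):

```r
library(tidymodels)
library(themis)

# Hypothetical imbalanced-data recipe: duplicate minority-class rows
# until every class matches the majority class (over_ratio = 1)
balanced_recipe <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_upsample(Species, over_ratio = 1)
```

Resampling steps like step_upsample() are applied only when the recipe is trained, so the test set is never altered.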

Common Errors to Avoid

  • Preprocessing the full dataset before splitting (fatal data leakage).
  • Skipping tuning: default hyperparameters often leave measurable performance on the table.
  • Forgetting set.seed(): irreproducible results.
  • Evaluating only on train (optimistic bias).

Next Steps

Dive into Learni trainings on R and ML. Check the tidymodels docs: tidymodels.org. Try boost_tree() with xgboost, or neural networks, for other tasks. Integrate with plumber for REST APIs.
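As a starting point for the plumber integration, here is a minimal sketch of a prediction endpoint; it assumes the iris_rf_model.rds file saved in the export step, and the route name /predict is our own choice:

```r
# plumber.R — minimal prediction endpoint sketch
library(plumber)

model <- readRDS("iris_rf_model.rds")

#* Predict species from JSON rows of predictor columns
#* @post /predict
function(req) {
  newdata <- jsonlite::fromJSON(req$postBody)
  predict(model, new_data = as.data.frame(newdata))
}
```

Launch it with plumber::pr("plumber.R") %>% pr_run(port = 8000), then POST JSON predictor rows to /predict. For production serving, vetiver wraps this pattern with versioning and monitoring built in.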