                 Changes in version 3.0.2 (2026-06-04)

The preference_order dataframe returned by collinear() now shows the
correct metric name instead of "custom" when a known f function is used.

The formula environment generated by model_formula() is now a minimal
two-binding environment (poly, s) with baseenv() as parent, rather than
the full caller frame. This prevents large objects (e.g. the df
dataframe and intermediate variables) from being serialized when a
fitted model is saved with saveRDS().
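
The effect can be illustrated in base R. The following is a minimal
sketch, not the package's actual code: the helpers leaky_formula() and
lean_formula() are hypothetical, and only poly is bound here (binding s
as well would require mgcv).

```r
# Hypothetical helper: the formula keeps the whole function frame,
# so the large object `big` travels with it.
leaky_formula <- function() {
  big <- rnorm(1e5)  # stands in for a large data frame
  y ~ poly(x, 2)
}

# Hypothetical helper: the formula gets a small environment that only
# holds the helper it needs and has baseenv() as parent.
lean_formula <- function() {
  big <- rnorm(1e5)
  f <- y ~ poly(x, 2)
  env <- new.env(parent = baseenv())
  env$poly <- stats::poly
  environment(f) <- env
  f
}

# Serialized size in bytes, as a proxy for what saveRDS() would write.
size_leaky <- length(serialize(leaky_formula(), NULL))
size_lean <- length(serialize(lean_formula(), NULL))
size_leaky > size_lean
```

The comparison should be TRUE because the first formula drags the
numeric vector along, while the second only carries a reference to a
function that lives in the stats namespace.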

Improved example of the function collinear().

                 Changes in version 3.0.1 (2026-05-07)                  

Bug Fixes

  - preference_order(): Namespace-qualified function names (e.g.,
    collinear::f_count_gam) are now correctly recognised as package
    functions instead of being labelled metric = "custom". Fixed by
    stripping the namespace prefix before name lookup in
    R/preference_order.R.

  - preference_order(): NA values in the response variable (including
    those converted from Inf/-Inf/NaN) are now handled correctly. Fixed
    by wrapping the (y, x) data frame construction in stats::na.omit()
    in R/preference_order.R.

  - model_formula(): No longer crashes on sf spatial data frames with
    "default method not implemented for type 'list'". Fixed by calling
    drop_geometry_column() early in R/model_formula.R before
    variable-type detection.

  - score_auc(): No longer crashes when o or p contain NA. Fixed by
    removing incomplete cases before computing ones/zeros in
    R/score_auc.R.
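
The ideas behind these fixes can be sketched in base R. This is an
illustration under assumptions, not the code in the files named above:
strip_namespace() is a hypothetical helper, and the toy vectors are made
up.

```r
# 1. Strip a namespace prefix ("pkg::" or "pkg:::") before name lookup.
strip_namespace <- function(name) sub("^.*::", "", name)
strip_namespace("collinear::f_count_gam")

# 2. Turn non-finite responses into NA, then drop incomplete (y, x) rows.
y <- c(1, Inf, 3, NaN, 5)
x <- c(2, 4, NA, 8, 10)
y[!is.finite(y)] <- NA
yx <- stats::na.omit(data.frame(y = y, x = x))

# 3. Remove incomplete cases before splitting the predictions p into
#    ones and zeros according to the observations o.
o <- c(1, 0, 1, NA, 0)
p <- c(0.9, 0.2, NA, 0.5, 0.4)
keep <- stats::complete.cases(o, p)
ones <- p[keep][o[keep] == 1]
zeros <- p[keep][o[keep] == 0]
```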

Data Changes

  - Example datasets (vi, vi_smol, vi_predictors, vi_predictors_numeric,
    vi_predictors_categorical, vi_responses) have been moved to the
    package spatialData. All function examples and README code now load
    data via data(..., package = "spatialData").

Other Changes

  - The package now prints its version on attach.

                 Changes in version 3.0.0 (2025-12-08)                  

Breaking Changes

API Changes

  - Argument response renamed to responses: Now accepts multiple
    response variables. Functions affected: collinear(),
    preference_order(), and related validation functions.

  - Argument encoding_method defaults to NULL in collinear(): Target
    encoding is now opt-in rather than automatic. Previously defaulted
    to "mean".

  - Default values changed for max_cor and max_vif: Both now default to
    NULL, triggering adaptive threshold computation based on the
    correlation structure of the data.

  - Output structure changed for collinear(): Now returns a list of
    class collinear_output containing sub-lists of class
    collinear_selection, each with response, df, preference_order,
    selection, and formulas slots. Previously returned a character
    vector or named list of character vectors.

Renamed Functions

| Old Name (v2.0) | New Name (v3.0) |
|-----------------|-----------------|
| identify_predictors() | Split into identify_valid_variables(), identify_numeric_variables(), identify_categorical_variables(), identify_logical_variables() |
| identify_predictors_categorical() | identify_categorical_variables() |
| identify_predictors_numeric() | identify_numeric_variables() |
| identify_predictors_zero_variance() | identify_zero_variance_variables() |
| identify_predictors_type() | Removed (merged into identify_valid_variables()) |

Renamed f_ Functions for Preference Order

| Old Name (v2.0) | New Name (v3.0) |
|-----------------|-----------------|
| f_r2_glm_gaussian() | f_numeric_glm() |
| f_r2_gam_gaussian() | f_numeric_gam() |
| f_r2_rf() | f_numeric_rf() |
| f_r2_glm_poisson() | f_count_glm() |
| f_r2_gam_poisson() | f_count_gam() |
| f_auc_glm_binomial() | f_binomial_glm() |
| f_auc_gam_binomial() | f_binomial_gam() |
| f_auc_rf_binomial() | f_binomial_rf() |
| f_v_rf() | f_categorical_rf() |
| (none) | f_count_rf() (new) |

Major New Features

Adaptive Multicollinearity Thresholds

When both max_cor = NULL and max_vif = NULL, the function now
automatically determines optimal filtering thresholds using:

  - The 75th percentile of pairwise correlations as input.
  - A sigmoid transformation that smoothly transitions between
    conservative (VIF ≈ 2.5) and permissive (VIF ≈ 7.5) thresholds.
  - A GAM model (gam_cor_to_vif) mapping correlation thresholds to
    equivalent VIF values.

This data-driven approach adapts to each dataset's correlation
structure, preventing over-filtering while maintaining statistically
meaningful bounds.
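
The general shape of such a computation can be sketched in base R. This
is a rough illustration only: adaptive_vif_sketch() is a hypothetical
function, the midpoint and steepness values are invented, the use of
absolute correlations is assumed, and so is the direction of the mapping
(weakly correlated data getting the lower value). The GAM step is left
out.

```r
# Hypothetical sketch: map the 75th percentile of absolute pairwise
# correlations to a value between 2.5 and 7.5 with a logistic curve.
adaptive_vif_sketch <- function(df, midpoint = 0.5, steepness = 10) {
  r <- abs(stats::cor(df))
  q75 <- unname(stats::quantile(r[upper.tri(r)], probs = 0.75))
  # Logistic weight in (0, 1); midpoint and steepness are made-up values.
  w <- 1 / (1 + exp(-steepness * (q75 - midpoint)))
  # Assumed direction: a higher percentile gives a value closer to 7.5.
  2.5 + w * (7.5 - 2.5)
}

set.seed(1)
df <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
df$d <- df$a + rnorm(100, sd = 0.5)
threshold <- adaptive_vif_sketch(df)
```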

Tidymodels Integration

  - New step_collinear(): Recipe step for multicollinearity filtering in
    tidymodels workflows.
  - Implements proper prep() and bake() methods following recipes
    architecture.

Cross-Validation Support in Preference Order

  - New arguments cv_training_fraction and cv_iterations in
    preference_order() and passed through collinear().
  - Enables robust predictor ranking through repeated train/test splits.
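
Repeated train/test splitting can be sketched as follows. This is a
generic base R illustration, not the package's implementation: cv_score()
is a hypothetical function, the argument names are borrowed from above
for readability, and a linear model with squared correlation as score is
an arbitrary choice.

```r
set.seed(1)
n <- 200
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n)

# Hypothetical scorer: fit on a random training fraction, evaluate on
# the held-out rows, and average the score over all iterations.
cv_score <- function(df, cv_training_fraction = 0.75, cv_iterations = 10) {
  scores <- vapply(seq_len(cv_iterations), function(i) {
    train_rows <- sample(nrow(df), size = floor(cv_training_fraction * nrow(df)))
    m <- stats::lm(y ~ x, data = df[train_rows, ])
    test <- df[-train_rows, ]
    stats::cor(test$y, stats::predict(m, newdata = test))^2
  }, numeric(1))
  mean(scores)
}

score <- cv_score(df)
```

Averaging over several splits makes the score of a predictor less
dependent on which rows happen to land in the test set.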

Rich Output Structure

collinear() now returns comprehensive results including:

  - Filtered dataframe with response and selected predictors.
  - Preference order dataframe with rankings.
  - Ready-to-use model formulas (linear, smooth/GAM, classification).

S3 methods print() and summary() for collinear_output and
collinear_selection classes provide clean output formatting.

Correlation Matrix Improvements

  - cor_matrix() now returns signed correlations, preserving the
    positive semi-definite property required for VIF calculations.
  - Absolute values applied only when comparing against max_cor
    thresholds.
  - Fixes numerical instability that could produce negative VIF scores.
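
A small base R example shows why the sign matters. The 4 x 4 matrix
below is a made-up correlation matrix chosen for illustration, and VIF is
computed as the diagonal of the inverse correlation matrix; none of this
is the package's own code.

```r
a <- 0.6
signed <- matrix(
  c( 1, a, 0, -a,
     a, 1, a,  0,
     0, a, 1,  a,
    -a, 0, a,  1),
  nrow = 4, byrow = TRUE
)

# The signed matrix is positive definite; its absolute version is not.
min(eigen(signed, symmetric = TRUE)$values) > 0
min(eigen(abs(signed), symmetric = TRUE)$values) < 0

# VIF from the signed matrix is valid (> 1); from abs() it turns negative.
vif_signed <- diag(solve(signed))
vif_abs <- diag(solve(abs(signed)))

# Absolute values are still fine for a threshold comparison (0.5 here is
# an arbitrary example value).
exceeds <- abs(signed[upper.tri(signed)]) > 0.5
```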

New Functions

Multicollinearity Assessment

  - collinear_stats(): Compute summary statistics for both correlation
    and VIF.
  - cor_stats(): Summary statistics for pairwise correlations.
  - vif_stats(): Summary statistics for variance inflation factors.

Preference Order

  - f_count_rf(): Score integer count predictors with random forest.

S3 Methods

  - print.collinear_output()
  - print.collinear_selection()
  - summary.collinear_output()
  - summary.collinear_selection()

New Datasets and Models

| Name | Description |
|------|-------------|
| experiment_adaptive_thresholds | Validation experiment results (10,000 iterations) |
| experiment_cor_vs_vif | Correlation vs VIF equivalence experiment results |
| gam_cor_to_vif | Fitted GAM for mapping max_cor to max_vif |
| prediction_cor_to_vif | Look-up table for threshold equivalence |
| toy | Simple dataset illustrating multicollinearity concepts |
| vi_smol | Smaller version of vi dataset (610 rows) for faster examples |
| vi_responses | Character vector of response variable names |

Improvements

VIF Computation

  - Ridge regularization fallback for near-singular matrices.
  - Improved tolerance calculation for solve() to prevent false
    singularity detection.
  - VIF values exceeding 1M are now capped to Inf.
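
A fallback and cap of this kind might look like the sketch below. It is
an assumption-laden illustration, not the package's code: vif_sketch() is
a hypothetical function and the ridge value of 1e-8 is invented. The
tolerance adjustment for solve() is not shown.

```r
# Hypothetical sketch: invert the correlation matrix, retry with a small
# ridge on the diagonal if it is singular, and cap huge values to Inf.
vif_sketch <- function(r, ridge = 1e-8, cap = 1e6) {
  inv <- tryCatch(
    solve(r),
    error = function(e) solve(r + diag(ridge, nrow(r)))
  )
  vif <- diag(inv)
  vif[vif > cap] <- Inf
  vif
}

# Two perfectly correlated variables: solve() fails, the ridge rescues
# the inversion, and the resulting values are capped.
singular <- matrix(1, nrow = 2, ncol = 2)
vif_singular <- vif_sketch(singular)

# Two uncorrelated variables: no fallback needed.
vif_identity <- vif_sketch(diag(2))
```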

Validation

  - New validate_arg_*() functions provide consistent argument checking
    across the package.
  - Hierarchical function name tracking for clearer error messages.

Documentation

  - Comprehensive roxygen documentation with working examples.
  - @family tags for better cross-referencing.
  - @inheritSection for consistent documentation of shared concepts.

Bug Fixes

  - Fixed correlation matrix handling that destroyed positive
    semi-definite property when applying abs() before VIF computation.
  - Fixed edge cases in VIF computation for ill-conditioned matrices.
  - Proper handling of single-predictor cases across all functions.

Deprecated

  - Value "auto" in preference_order argument (ignored with message)

                 Changes in version 2.0.0 (2024-11-08)                  

Main Improvements

1.  Expanded Functionality: Functions collinear() and preference_order()
    support both categorical and numeric responses and predictors, and
    can handle several responses at once.

2.  Robust Selection Algorithms: Enhanced selection in vif_select() and
    cor_select().

3.  Enhanced Functionality to Rank Predictors: New functions to compute
    association between response and predictors covering most use-cases,
    and automated function selection depending on data features.

4.  Simplified Target Encoding: Streamlined and parallelized for better
    efficiency, and new default is "loo" (leave-one-out).

5.  Parallelization and Progress Bars: Utilizes future and progressr for
    enhanced performance and user experience.

                 Changes in version 1.1.1 (2023-12-08)                  

  - Initial CRAN release
  - Basic multicollinearity filtering with collinear(), cor_select(),
    and vif_select()
  - Target encoding methods: mean, rank, leave-one-out
  - Preference order functionality
  - Support for mixed numeric and categorical predictors