The preference_order dataframe returned by collinear() now shows the correct metric name instead of "custom" when a known f function is used.
The formula environment generated by model_formula() is now a minimal two-binding environment (poly, s) with baseenv() as parent, rather than the full caller frame. This prevents large objects (e.g. the df dataframe and intermediate variables) from being serialized when a fitted model is saved with saveRDS().
Improved example of the function collinear().
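
The formula-environment change described above can be illustrated with a minimal base R sketch. The function names `formula_full_frame()` and `formula_minimal_env()` are hypothetical and are not part of the package; the sketch only assumes that a formula created inside a function captures that function's frame. The `s` binding is left out to keep the example dependency-free.

```r
# Hypothetical illustration, not the package's implementation.

# A formula created inside a function keeps that function's frame as its
# environment, so every object in the frame is serialized with it.
formula_full_frame <- function() {
  big <- numeric(1e6)  # stand-in for a large data frame or intermediate object
  y ~ poly(x, 2)
}

# Same formula, but its environment is replaced by a small environment
# whose parent is baseenv() and which holds only the helper it needs.
formula_minimal_env <- function() {
  big <- numeric(1e6)
  f <- y ~ poly(x, 2)
  env <- new.env(parent = baseenv())
  env$poly <- stats::poly
  environment(f) <- env
  f
}

# Size in bytes of each formula once serialized (what saveRDS() would write
# before compression).
size_full <- length(serialize(formula_full_frame(), connection = NULL))
size_min  <- length(serialize(formula_minimal_env(), connection = NULL))

size_full > size_min
```

Using `baseenv()` as parent keeps base functions reachable from the formula while excluding the global environment and the caller's objects.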
preference_order(): Namespace-qualified function names (e.g., collinear::f_count_gam) are now correctly recognised as package functions instead of being labelled metric = "custom". Fixed by stripping the namespace prefix before name lookup in R/preference_order.R.
preference_order(): NA values in the response variable (including those converted from Inf/-Inf/NaN) are now handled correctly. Fixed by wrapping the (y, x) data frame construction in stats::na.omit() in R/preference_order.R.
model_formula(): No longer crashes on sf spatial data frames with "default method not implemented for type 'list'". Fixed by calling drop_geometry_column() early in R/model_formula.R before variable-type detection.
score_auc(): No longer crashes when o or p contain NA. Fixed by removing incomplete cases before computing ones/zeros in R/score_auc.R.
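
The NA handling described in the preference_order() and score_auc() fixes above can be sketched in base R. The helpers `clean_pair()` and `auc_sketch()` are hypothetical names introduced for illustration; the rank-based AUC formula is a standard one and is an assumption about the computation, not a copy of the package code.

```r
# Hypothetical helpers, not the package's implementation.

# Build the (y, x) data frame and drop incomplete rows.
# Non-finite values in y (Inf, -Inf, NaN) are converted to NA first.
clean_pair <- function(y, x) {
  y[!is.finite(y)] <- NA
  stats::na.omit(data.frame(y = y, x = x))
}

# Rank-based AUC that removes incomplete (o, p) pairs before counting
# ones and zeros. o: observed 0/1 values, p: predicted scores.
auc_sketch <- function(o, p) {
  keep <- stats::complete.cases(o, p)
  o <- o[keep]
  p <- p[keep]
  n_ones  <- sum(o == 1)
  n_zeros <- sum(o == 0)
  (sum(rank(p)[o == 1]) - n_ones * (n_ones + 1) / 2) / (n_ones * n_zeros)
}

pair <- clean_pair(y = c(1, Inf, NaN, 4, NA, 6), x = 1:6)
auc  <- auc_sketch(o = c(1, 1, 0, 0, NA), p = c(0.9, 0.8, 0.2, NA, 0.5))
```

Filtering on complete pairs, rather than on each vector separately, keeps observations and predictions aligned.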
The example data objects (vi, vi_smol, vi_predictors, vi_predictors_numeric, vi_predictors_categorical, vi_responses) have been moved to the package spatialData. All function examples and README code now load data via data(..., package = "spatialData").
Argument response renamed to responses: Now accepts multiple response variables. Functions affected: collinear(), preference_order(), and related validation functions.
Argument encoding_method defaults to NULL in collinear(): Target encoding is now opt-in rather than automatic. Previously defaulted to "mean".
Default values changed for max_cor and max_vif: Both now default to NULL, triggering adaptive threshold computation based on the correlation structure of the data.
Output structure changed for collinear(): Now returns a list of class collinear_output containing sub-lists of class collinear_selection, each with response, df, preference_order, selection, and formulas slots. Previously returned a character vector or named list of character vectors.
| Old Name (v2.0) | New Name (v3.0) |
|-----------------|-----------------|
| identify_predictors() | Split into identify_valid_variables(), identify_numeric_variables(), identify_categorical_variables(), identify_logical_variables() |
| identify_predictors_categorical() | identify_categorical_variables() |
| identify_predictors_numeric() | identify_numeric_variables() |
| identify_predictors_zero_variance() | identify_zero_variance_variables() |
| identify_predictors_type() | Removed (merged into identify_valid_variables()) |
f_ Functions for Preference Order

| Old Name (v2.0) | New Name (v3.0) |
|-----------------|-----------------|
| f_r2_glm_gaussian() | f_numeric_glm() |
| f_r2_gam_gaussian() | f_numeric_gam() |
| f_r2_rf() | f_numeric_rf() |
| f_r2_glm_poisson() | f_count_glm() |
| f_r2_gam_poisson() | f_count_gam() |
| f_auc_glm_binomial() | f_binomial_glm() |
| f_auc_gam_binomial() | f_binomial_gam() |
| f_auc_rf_binomial() | f_binomial_rf() |
| f_v_rf() | f_categorical_rf() |
| — | f_count_rf() (new) |
When both max_cor = NULL and max_vif = NULL, the function now automatically determines optimal filtering thresholds using:
A fitted GAM (gam_cor_to_vif) mapping correlation thresholds to equivalent VIF values.

This data-driven approach adapts to each dataset's correlation structure, preventing over-filtering while maintaining statistically meaningful bounds.
step_collinear(): Recipe step for multicollinearity filtering in tidymodels workflows.
prep() and bake() methods following recipes architecture.
Arguments cv_training_fraction and cv_iterations in preference_order(), also passed through collinear().
collinear() now returns comprehensive results in the collinear_output structure (response, df, preference_order, selection, and formulas for each response).
S3 methods print() and summary() for collinear_output and collinear_selection classes provide clean output formatting.
cor_matrix() now returns signed correlations, preserving the positive semi-definite property required for VIF calculations.
max_cor thresholds.
collinear_stats(): Compute summary statistics for both correlation and VIF.
cor_stats(): Summary statistics for pairwise correlations.
vif_stats(): Summary statistics for variance inflation factors.
f_count_rf(): Score integer count predictors with random forest.
print.collinear_output()
print.collinear_selection()
summary.collinear_output()
summary.collinear_selection()

| Name | Description |
|------|-------------|
| experiment_adaptive_thresholds | Validation experiment results (10,000 iterations) |
| experiment_cor_vs_vif | Correlation vs VIF equivalence experiment results |
| gam_cor_to_vif | Fitted GAM for mapping max_cor to max_vif |
| prediction_cor_to_vif | Look-up table for threshold equivalence |
| toy | Simple dataset illustrating multicollinearity concepts |
| vi_smol | Smaller version of vi dataset (610 rows) for faster examples |
| vi_responses | Character vector of response variable names |
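
The note above that VIF calculations need signed correlations can be checked with a small base R sketch. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix, and also equals 1 / (1 - R²) from regressing predictor j on the others. The simulated variables are illustrative assumptions and do not come from the package data.

```r
# Illustrative simulation, not package code.
set.seed(1)
n  <- 500
z  <- rnorm(n)
x1 <- z + rnorm(n)
x2 <- z + rnorm(n)
x3 <- x1 - x2 + rnorm(n)  # positively related to x1, negatively to x2
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# VIF from the signed correlation matrix.
R <- cor(X)
vif_signed <- diag(solve(R))

# Reference VIF from the regression definition, 1 / (1 - R^2).
vif_lm <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})

# VIF computed after taking absolute values of the correlations.
vif_abs <- diag(solve(abs(R)))

# TRUE: the signed matrix reproduces the regression-based VIF.
isTRUE(all.equal(unname(vif_signed), unname(vif_lm)))
```

In this simulation `vif_abs` differs from `vif_signed`, because dropping the signs changes the matrix being inverted.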
solve() to prevent false singularity detection.
Inf.
validate_arg_*() functions provide consistent argument checking across the package.
@family tags for better cross-referencing.
@inheritSection for consistent documentation of shared concepts.
abs() before VIF computation.
"auto" in preference_order argument (ignored with message)

Expanded Functionality: Functions collinear() and preference_order() support both categorical and numeric responses and predictors, and can handle several responses at once.
Robust Selection Algorithms: Enhanced selection in vif_select() and cor_select().
Enhanced Functionality to Rank Predictors: New functions to compute association between response and predictors covering most use-cases, and automated function selection depending on data features.
Simplified Target Encoding: Streamlined and parallelized for better efficiency, and new default is "loo" (leave-one-out).
Parallelization and Progress Bars: Utilizes future and progressr for enhanced performance and user experience.
collinear(), cor_select(), and vif_select()