Package: collinear 3.0.2

collinear: Automated Multicollinearity Management

Provides a comprehensive and automated workflow for managing multicollinearity in data frames with numeric and/or categorical variables. The package integrates five robust methods into a single function: (1) target encoding of categorical variables based on response values (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); (2) automated feature prioritization to preserve key predictors during filtering; (3 and 4) pairwise correlation and VIF filtering across all variable types (numeric–numeric, numeric–categorical, and categorical–categorical); (5) adaptive correlation and VIF thresholds. Together, these methods enable reliable multicollinearity management in most use cases while maintaining model integrity. The package also supports parallel processing and progress tracking via the packages 'future' and 'progressr', and provides seamless integration with the 'tidymodels' ecosystem through a dedicated recipe step.
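To make the correlation and VIF filtering idea concrete, here is a minimal base R sketch. It is not the package's implementation: the data, the variable names (x1, x2, x3) and the threshold max_vif = 5 are assumptions chosen for illustration only. It relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix, which in turn equals 1 / (1 - R^2) from regressing that predictor on the others.

```r
# Base R sketch under stated assumptions; not the package's own code.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # built to be strongly correlated with x1
x3 <- rnorm(n)                 # generated independently of x1 and x2
df <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# Pairwise Pearson correlation matrix of the predictors
cor_m <- cor(df)

# VIF of each predictor: diagonal of the inverse correlation matrix
vif_values <- setNames(diag(solve(cor_m)), colnames(cor_m))

# Cross-check for x1 using the regression-based definition
r2_x1  <- summary(lm(x1 ~ x2 + x3, data = df))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# Hypothetical threshold, for illustration only
max_vif  <- 5
high_vif <- names(vif_values)[vif_values > max_vif]

print(round(cor_m, 2))
print(round(vif_values, 2))
print(high_vif)
```

Both routes to the VIF agree up to floating-point error. A filter built on this idea would then decide which of the flagged predictors to drop, which is where a preference order over predictors comes in.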

Authors: Blas M. Benito [aut, cre, cph]

collinear_3.0.2.tar.gz
collinear_3.0.2.zip (r-4.7) | collinear_3.0.2.zip (r-4.6) | collinear_3.0.2.zip (r-4.5)
collinear_3.0.2.tgz (r-4.6-any) | collinear_3.0.2.tgz (r-4.5-any)
collinear_3.0.2.tar.gz (r-4.7-any) | collinear_3.0.2.tar.gz (r-4.6-any)
collinear_3.0.2.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
collinear/json (API)

# Install 'collinear' in R:
install.packages('collinear', repos = c('https://blasbenito.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/blasbenito/collinear/issues

Pkgdown/docs site: https://blasbenito.github.io

Datasets:
  • experiment_adaptive_thresholds - Dataframe resulting from experiment to test the automatic selection of multicollinearity thresholds
  • experiment_cor_vs_vif - Dataframe with results of experiment comparing correlation and VIF thresholds
  • gam_cor_to_vif - GAM describing the relationship between correlation and VIF thresholds
  • prediction_cor_to_vif - Prediction of the model 'gam_cor_to_vif' across correlation values
  • toy - Toy dataframe with varying levels of multicollinearity

machine-learning, multicollinearity, statistics

5.81 score | 18 stars | 31 scripts | 630 downloads | 55 exports | 77 dependencies

Last updated from: 2614b706b0. Checks: 9 OK. Indexed: yes.

Target                Result  Time
linux-devel-x86_64    OK      214
source / vignettes    OK      203
linux-release-x86_64  OK      185
macos-release-arm64   OK      159
macos-oldrel-arm64    OK      125
windows-devel         OK      147
windows-release       OK      195
windows-oldrel        OK      131
wasm-release          OK      172

Exports: case_weights, collinear, collinear_select, collinear_stats, cor_clusters, cor_cramer, cor_df, cor_matrix, cor_select, cor_stats, drop_geometry_column, f_auto, f_auto_rules, f_binomial_gam, f_binomial_glm, f_binomial_rf, f_categorical_rf, f_count_gam, f_count_glm, f_count_rf, f_functions, f_numeric_gam, f_numeric_glm, f_numeric_rf, identify_categorical_variables, identify_logical_variables, identify_numeric_variables, identify_response_type, identify_valid_variables, identify_zero_variance_variables, model_formula, preference_order, score_auc, score_cramer, score_r2, step_collinear, target_encoding_lab, target_encoding_loo, target_encoding_mean, target_encoding_rank, validate_arg_df, validate_arg_df_not_null, validate_arg_encoding_method, validate_arg_f, validate_arg_function_name, validate_arg_max_cor, validate_arg_max_vif, validate_arg_predictors, validate_arg_preference_order, validate_arg_quiet, validate_arg_responses, vif, vif_df, vif_select, vif_stats

Dependencies: class, classInt, cli, clock, codetools, cpp11, data.table, DBI, diagram, digest, dplyr, e1071, farver, future, future.apply, generics, ggplot2, globals, glue, gower, gtable, hardhat, ipred, isoband, KernSmooth, labeling, lattice, lava, lifecycle, listenv, lubridate, magrittr, MASS, Matrix, mgcv, nlme, nnet, numDeriv, parallelly, pillar, pkgconfig, prodlim, progressr, proxy, purrr, R6, ranger, RColorBrewer, Rcpp, RcppEigen, recipes, rlang, rpart, s2, S7, scales, sf, shape, sparsevctrs, spatialData, SQUAREM, stringi, stringr, survival, terra, tibble, tidyr, tidyselect, timechange, timeDate, tzdb, units, utf8, vctrs, viridisLite, withr, wk

Readme and manuals

Help Manual

Help page | Topics
Generate sample weights for imbalanced responses | case_weights
Smart multicollinearity management | collinear
Dual multicollinearity filtering algorithm | collinear_select
Compute summary statistics for correlation and VIF | collinear_stats
Group predictors by hierarchical correlation clustering | cor_clusters
Quantify association between categorical variables | cor_cramer
Compute signed pairwise correlations dataframe | cor_df
Signed pairwise correlation matrix | cor_matrix
Multicollinearity filtering by pairwise correlation threshold | cor_select
Compute summary statistics for absolute pairwise correlations | cor_stats
Removes 'geometry' Column From 'sf' Dataframes | drop_geometry_column
Dataframe resulting from experiment to test the automatic selection of multicollinearity thresholds | experiment_adaptive_thresholds
Dataframe with results of experiment comparing correlation and VIF thresholds | experiment_cor_vs_vif
Automatic selection of predictor scoring method | f_auto
Decision rules for 'f_auto()' | f_auto_rules
Area under the curve of binomial GAM predictions vs. observations | f_binomial_gam
Area Under the Curve of Binomial GLM predictions vs. observations | f_binomial_glm
Area Under the Curve of Binomial Random Forest predictions vs. observations | f_binomial_rf
Cramer's V of Categorical Random Forest predictions vs. observations | f_categorical_rf
R-squared of Poisson GAM predictions vs. observations | f_count_gam
R-squared of Poisson GLM predictions vs. observations | f_count_glm
R-squared of Random Forest predictions vs. observations | f_count_rf
List predictor scoring functions | f_functions
R-squared of Gaussian GAM predictions vs. observations | f_numeric_gam
R-squared of Gaussian GLM predictions vs. observations | f_numeric_glm
R-squared of Random Forest predictions vs. observations | f_numeric_rf
GAM describing the relationship between correlation and VIF thresholds | gam_cor_to_vif
Find valid categorical variables in a dataframe | identify_categorical_variables
Find logical variables in a dataframe | identify_logical_variables
Find valid numeric variables in a dataframe | identify_numeric_variables
Detect response variable type for model selection | identify_response_type
Find valid numeric, categorical, and logical variables in a dataframe | identify_valid_variables
Find near-zero variance variables in a dataframe | identify_zero_variance_variables
Build model formulas from response and predictors | model_formula
Prediction of the model 'gam_cor_to_vif' across correlation values | prediction_cor_to_vif
Rank predictors by importance or multicollinearity | preference_order
Print all collinear selection results of 'collinear()' | print.collinear_output
Print single selection results from 'collinear' | print.collinear_selection
Compute area under the ROC curve between binomial observations and probabilistic predictions | score_auc
Compute Cramer's V between categorical observations and predictions | score_cramer
Compute R-squared between numeric observations and predictions | score_r2
Tidymodels recipe step for multicollinearity filtering | bake.step_collinear, prep.step_collinear, step_collinear
Summarize all results of 'collinear()' | summary.collinear_output
Summarize single response selection results of 'collinear' | summary.collinear_selection
Convert categorical predictors to numeric via target encoding | target_encoding_lab
Encode categories as response means | target_encoding_loo, target_encoding_mean, target_encoding_rank
Toy dataframe with varying levels of multicollinearity | toy
Check and prepare argument 'df' | validate_arg_df
Ensure that argument 'df' is not 'NULL' | validate_arg_df_not_null
Check and validate argument 'encoding_method' | validate_arg_encoding_method
Check and validate argument 'f' | validate_arg_f
Build hierarchical function names for messages | validate_arg_function_name
Check and constrain argument 'max_cor' | validate_arg_max_cor
Check and constrain argument 'max_vif' | validate_arg_max_vif
Check and validate argument 'predictors' | validate_arg_predictors
Check and complete argument 'preference_order' | validate_arg_preference_order
Check and validate argument 'quiet' | validate_arg_quiet
Check and validate arguments 'response' and 'responses' | validate_arg_responses
Compute variance inflation factors from a correlation matrix | vif
Compute variance inflation factors dataframe | vif_df
Multicollinearity filtering by variance inflation factor threshold | vif_select
VIF Statistics | vif_stats
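
The target-encoding entries in the manual refer to replacing each category with a statistic of the response. As a rough base R illustration of the mean-encoding idea only (not the package's functions, whose arguments are not shown on this page), the sketch below maps each level of a hypothetical categorical column g to the mean of a hypothetical numeric response y. The toy data frame and the column name g_encoded are made up for the example.

```r
# Base R sketch of mean target encoding; the data and names are illustrative.
df <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = c("a", "a", "b", "b", "c", "c")
)

# Lookup table: mean of the response within each category
group_means <- tapply(df$y, df$g, mean)

# Replace each category with its group mean
df$g_encoded <- as.numeric(group_means[df$g])

print(df)
# g_encoded is 1.5, 1.5, 3.5, 3.5, 5.5, 5.5
```

Because the encoded column is numeric, it can enter the same correlation and VIF computations as any other numeric predictor. Note that plain mean encoding uses each row's own response value, which is one reason leave-one-out variants exist.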