Package: collinear 1.1.1

collinear: Seamless Multicollinearity Management

System for seamless management of multicollinearity in data frames with numeric and/or categorical variables for statistical analysis and machine learning modeling. The package combines bivariate correlation (Pearson, Spearman, and Cramer's V) with variance inflation factor analysis, target encoding to transform categorical variables into numeric (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>), and a flexible feature prioritization method, to deliver a comprehensive multicollinearity management tool covering a wide range of use cases.
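As a rough illustration of the two numeric diagnostics named above, the following base R sketch computes a pairwise Pearson correlation matrix and the variance inflation factor (VIF) of each predictor. It does not call any 'collinear' function and is not the package's implementation; the simulated data and all variable names (x1, x2, x3, predictors) are assumptions made for the example.

```r
# Minimal base R sketch; no 'collinear' functions are used.
# Simulated, hypothetical predictors: x2 is nearly a copy of x1.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # strongly correlated with x1
x3 <- rnorm(n)                 # generated independently of x1 and x2
predictors <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# Pairwise Pearson correlation between predictors
cor_matrix <- cor(predictors, method = "pearson")

# VIF of each predictor: the diagonal of the inverse correlation matrix,
# which equals 1 / (1 - R^2) from regressing that predictor on the others
vif <- diag(solve(cor_matrix))

# Cross-check the VIF of x1 with an explicit linear regression
r2_x1 <- summary(lm(x1 ~ x2 + x3, data = predictors))$r.squared
vif_x1_check <- 1 / (1 - r2_x1)

print(round(cor_matrix, 2))
print(round(vif, 2))
```

A high VIF for x1 and x2 alongside a VIF near 1 for x3 is the pattern that a selection step would act on, typically by dropping one of the two redundant predictors.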

Authors: Blas M. Benito [aut, cre, cph]

collinear_1.1.1.tar.gz
collinear_1.1.1.zip (r-4.5), collinear_1.1.1.zip (r-4.4), collinear_1.1.1.zip (r-4.3)
collinear_1.1.1.tgz (r-4.4-any), collinear_1.1.1.tgz (r-4.3-any)
collinear_1.1.1.tar.gz (r-4.5-noble), collinear_1.1.1.tar.gz (r-4.4-noble)
collinear_1.1.1.tgz (r-4.4-emscripten), collinear_1.1.1.tgz (r-4.3-emscripten)
collinear.pdf | collinear.html
collinear/json (API)
NEWS

# Install 'collinear' in R:
install.packages('collinear', repos = c('https://blasbenito.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/blasbenito/collinear/issues

Datasets:
  • toy - One response and four predictors with varying levels of multicollinearity
  • vi - 30,000 records of responses and predictors from all over the world
  • vi_predictors - Predictor names in data frame 'vi'

machine-learning, multicollinearity, statistics

33 exports, 7 stars, 2.02 score, 16 dependencies, 41 scripts, 305 downloads

Last updated 3 days ago from: e42e3321ec. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Sep 15 2024
R-4.5-win | NOTE | Sep 15 2024
R-4.5-linux | NOTE | Sep 15 2024
R-4.4-win | NOTE | Sep 15 2024
R-4.4-mac | NOTE | Sep 15 2024
R-4.3-win | NOTE | Sep 15 2024
R-4.3-mac | NOTE | Sep 15 2024

Exports: add_white_noise, auc_score, case_weights, collinear, cor_df, cor_matrix, cor_select, cramer_v, drop_geometry_column, f_gam_auc_balanced, f_gam_auc_unbalanced, f_gam_deviance, f_logistic_auc_balanced, f_logistic_auc_unbalanced, f_rf_auc_balanced, f_rf_auc_unbalanced, f_rf_deviance, f_rf_rsquared, f_rsquared, identify_non_numeric_predictors, identify_numeric_predictors, identify_zero_variance_predictors, preference_order, target_encoding_lab, target_encoding_loo, target_encoding_mean, target_encoding_rank, target_encoding_rnorm, validate_df, validate_predictors, validate_response, vif_df, vif_select

Dependencies: cli, dplyr, fansi, generics, glue, lifecycle, magrittr, pillar, pkgconfig, R6, rlang, tibble, tidyselect, utf8, vctrs, withr

Readme and manuals

Help Manual

Help page | Topics
Area Under the Receiver Operating Characteristic | auc_score
Automated multicollinearity management | collinear
Correlation data frame of numeric and character variables | cor_df
Correlation matrix of numeric and character variables | cor_matrix
Automated multicollinearity reduction via pairwise correlation | cor_select
Bias Corrected Cramer's V | cramer_v
AUC of Logistic GAM Model | f_gam_auc_balanced
AUC of Logistic GAM Model with Weighted Cases | f_gam_auc_unbalanced
Explained Deviance from univariate GAM model | f_gam_deviance
AUC of Binomial GLM with Logit Link | f_logistic_auc_balanced
AUC of Binomial GLM with Logit Link and Case Weights | f_logistic_auc_unbalanced
AUC of Random Forest model of a balanced binary response | f_rf_auc_balanced
AUC of Random Forest model of an unbalanced binary response | f_rf_auc_unbalanced
R-squared of Random Forest model | f_rf_deviance, f_rf_rsquared
R-squared between a response and a predictor | f_rsquared
Identify non-numeric predictors | identify_non_numeric_predictors
Identify numeric predictors | identify_numeric_predictors
Identify zero and near-zero-variance predictors | identify_zero_variance_predictors
Compute the preference order for predictors based on a user-defined function | preference_order
Target encoding of non-numeric variables | target_encoding_lab
Target-encoding methods | add_white_noise, target_encoding_loo, target_encoding_mean, target_encoding_rank, target_encoding_rnorm
One response and four predictors with varying levels of multicollinearity | toy
Validate input data frame | validate_df
Validate the 'predictors' argument for analysis | validate_predictors
Validate the 'response' argument for target encoding of non-numeric variables | validate_response
30,000 records of responses and predictors from all over the world | vi
Predictor names in data frame 'vi' | vi_predictors
Variance Inflation Factor | vif_df
Automated multicollinearity reduction via Variance Inflation Factor | vif_select
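
The target-encoding topics listed above turn a categorical predictor into a numeric one using the response. The base R sketch below illustrates the general idea behind mean and leave-one-out encoding only. It does not call the package's target_encoding_* functions, whose arguments and exact behavior are documented in their help pages. The toy data, the variable names, and the fallback to the global mean for single-row levels are all assumptions made for the example.

```r
# Minimal base R sketch; no 'collinear' functions are used.
# Hypothetical toy data: a numeric response and one categorical predictor.
toy_df <- data.frame(
  y     = c(1, 2, 3, 10, 12, 20),
  group = c("a", "a", "a", "b", "b", "c")
)

# Mean encoding: each row gets the mean response of its level
group_mean <- ave(toy_df$y, toy_df$group, FUN = mean)

# Leave-one-out encoding: the level mean computed without the current row
group_sum <- ave(toy_df$y, toy_df$group, FUN = sum)
group_n   <- ave(toy_df$y, toy_df$group, FUN = length)
group_loo <- ifelse(
  group_n > 1,
  (group_sum - toy_df$y) / (group_n - 1),
  mean(toy_df$y)  # assumption of this sketch: single-row levels get the global mean
)

print(data.frame(toy_df, group_mean, group_loo))
```

Leave-one-out encoding keeps a row's own response value out of its encoded value, which reduces leakage of the response into the predictor compared with plain mean encoding.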