Motivation: In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred.

Results: In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies.
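The permutation scheme described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: it assumes a scikit-learn `RandomForestClassifier` as the model and its Gini-based `feature_importances_` as the importance measure, and it uses a simple empirical P-value rather than the parametric null distributions considered in the paper.

```python
# Sketch of the PIMP idea: permute the outcome vector many times,
# refit the model on each permuted outcome, and compare each feature's
# observed importance against its null distribution of importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pimp_p_values(X, y, n_permutations=50, random_state=0):
    """Empirical P-values for RF feature importances (illustrative)."""
    rng = np.random.default_rng(random_state)
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)

    # Importance of each feature on the real outcome.
    observed = rf.fit(X, y).feature_importances_

    # Null distribution: importances under repeated outcome permutations,
    # which break any association between predictors and outcome.
    null = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        y_perm = rng.permutation(y)
        null[i] = rf.fit(X, y_perm).feature_importances_

    # Empirical P-value: fraction of null importances at least as large
    # as the observed one (with the usual +1 correction).
    return (1 + (null >= observed).sum(axis=0)) / (n_permutations + 1)
```

On simulated data with one informative predictor among several noise predictors, the informative one should receive a small P-value while the noise predictors do not, mirroring points (i) and (ii) of the abstract.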