A Guide to 21 Feature Importance Methods and Packages in Machine Learning (with Code) | by Theophano Mitsa | Dec, 2023


From the OmniXAI, Shapash, and Dalex interpretability packages to the Boruta, Relief, and Random Forest feature selection algorithms

Image created by the author with DALL-E

“We are our choices.” —Jean-Paul Sartre

We live in the era of artificial intelligence, largely because of the remarkable advances in Large Language Models (LLMs). As important as it is for an ML engineer to learn about these new technologies, it is equally important to master the fundamentals of model selection, optimization, and deployment. Something else matters just as much: the input to all of the above, namely the data features. Data, like people, have characteristics, and we call them features. With people, you must understand their unique characteristics to bring out the best in them; the same principle applies to data. Specifically, this article is about feature importance, which measures how much a feature contributes to the predictive ability of a model. We have to understand feature importance for several essential reasons:

  • Time. Having too many features slows down model training and also inference once the model is deployed. The latter is particularly important in edge applications (mobile, sensors, medical diagnostics).
  • Overfitting. If our features are not carefully selected, the model may overfit, i.e., learn the noise in the training data along with the signal.
  • Curse of dimensionality. Many features mean many dimensions, and as dimensions are added the data become increasingly sparse, which makes analysis much more difficult. For example, k-NN classification, a widely used algorithm, degrades as the number of dimensions grows because distances between points become less informative.
  • Adaptability and transfer learning. This is my favorite reason and actually the reason for writing this article. In transfer learning, a model trained on one task can be reused for a second task with some fine-tuning. Having a good understanding of your features in the first and second tasks can greatly reduce the fine-tuning you need to do.
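
To make the definition of feature importance concrete, here is a minimal sketch that measures how much each feature contributes to a model's predictive ability using permutation importance from scikit-learn. The synthetic dataset, the choice of a random forest, and all parameter values are assumptions made for illustration; they are not taken from the article, and this is only one possible way to estimate importance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic tabular data (illustrative assumption): with shuffle=False,
# the first 3 columns are informative and the remaining 5 are pure noise.
X, y = make_classification(
    n_samples=1000,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    n_repeated=0,
    shuffle=False,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Any fitted estimator works here; a random forest is just a convenient choice.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and record the drop in score.
# A large drop means the model relied on that feature for its predictions.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
importances = result.importances_mean

# Print features from most to least important.
for idx in np.argsort(importances)[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

In this setup the informative columns (0 to 2) should, on average, score higher than the noise columns, which suggests that the noise features could be dropped with little loss in accuracy.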

We will focus on tabular data and discuss twenty-one ways to assess feature importance. One might wonder: ‘Why twenty-one techniques? Isn’t one enough?’ It is important to…


