How can you find the most important features in your dataset? There are plenty of techniques, and this article covers three that any data scientist should know.

After reading, you’ll know how to calculate feature importance in Python with only a couple of lines of code. You’ll also learn the prerequisites of these techniques – crucial to making them work properly.

You can download the Notebook for this article here.

The article is structured as follows:

- Dataset loading and preparation
- Method #1 – Obtain importances from coefficients
- Method #2 – Obtain importances from a tree-based model
- Method #3 – Obtain importances from PCA loading scores
- Conclusion

## Dataset loading and preparation

Let’s spend as little time as possible here. You’ll use the *Breast cancer* dataset, which is built into Scikit-Learn. You’ll also need NumPy, Pandas, and Matplotlib for analysis and visualization.

The following snippet shows you how to import the libraries and load the dataset:
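A minimal sketch of this step, assuming the dataset is loaded with Scikit-Learn's `load_breast_cancer` function (the variable name `data` is an illustrative choice):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# load_breast_cancer returns a dict-like Bunch with the predictors,
# the target, and the feature names
data = load_breast_cancer()

print(data.data.shape)         # predictors as a NumPy array: (569, 30)
print(data.target.shape)       # binary target: (569,)
print(data.feature_names[:3])  # first few feature names
```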

The dataset isn’t in the most convenient format now. You’ll work with Pandas data frames most of the time, so let’s quickly convert it into one. The following snippet concatenates predictors and the target variable into a single data frame:
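One way to do the conversion is sketched below. The target column name `y` is an assumption made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Join the predictors and the target side by side (axis=1)
df = pd.concat(
    [
        pd.DataFrame(data.data, columns=data.feature_names),
        pd.DataFrame(data.target, columns=["y"]),  # "y" is an assumed name
    ],
    axis=1,
)

print(df.head())                # first five rows
print(df.isnull().sum().sum())  # total count of missing values
```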

Calling `head()` results in the following output:

In a nutshell, there are 30 predictors and a single target variable. All of the values are numeric, and there are no missing values. The only obvious problem is the scale. Just take a look at the *mean area* and *mean smoothness* columns – the differences are drastic, which could result in poor models.

You’ll also need to perform a train/test split before addressing the scaling issue.

The following snippet shows you how to make a train/test split and scale the predictors with the `StandardScaler` class:
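A sketch of the split and the scaling step. The `test_size` and `random_state` values are assumptions; any reasonable values work. Note that the scaler is fitted on the training set only, so no information from the test set leaks into the preprocessing.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# test_size and random_state are assumed values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```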

And that’s all you need to start obtaining feature importances. Let’s do that next.

## Method #1 – Obtain importances from coefficients

Probably the easiest way to examine feature importances is by examining the model’s coefficients. For example, both linear and logistic regression boil down to an equation in which a coefficient (importance) is assigned to each input feature.

Put simply, if a feature’s coefficient is large in absolute value (negative or positive), that feature has a strong influence on the prediction. Conversely, if the coefficient is zero, the feature has no impact on the prediction. The comparison is only meaningful when the features are on the same scale, which is why the predictors were standardized earlier.

Simple logic, but let’s put it to the test. We have a classification dataset, so **logistic regression** is an appropriate algorithm. After the model is fitted, the coefficients are stored in the `coef_` property.

The following snippet trains the logistic regression model, creates a data frame in which the attributes are stored with their respective coefficients, and sorts that data frame by the coefficient in descending order:
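A self-contained sketch of that step follows. The column names `Attribute` and `Importance`, the split parameters, and `max_iter` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42  # assumed values
)
X_train_scaled = StandardScaler().fit_transform(X_train)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# For a binary problem coef_ has shape (1, n_features), so take row 0
importances = pd.DataFrame(
    {"Attribute": X_train.columns, "Importance": model.coef_[0]}
).sort_values(by="Importance", ascending=False)

print(importances.head())
```

Sorting by the raw coefficient puts the strongest positive effects first and the strongest negative effects last. To rank by influence regardless of sign, sort by the absolute value instead.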

*That was easy, wasn’t it?* Let’s examine the coefficients visually next. The following snippet makes a bar chart from coefficients:
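One possible way to draw the chart with Matplotlib is sketched here. The title, size, and color are arbitrary choices, and the model setup repeats the assumed parameters from the previous step so the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_train, _, y_train, _ = train_test_split(
    X, data.target, test_size=0.25, random_state=42  # assumed values
)
X_train_scaled = StandardScaler().fit_transform(X_train)
model = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)

importances = pd.DataFrame(
    {"Attribute": X_train.columns, "Importance": model.coef_[0]}
).sort_values(by="Importance", ascending=False)

# One bar per feature, ordered from most positive to most negative
fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(importances["Attribute"], importances["Importance"], color="#087E8B")
ax.set_title("Feature importances obtained from coefficients")
ax.tick_params(axis="x", labelrotation=90)
fig.tight_layout()
# In a plain script, call plt.show() to display the figure
```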

Here’s the corresponding visualization:

And that’s all there is to this simple technique. The take-home point is that the larger the coefficient is in absolute value, whether positive or negative, the more influence the feature has on a prediction.

## Method #2 – Obtain importances from a tree-based model

After training a tree-based model, you’ll have access to the `feature_importances_` property. It’s one of the fastest ways you can obtain feature importances.

The following snippet shows you how to import and fit the `XGBClassifier` model on the training data. The importances are obtained in the same way as before – stored in a data frame, which is then sorted by importance:

You can examine the importance visually by plotting a bar chart. Here’s how to make one:

The corresponding visualization is shown below:

As mentioned earlier, obtaining importances in this way is effortless, but the results can turn out a bit **biased**. This approach tends to inflate the importance of continuous features and high-cardinality categorical variables[1]. Make sure to do the proper preparation and transformations first, and you should be good to go.

## Method #3 – Obtain importances from PCA loading scores

Principal Component Analysis (PCA) is a fantastic technique for dimensionality reduction, and can also be used to determine feature importance.

PCA won’t show you the most important features directly, as the previous two techniques did. Instead, it will return N principal components, where N equals the number of original features.

If you’re a bit rusty on PCA, there’s a complete from-scratch guide at the end of this article.

To start, let’s fit PCA to our scaled data and see what happens. The following snippet does just that and also plots a line plot of the cumulative explained variance:
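A sketch of the fit and the plot, assuming PCA is fitted on the scaled training set with all components kept. The split parameters and plot styling are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_train, _, y_train, _ = train_test_split(
    X, data.target, test_size=0.25, random_state=42  # assumed values
)
X_train_scaled = StandardScaler().fit_transform(X_train)

# With no n_components argument, PCA keeps all 30 components
pca = PCA().fit(X_train_scaled)

# Running total of the explained variance, expressed in percent
cumulative_variance = np.cumsum(pca.explained_variance_ratio_) * 100

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, lw=3)
ax.set_title("Cumulative explained variance by number of principal components")
ax.set_xlabel("Number of principal components")
ax.set_ylabel("Explained variance (%)")
# In a plain script, call plt.show() to display the figure
```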

Here’s the corresponding visualization:

*But what does this mean?* It means you can explain roughly 90% of the variance in your source dataset with the first five principal components. Again, refer to the from-scratch guide if you don’t know what this means.

You can now start dealing with **PCA loadings**. These are just coefficients of the linear combination of the original variables from which the principal components are constructed[2]. You can use loadings to find correlations between actual variables and principal components.

If there’s a strong correlation between a principal component and an original variable, that feature is important, to put it in the simplest terms.

Here’s the snippet for computing loading scores with Python:
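One common formulation, described in reference [2], multiplies each eigenvector by the square root of its eigenvalue. The sketch below follows that formulation; the `PC1` ... `PC30` column names and the split parameters are assumptions. On standardized data, each loading is approximately the correlation between a feature and a component.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_train, _, y_train, _ = train_test_split(
    X, data.target, test_size=0.25, random_state=42  # assumed values
)
X_train_scaled = StandardScaler().fit_transform(X_train)
pca = PCA().fit(X_train_scaled)

# components_ has shape (n_components, n_features): transpose it so rows
# are features, then scale each column by the square root of its eigenvalue
loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    columns=[f"PC{i}" for i in range(1, pca.n_components_ + 1)],
    index=X_train.columns,
)

print(loadings.head())
```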

The corresponding data frame looks like this:

The first principal component is crucial. It’s just a single feature, but it explains over 60% of the variance in the dataset. As you can see from *Image 5,* the correlation coefficient between it and the mean radius feature is almost 0.8 – which is considered a strong positive correlation.

Let’s visualize the correlations between all of the input features and the first principal component. Here’s the entire code snippet (visualization included):
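A self-contained sketch of that visualization, using the eigenvector-times-square-root-of-eigenvalue formulation of loadings from reference [2]. The column names, split parameters, and styling are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_train, _, y_train, _ = train_test_split(
    X, data.target, test_size=0.25, random_state=42  # assumed values
)
X_train_scaled = StandardScaler().fit_transform(X_train)
pca = PCA().fit(X_train_scaled)

loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    columns=[f"PC{i}" for i in range(1, pca.n_components_ + 1)],
    index=X_train.columns,
)

# Keep only the first principal component and sort features by loading
pc1_loadings = loadings["PC1"].sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(pc1_loadings.index, pc1_loadings.values, color="#087E8B")
ax.set_title("PCA loading scores (first principal component)")
ax.tick_params(axis="x", labelrotation=90)
fig.tight_layout()
# In a plain script, call plt.show() to display the figure
```

The sign of a principal component is arbitrary, so the loadings may all flip sign between runs or library versions. The magnitudes are what matter for ranking features.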

The corresponding visualization is shown below:

And that’s how you can “hack” PCA to use it as a feature importance algorithm. Let’s wrap things up in the next section.

## Conclusion

And there you have it – three techniques you can use to find out what matters. Of course, there are many others, and you can find some of them in the *Learn more* section of this article.

These three should suit you well for most machine learning tasks. Just make sure to do the proper cleaning, exploration, and preparation first.

Thanks for reading.

## Learn more

- Top 5 Books to Learn Data Science in 2021
- Principal Component Analysis (PCA) from scratch in Python
- Feature Selection in Python – Recursive Feature Elimination
- Attribute Relevance Analysis in Python – IV and WoE

## References

- https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e
- https://scentellegher.github.io/machine-learning/2020/01/27/pca-loadings-sklearn.html