This article was originally published on Towards Data Science on March 19th, 2020.
As one of the most popular Python libraries for scientific computing, NumPy has no shortage of useful and interesting functions worth exploring in depth. This article explores some of the most interesting but lesser-known ones, in the hope that it encourages you to explore further on your own.
As a data scientist, I use the NumPy library on a daily basis, mostly for handling arrays across all sorts of operations: multiplication, concatenation, reshaping, comparison, etc.
Here are the three functions that the article will cover: isclose(), intersect1d(), and stack().
Import-wise, you'll only need NumPy:
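A minimal sketch, assuming the conventional np alias:

```python
# Import NumPy under its conventional alias
import numpy as np
```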
So without further ado, let’s get started!
How many times have you looked at the entries in a DataFrame and seen the value of 0, and then decided to filter out zeros, only to find your code not doing what it was supposed to do?
There's nothing wrong with your code (sort of). The problem lies in how the number is rounded for display so that it takes up less space on your screen. Take this for example:
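A small sketch of the problem. The values below are assumed for illustration and are not taken from any real dataset:

```python
x = 0.0      # an exact zero
y = 1e-10    # hypothetical stored value: tiny, but not zero

# Rounded for display, both look like zero
print(f"{x:.5f}")  # 0.00000
print(f"{y:.5f}")  # 0.00000

# An exact comparison still tells them apart
print(x == y)      # False
```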
A tiny nonzero value and 0, although very close, aren't identical, and testing them for equality returns False.
But in most cases, a number this small can be treated as 0. The question remains: how do you express that in code?
Luckily for us, NumPy has an isclose() function to help us out here. It tests whether two numbers are equal within a tolerance: by default, an absolute tolerance of 1e-8 plus a relative tolerance of 1e-5 scaled by the magnitude of the second argument. Let's see this in action:
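A minimal sketch with the default tolerances. The sample values and the array named values are assumptions made for illustration:

```python
import numpy as np

x = 0.0
y = 1e-10  # illustrative value: tiny, but not exactly zero

# Within the default tolerance, so treated as equal
print(np.isclose(x, y))     # True

# 1e-7 is larger than the default absolute tolerance of 1e-8
print(np.isclose(x, 1e-7))  # False

# Filtering out "practically zero" entries from an array
values = np.array([0.0, 1e-10, 0.5, 2.0])
nonzero = values[~np.isclose(values, 0.0)]
print(nonzero.tolist())     # [0.5, 2.0]
```

If the defaults are too strict or too loose for your data, isclose() also accepts rtol and atol keyword arguments.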
Great, let’s continue.
The main idea of intersect1d() is to take two arrays as inputs and return the elements contained in both, as a sorted array of unique values. Think of it as a set intersection, but without converting the arrays to sets and calculating the intersection yourself.
For demonstration let’s declare two arrays:
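The values here are assumed purely for illustration:

```python
import numpy as np

# Two small integer arrays that share some elements (3 and 4)
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])
```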
Now, to see which elements are present in both, we can do something like this:
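A minimal sketch, assuming two illustrative integer arrays:

```python
import numpy as np

# Illustrative arrays, assumed for this sketch
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

# Sorted, unique values that appear in both arrays
common = np.intersect1d(arr1, arr2)
print(common.tolist())  # [3, 4]
```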
Great! An alternative method, the set-based one mentioned before, would be a bit longer to write and would include more brackets:
Feel free to use either, but I prefer the first one.
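One way the set-based version might look, again with assumed illustrative arrays. The sorted() call is added here so the result has a predictable order, since sets are unordered:

```python
import numpy as np

# Illustrative arrays, assumed for this sketch
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

# Convert to sets, intersect, sort, and convert back to an array
common_via_sets = np.array(sorted(set(arr1).intersection(set(arr2))))
print(common_via_sets.tolist())  # [3, 4]
```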
I use stack() very often when preparing a dataset for predictive modeling, more precisely in the process of categorical variable embedding.
Let's say you've embedded some variables and want to put them in matrix form, where each variable is a single column. The only logical thing to do is to stack the variables along the column axis.
But I've gotten a bit ahead of myself.
For demonstration purposes we’ll be dealing with the same arrays from before:
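The values are assumed for illustration. What matters for stacking is that both arrays have the same shape:

```python
import numpy as np

# Two 1-D arrays of equal length; np.stack() requires matching shapes
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

print(arr1.shape, arr2.shape)  # (4,) (4,)
```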
Great. Now let's use the stack() function to stack the arrays along the row axis:
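A minimal sketch with assumed illustrative arrays, where each input array becomes one row of the result:

```python
import numpy as np

# Illustrative arrays, assumed for this sketch
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

# Stack along a new first axis: one row per input array
stacked_rows = np.stack([arr1, arr2], axis=0)
print(stacked_rows.shape)     # (2, 4)
print(stacked_rows.tolist())  # [[1, 2, 3, 4], [3, 4, 5, 6]]
```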
Keep in mind that axis=0 is optional here, because it's the default axis to perform stacking on, but I've specified it to be more explicit.
The more common example in practice is to stack along the column axis (at least at my job):
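A minimal sketch with assumed illustrative arrays, where each input array becomes one column of the result, as in the embedding scenario described earlier:

```python
import numpy as np

# Illustrative arrays, assumed for this sketch; think of each as one variable
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

# Stack along a new second axis: one column per input array
stacked_cols = np.stack([arr1, arr2], axis=1)
print(stacked_cols.shape)     # (4, 2)
print(stacked_cols.tolist())  # [[1, 3], [2, 4], [3, 5], [4, 6]]
```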
Before you go
I hope you can put these three functions to use in your daily routine as a data analyst or data scientist. They are good time savers.
As always, if you have some additional functions you use on a daily basis, don't hesitate to share them in the comment section.
Thanks for reading.