Mastering Date and Time Manipulations with Python and Pandas

This article was originally published on Towards Data Science on March 4th, 2020.

There’s an old saying that goes something like this: no matter whether a developer is junior or senior, they will always reference the documentation when working with dates. I agree, and that’s definitely the case in my job. There are a lot of things I can implement with little or no documentation, but where I struggle is date and time manipulation, or remembering the formatting codes.

That’s the reason why I decided to write an article on the topic — I don’t feel like I’m alone in this situation.

Working with dates and times is not difficult, but there are certainly many things to remember. Today we’ll cover most of them, so you’ll be ready for your next analysis challenge. Everything will be implemented either with Python’s standard library module for handling dates and times, conveniently called datetime, or with Pandas.

Here are the imports:
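
A minimal version, assuming the usual pd alias for Pandas:

```python
import datetime      # standard library module for dates and times
import pandas as pd  # used later for date ranges
```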

Without much ado, let’s jump right in.

DateTime Basics

With the datetime library we can easily create datetime objects. Here’s an example of how to do so if you don’t care about the hour, minute, and second:
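
A short sketch of that, with the date itself picked only for illustration:

```python
import datetime

# Only year, month, and day are given; the time part defaults to midnight
date_only = datetime.datetime(2020, 1, 1)
print(date_only)  # 2020-01-01 00:00:00
```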

Or if you do care about those you can specify values easily:
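
Something along these lines should do, again with example values of my own choosing:

```python
import datetime

# Positional order: year, month, day, hour, minute, second
date_and_time = datetime.datetime(2020, 1, 1, 14, 30, 15)
print(date_and_time)  # 2020-01-01 14:30:15
```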

Maybe you want to know what the time is right now, down to the microsecond. For that you can use the convenient .now() method:
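
A minimal sketch (the printed value depends on when you run it, so no output is shown):

```python
import datetime

# Current local date and time, with microsecond resolution
now = datetime.datetime.now()
print(now)
print(now.microsecond)  # an integer between 0 and 999999
```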

Time Delta

Let’s do something different now. I’m gonna declare a variable and set its value to the current time:

Sometime later I will declare another variable, and set its value to the current time minus the value of the current_time variable:
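
Here is a rough sketch of both steps together. The time.sleep() call is only a stand-in for "sometime later", and the name elapsed is my own choice:

```python
import datetime
import time

# Step 1: remember the current time
current_time = datetime.datetime.now()

# Stand-in for whatever happens in between
time.sleep(0.1)

# Step 2: subtracting two datetime objects yields a timedelta
elapsed = datetime.datetime.now() - current_time
print(type(elapsed))
print(elapsed)
```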

You can see how we have this timedelta object returned, indicating how much time passed between the declaration of those two variables. You can easily access that information:
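
To make the numbers reproducible, this sketch uses two fixed, made-up timestamps instead of .now():

```python
import datetime

start = datetime.datetime(2020, 1, 1, 9, 0, 0)
end = datetime.datetime(2020, 1, 2, 10, 30, 15)
delta = end - start

print(delta.days)             # 1 (whole days)
print(delta.seconds)          # 5415 (remainder after whole days, in seconds)
print(delta.total_seconds())  # 91815.0 (the entire difference in seconds)
```

Note that .seconds holds only the part left over after whole days, so .total_seconds() is usually what you want for the full duration.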

This can be useful if you want to compute the difference between two date columns in a dataset — for example, if one column indicates when some process started and the other one indicates when that process ended.
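
As a small illustration of that idea, here is a sketch with a made-up DataFrame. The column names started and ended are hypothetical:

```python
import pandas as pd

# Hypothetical data: when each process started and ended
df = pd.DataFrame({
    "started": pd.to_datetime(["2020-01-01 08:00:00", "2020-01-02 09:15:00"]),
    "ended": pd.to_datetime(["2020-01-01 10:30:00", "2020-01-02 09:45:00"]),
})

# Subtracting two datetime columns gives a column of timedeltas
df["duration"] = df["ended"] - df["started"]

# Convert the timedeltas to minutes for easier reading
df["duration_minutes"] = df["duration"].dt.total_seconds() / 60
print(df[["duration", "duration_minutes"]])
```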

String to Time and Vice Versa

The datetime library has two more convenient methods for converting string records to datetime objects, or datetime objects to strings.

Let’s explore the first option. You have time information represented as a string and want to convert it to a datetime object for easier manipulation later on (strings can’t reliably be compared as dates). In this case, you’ll want to use the strptime() method:
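
A minimal sketch, where both the input string and the format are examples I picked:

```python
import datetime

date_string = "2020-01-01 14:30:15"

# The format codes must mirror the layout of the string exactly
parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
print(parsed)        # 2020-01-01 14:30:15
print(type(parsed))  # <class 'datetime.datetime'>
```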

If you’re confused about the formatting, check out this site, you’ll find all the codes there.

This process can be reversed. Let’s imagine you have date information represented as a datetime object, and you need to present the values to the user somehow, let’s say through a front end of some sort. The strftime() method comes to the rescue:
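
For example, assuming a day.month.year layout is what the front end wants:

```python
import datetime

moment = datetime.datetime(2020, 1, 1, 14, 30, 15)

# Build a display string from the same kind of format codes
formatted = moment.strftime("%d.%m.%Y %H:%M")
print(formatted)  # 01.01.2020 14:30
```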

Now that we have those tools under our belt, let’s explore what the Pandas library has to offer.

Date Ranges in Pandas

Most of the time when dealing with dates I’m using Python’s datetime library. The only exceptions are:

  • Date ranges
  • Time resampling

The second one would require an entire article to explore in-depth, so I’ll focus on the first one.

I’m using Pandas to create date ranges just because I want to avoid loops with the standard library. If you’re unsure of what I mean when I say “date ranges”, here’s a short example. The idea is to create a 30-day period starting on the first of January 2020:
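
One way to sketch it is with pd.date_range(), which defaults to a daily frequency:

```python
import pandas as pd

# 30 consecutive days, starting on 2020-01-01
dates = pd.date_range(start="2020-01-01", periods=30)
print(dates[0])    # 2020-01-01 00:00:00
print(dates[-1])   # 2020-01-30 00:00:00
print(len(dates))  # 30
```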

Do you see how easy that was?

Keep in mind that you are not required to stick to some particular formatting, as Pandas is smart enough to infer it:
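
For instance, a slash-separated string (my own example) should give the same range:

```python
import pandas as pd

# Slashes instead of dashes; Pandas infers the layout
dates_slash = pd.date_range(start="2020/01/01", periods=30)
print(dates_slash[0])  # 2020-01-01 00:00:00
```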

This one will also work just fine:
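
A spelled-out month name, again an example string of my choosing:

```python
import pandas as pd

# Month written out in full
dates_long = pd.date_range(start="January 1, 2020", periods=30)
print(dates_long[0])  # 2020-01-01 00:00:00
```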

Even some crazy combination won’t be a problem:
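
As a hypothetical stand-in for such a combination, here is a month abbreviation mixed with dashes. Very unusual layouts may still fail to parse, so treat this as a sketch rather than a guarantee:

```python
import pandas as pd

# Abbreviated month name combined with dash separators
dates_mixed = pd.date_range(start="Jan-01-2020", periods=30)
print(dates_mixed[0])
```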

Before you go

And that’s just about enough of date and time declaration and manipulation in Python. There are more advanced things you can do, but I find this more than enough for most tasks.

If you liked this article, stay tuned: in a couple of days I’ll be publishing an article on date resampling and advanced data filtering based on time conditions.

Thanks for reading.

Dario Radečić
Data scientist, blogger, and enthusiast. Passionate about deep learning, computer vision, and data-driven decision making.
