The Programming Expert

pandas interpolate() – Fill NaN Values with Interpolation in DataFrame

May 2, 2022

When working with data in pandas, you can fill NaN values with interpolation using the pandas interpolate() function.

df_withinterpolation = df["col_with_nan"].interpolate(method="linear")

There are many different interpolation methods you can use. In this post, you'll learn how to use interpolate() to fill NaN values with pandas in Python.


NaN values can cause problems when working with data. Depending on the situation, we might want to remove them or fill them in.

One way to deal with NaN values is interpolation. With time series data, interpolation lets us fill missing values and create new data points.

When using pandas, the interpolate() function allows us to fill NaN values with different interpolation methods.

By default, interpolate() uses linear interpolation between the nearest non-NaN values on either side to fill a NaN value. The default "linear" method ignores the index and treats the values as equally spaced.
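To make the default concrete, here is a minimal sketch on a small Series. The values are hypothetical and chosen only so the arithmetic is easy to follow: a single NaN is filled with the midpoint of its neighbors, and a run of two NaN values is filled in equal steps.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the article's data.
s = pd.Series([1.0, np.nan, 7.0, np.nan, np.nan, 10.0])

# method="linear" is the default, so no arguments are needed.
filled = s.interpolate()

print(filled.tolist())  # [1.0, 4.0, 7.0, 8.0, 9.0, 10.0]
```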

Let’s say we have the following data with some NaN values.

                 time  value
2022-05-01 00:00:00    1.0
2022-05-01 06:00:00    NaN
2022-05-01 12:00:00    7.0
2022-05-01 18:00:00    NaN
2022-05-02 00:00:00    9.0
2022-05-02 06:00:00    NaN
2022-05-02 12:00:00    8.0
2022-05-02 18:00:00    NaN
2022-05-03 00:00:00    9.0
2022-05-03 06:00:00    NaN
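The article does not show how this DataFrame was built. One possible way to construct it is sketched below, assuming 6-hour spacing and a variable named df with the time column set as the index; the construction itself is an assumption.

```python
import numpy as np
import pandas as pd

# Ten timestamps, 6 hours apart, starting at midnight on May 1, 2022.
# "6h" is the lowercase hour alias that newer pandas versions prefer.
times = pd.date_range(start="2022-05-01", periods=10, freq="6h")

# Every second value is missing, matching the table above.
values = [1.0, np.nan, 7.0, np.nan, 9.0, np.nan, 8.0, np.nan, 9.0, np.nan]

# Assumption: the time column is used as the index.
df = pd.DataFrame({"time": times, "value": values}).set_index("time")

print(df)
```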

Below is an example of how to use interpolate() to perform linear interpolation. Because each gap in this data is a single NaN value, every NaN that sits between two known values is filled with their midpoint.

print(df.interpolate(method="linear"))

#Output:
                     value
time
2022-05-01 00:00:00    1.0
2022-05-01 06:00:00    4.0
2022-05-01 12:00:00    7.0
2022-05-01 18:00:00    8.0
2022-05-02 00:00:00    9.0
2022-05-02 06:00:00    8.5
2022-05-02 12:00:00    8.0
2022-05-02 18:00:00    8.5
2022-05-03 00:00:00    9.0
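2022-05-03 06:00:00    9.0

As you can see, the NaN values have been filled using linear interpolation. The last row has no later value to interpolate toward, so with the default settings pandas fills it by repeating the last valid value, 9.0.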
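With the default settings, interpolate() also fills NaN values that come after the last valid observation by repeating that observation. If that is not wanted, the limit_area parameter can restrict filling to gaps surrounded by valid values. The sketch below uses a small hypothetical Series to compare the two behaviors.

```python
import numpy as np
import pandas as pd

# Hypothetical values: one gap inside the data and one trailing NaN.
s = pd.Series([1.0, np.nan, 7.0, np.nan])

# Default: the trailing NaN is filled with the last valid value.
default_fill = s.interpolate(method="linear")

# limit_area="inside" only fills NaN values that have valid data on both sides.
inside_only = s.interpolate(method="linear", limit_area="inside")

print(default_fill.tolist())  # [1.0, 4.0, 7.0, 7.0]
print(inside_only.tolist())   # [1.0, 4.0, 7.0, nan]
```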

There are many different interpolation methods (such as cubic, spline, polynomial, etc.) you can use, which you can read about in the documentation. Many of these methods, including the three named here, require the SciPy module.
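One method that does not need SciPy is method="time", which weights the fill by the elapsed time between index values and requires a DatetimeIndex. The sketch below uses hypothetical, unevenly spaced timestamps to show how it can differ from the default.

```python
import numpy as np
import pandas as pd

# Hypothetical, unevenly spaced timestamps: a 6-hour gap, then an 18-hour gap.
idx = pd.to_datetime(["2022-05-01 00:00", "2022-05-01 06:00", "2022-05-02 00:00"])
s = pd.Series([0.0, np.nan, 24.0], index=idx)

by_position = s.interpolate(method="linear")  # ignores the timestamps
by_time = s.interpolate(method="time")        # weights by elapsed time

print(by_position.iloc[1])        # 12.0
print(round(by_time.iloc[1], 6))  # 6.0
```

When the timestamps are evenly spaced, as in the example data above, the two methods give the same result.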

Interpolating Data After Resampling with pandas interpolate() Function

One common use of the pandas interpolate() function is after resampling. The pandas resample() function allows us to resample time series data.

One way we can use resample() is to increase the frequency of our time series data. Increasing the frequency of time series data is called upsampling. This is like taking monthly data and making it daily.

Let’s say we have the following data which has data points every 12 hours.

import pandas as pd
import numpy as np

# Random values differ on every run. Newer pandas (2.2+) prefers "12h" over "12H".
df = pd.DataFrame({
    'time': pd.date_range(start='05-01-2022', end='05-31-2022', freq="12H"),
    'value': np.random.randint(10, size=61),
})

print(df.head(10))

#Output:
                 time  value
0 2022-05-01 00:00:00      5
1 2022-05-01 12:00:00      1
2 2022-05-02 00:00:00      9
3 2022-05-02 12:00:00      8
4 2022-05-03 00:00:00      9
5 2022-05-03 12:00:00      7
6 2022-05-04 00:00:00      7
7 2022-05-04 12:00:00      4
8 2022-05-05 00:00:00      6
9 2022-05-05 12:00:00      4

Let's increase the frequency of our data to every 3 hours with resample(). First, we need to set the datetime column as the index. Then, we can increase the frequency by passing "3H" to resample(). Calling mean() on the upsampled result leaves the new, empty time slots as NaN.

df.set_index('time', inplace=True)

resampled_df = df.resample("3H").mean()

print(resampled_df.head(10))

#Output:
                     value
time
2022-05-01 00:00:00    5.0
2022-05-01 03:00:00    NaN
2022-05-01 06:00:00    NaN
2022-05-01 09:00:00    NaN
2022-05-01 12:00:00    1.0
2022-05-01 15:00:00    NaN
2022-05-01 18:00:00    NaN
2022-05-01 21:00:00    NaN
2022-05-02 00:00:00    9.0
2022-05-02 03:00:00    NaN

As you can see, we've now added data points between the ones which previously existed, but the values for these new data points are NaN.
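As a rough check of how many gaps upsampling creates, the sketch below uses a hypothetical three-point series with fixed values (mirroring the first three rows above rather than the random data) and asfreq(), which is one way to upsample without aggregating. It uses the lowercase "12h" and "3h" aliases, which mean the same as "12H" and "3H".

```python
import pandas as pd

# Hypothetical 12-hourly series with fixed values, so the result is repeatable.
s = pd.Series(
    [5.0, 1.0, 9.0],
    index=pd.date_range("2022-05-01", periods=3, freq="12h"),
    name="value",
)

# Upsample to every 3 hours; the new time slots are left as NaN.
upsampled = s.resample("3h").asfreq()

print(len(upsampled))          # 9
print(upsampled.isna().sum())  # 6
```

Going from 12-hour to 3-hour spacing adds three new rows between each pair of original rows, which is why six of the nine values are missing.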

To fill these NaN values, you can use interpolate(). Below is an example of how to use a polynomial of order 2 to fill the NaN values in the time series data. The polynomial method requires SciPy.

resampled_df = df.resample("3H").interpolate(method="polynomial", order=2)

print(resampled_df.head(10))

#Output:
                        value
time
2022-05-01 00:00:00  5.000000
2022-05-01 03:00:00  2.503992
2022-05-01 06:00:00  1.005323
2022-05-01 09:00:00  0.503992
2022-05-01 12:00:00  1.000000
2022-05-01 15:00:00  2.493346
2022-05-01 18:00:00  4.984031
2022-05-01 21:00:00  7.482700
2022-05-02 00:00:00  9.000000
2022-05-02 03:00:00  9.535930
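In the polynomial output above, the fitted curve dips to about 0.50 between the observed values 5 and 1. If staying between neighboring observations matters, a linear fill after resampling is a simple alternative. The sketch below assumes a hypothetical series holding only the first three observed values.

```python
import pandas as pd

# Hypothetical series with the first three observed values from the output above.
s = pd.Series(
    [5.0, 1.0, 9.0],
    index=pd.date_range("2022-05-01", periods=3, freq="12h"),
    name="value",
)

# Upsample to every 3 hours and fill the new slots with straight-line values.
linear = s.resample("3h").interpolate(method="linear")

print(linear.tolist())
# [5.0, 4.0, 3.0, 2.0, 1.0, 3.0, 5.0, 7.0, 9.0]
```

Every filled value lies between the two observations around it, which a higher-order polynomial does not guarantee.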

Hopefully this article has helped you learn about the pandas interpolate() function and how you can interpolate between data points and fill NaN values in your Python code.
