When working with data in pandas, you can fill NaN values with interpolation using the pandas **interpolate()** function.

`df_withinterpolation = df["col_with_nan"].interpolate(method="linear")`

There are many different interpolation methods you can use. In this post, you’ll learn how to use **interpolate()** to fill NaN values with pandas in Python.

When working with data, NaN values can cause problems, and depending on the situation, we might want to either remove them or fill them.

One way you can deal with NaN values is with interpolation. If you are working with time series data, interpolation lets you fill missing values and create new data points.

When using pandas, the **interpolate()** function allows us to fill NaN values with different interpolation methods.

By default, **interpolate()** uses linear interpolation between the two nearest non-NaN values to fill a NaN value.
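As a quick illustration of that default, here is a minimal sketch using a small made-up Series (the values are only for demonstration). With two consecutive gaps, the filled values are spaced evenly between the neighboring known values.

```python
import numpy as np
import pandas as pd

# Two consecutive gaps between the known values 1.0 and 7.0
s = pd.Series([1.0, np.nan, np.nan, 7.0])

# method="linear" is the default, so no argument is needed
filled = s.interpolate()
print(filled.tolist())
# [1.0, 3.0, 5.0, 7.0]
```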

Let’s say we have the following data with some NaN values.

```
time value
2022-05-01 00:00:00 1.0
2022-05-01 06:00:00 NaN
2022-05-01 12:00:00 7.0
2022-05-01 18:00:00 NaN
2022-05-02 00:00:00 9.0
2022-05-02 06:00:00 NaN
2022-05-02 12:00:00 8.0
2022-05-02 18:00:00 NaN
2022-05-03 00:00:00 9.0
2022-05-03 06:00:00 NaN
```
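If you want to follow along, the data above could be built like this. This is one possible construction, not necessarily how the original DataFrame was created; it assumes the time column is used as the index.

```python
import numpy as np
import pandas as pd

# Ten timestamps spaced six hours apart, starting 2022-05-01
times = pd.date_range(start="2022-05-01", periods=10, freq="6h", name="time")

# Known readings alternate with missing ones
values = [1.0, np.nan, 7.0, np.nan, 9.0, np.nan, 8.0, np.nan, 9.0, np.nan]

df = pd.DataFrame({"value": values}, index=times)
print(df)
```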

Below is an example of how to use **interpolate()** to perform linear interpolation. Because each NaN here sits alone between two known values, it is filled with the midpoint of those two values.

```
print(df.interpolate(method="linear"))
#Output:
value
time
2022-05-01 00:00:00 1.0
2022-05-01 06:00:00 4.0
2022-05-01 12:00:00 7.0
2022-05-01 18:00:00 8.0
2022-05-02 00:00:00 9.0
2022-05-02 06:00:00 8.5
2022-05-02 12:00:00 8.0
2022-05-02 18:00:00 8.5
2022-05-03 00:00:00 9.0
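2022-05-03 06:00:00 9.0
```

As you can see, the NaN values have been filled using linear interpolation. The last row has no later value to interpolate toward, so by default pandas fills it with the last valid value, 9.0.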
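By default, a NaN that comes after the last valid value is filled by repeating that value. If you would rather leave such gaps empty, the `limit_area` parameter may help. The sketch below uses a small made-up Series to compare the two behaviors.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 7.0, np.nan])

# Default: the trailing NaN is filled with the last valid value
default_fill = s.interpolate(method="linear")
print(default_fill.tolist())
# [1.0, 4.0, 7.0, 7.0]

# limit_area="inside" only fills NaNs that have valid values on both sides
inside_only = s.interpolate(method="linear", limit_area="inside")
print(inside_only.tolist())
# [1.0, 4.0, 7.0, nan]
```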

There are many different interpolation methods (such as cubic, spline, polynomial, etc.) which you can read about in the documentation. Some of these methods may require the SciPy module.
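To see how the choice of method can change the result, here is a small sketch comparing `"linear"` and `"time"`, two methods that should not need SciPy. The timestamps are made up and deliberately unevenly spaced.

```python
import numpy as np
import pandas as pd

# Uneven spacing: the gap is 1 hour after the first reading
# and 3 hours before the next one
idx = pd.to_datetime(["2022-05-01 00:00", "2022-05-01 01:00", "2022-05-01 04:00"])
s = pd.Series([0.0, np.nan, 8.0], index=idx)

# "linear" ignores the index and treats the rows as equally spaced
by_position = s.interpolate(method="linear")

# "time" weights the fill by the elapsed time between timestamps
by_time = s.interpolate(method="time")

print(by_position.iloc[1], by_time.iloc[1])
```

Here `"linear"` gives 4.0 (halfway between 0 and 8), while `"time"` gives a value close to 2.0, since the gap is one quarter of the way through the four-hour interval.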

## Interpolating Data After Resampling with pandas interpolate() Function

One common use of the pandas **interpolate()** function is after resampling. The pandas **resample()** function allows us to resample time series data.

One way we can use **resample()** is to increase the frequency of our time series data. Increasing the frequency of time series data is called upsampling. This is like taking monthly data and making it daily.

Let’s say we have the following data which has data points every 12 hours.

```
import pandas as pd
import numpy as np
df = pd.DataFrame({'time':pd.date_range(start='05-01-2022',end='05-31-2022', freq="12H"), 'value':np.random.randint(10,size=61)})
print(df.head(10))
#Output:
time value
0 2022-05-01 00:00:00 5
1 2022-05-01 12:00:00 1
2 2022-05-02 00:00:00 9
3 2022-05-02 12:00:00 8
4 2022-05-03 00:00:00 9
5 2022-05-03 12:00:00 7
6 2022-05-04 00:00:00 7
7 2022-05-04 12:00:00 4
8 2022-05-05 00:00:00 6
9 2022-05-05 12:00:00 4
```

Let’s increase the frequency of our data to every 3 hours with **resample()**. First, we need to set the date time column as the index. Then, we can upsample by passing “3H” to **resample()**.

```
df.set_index('time', inplace=True)
resampled_df = df.resample("3H").mean()
print(resampled_df.head(10))
#Output:
value
time
2022-05-01 00:00:00 5.0
2022-05-01 03:00:00 NaN
2022-05-01 06:00:00 NaN
2022-05-01 09:00:00 NaN
2022-05-01 12:00:00 1.0
2022-05-01 15:00:00 NaN
2022-05-01 18:00:00 NaN
2022-05-01 21:00:00 NaN
2022-05-02 00:00:00 9.0
2022-05-02 03:00:00 NaN
```

As you can see, we’ve now added datapoints between the datapoints which previously existed, but the values for these datapoints are NaN.

To fill these NaN values, you can use **interpolate()**. Below is an example of how to use a polynomial of order 2 for interpolation to fill the NaN values in the time series data.

```
resampled_df = df.resample("3H").interpolate(method="polynomial", order=2)
print(resampled_df.head(10))
#Output:
value
time
2022-05-01 00:00:00 5.000000
2022-05-01 03:00:00 2.503992
2022-05-01 06:00:00 1.005323
2022-05-01 09:00:00 0.503992
2022-05-01 12:00:00 1.000000
2022-05-01 15:00:00 2.493346
2022-05-01 18:00:00 4.984031
2022-05-01 21:00:00 7.482700
2022-05-02 00:00:00 9.000000
2022-05-02 03:00:00 9.535930
```
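The polynomial method depends on SciPy. If SciPy is not available, the same resample-then-interpolate pattern should also work with linear interpolation. The sketch below uses a small deterministic DataFrame (made up for illustration, not the random data above) so the result is easy to check by hand.

```python
import pandas as pd

# Small deterministic series with a reading every 12 hours.
# Lowercase "h" is the alias spelling that newer pandas versions prefer.
idx = pd.date_range(start="2022-05-01", periods=3, freq="12h", name="time")
small_df = pd.DataFrame({"value": [5.0, 1.0, 9.0]}, index=idx)

# Upsample to 3-hour steps, then fill the new rows linearly
upsampled = small_df.resample("3h").interpolate(method="linear")
print(upsampled)
```

Each 12-hour gap is split into four equal steps, so the values go 5, 4, 3, 2, 1 and then 1, 3, 5, 7, 9.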

Hopefully this article has been useful for you to learn about the pandas **interpolate()** function and how you can interpolate between datapoints and fill NaN values in your Python code.
