• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

The Programming Expert

Solving All of Your Programming Headaches

  • HTML
  • JavaScript
  • jQuery
  • PHP
  • Python
  • SAS
  • Ruby
  • About
You are here: Home / Python / pandas dropna – Drop Rows or Columns with NaN in DataFrame

pandas dropna – Drop Rows or Columns with NaN in DataFrame

January 13, 2022 Leave a Comment

To drop rows or columns with missing values in a DataFrame and using pandas, the simplest way is to use the pandas dropna() function.

df = df.dropna() #drops rows with missing values

df["Column 1"] = df["Column 1"].dropna() #drops rows with missing values in column "Column 1"

df = df.dropna(axis=1) #drop columns with missing values

When working with data, missing values can make life as an analyst difficult. Depending on the task at hand, you may want to replace missing values with another value, or drop rows, or columns, which contain missing values.

If you want to drop rows or columns with missing values, we can use the pandas dropna() function.

Let’s say I have the following DataFrame of summarized data:

   animal_type  gender         type variable level  count    sum   mean        std   min    25%   50%    75%    max
0          cat  female      numeric      age   N/A    5.0   18.0   3.60   1.516575   2.0   3.00   3.0   4.00    6.0
1          cat    male      numeric      age   N/A    2.0    3.0   1.50   0.707107   1.0   1.25   1.5   1.75    2.0
2          dog  female      numeric      age   N/A    2.0    8.0   4.00   0.000000   4.0   4.00   4.0   4.00    4.0
3          dog    male      numeric      age   N/A    4.0   15.0   3.75   1.892969   1.0   3.25   4.5   5.00    5.0
4          cat  female      numeric   weight   N/A    5.0  270.0  54.00  32.093613  10.0  40.00  50.0  80.00   90.0
5          cat    male      numeric   weight   N/A    2.0  110.0  55.00  63.639610  10.0  32.50  55.0  77.50  100.0
6          dog  female      numeric   weight   N/A    2.0  100.0  50.00  42.426407  20.0  35.00  50.0  65.00   80.0
7          dog    male      numeric   weight   N/A    4.0  180.0  45.00  23.804761  20.0  27.50  45.0  62.50   70.0
8          cat  female  categorical    state    FL    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
9          cat  female  categorical    state    NY    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
10         cat  female  categorical    state    TX    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
11         cat    male  categorical    state    CA    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
12         cat    male  categorical    state    TX    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
13         dog  female  categorical    state    FL    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
14         dog  female  categorical    state    TX    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
15         dog    male  categorical    state    CA    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
16         dog    male  categorical    state    FL    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
17         dog    male  categorical    state    NY    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
18         cat  female  categorical  trained   yes    5.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
19         cat    male  categorical  trained    no    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
20         dog  female  categorical  trained    no    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
21         dog  female  categorical  trained   yes    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
22         dog    male  categorical  trained    no    4.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN   NaN

In this dataframe, we have a lot of NaN values.

To drop rows or columns with NaN values, we can use the pandas

dropna() function to accomplish this.

Let’s say that we want to drop all of the rows which contain at least 1 NaN value. The following code will remove all rows with NaN values from our DataFrame.

df.dropna()

#output:
   animal_type  gender         type variable level  count   mean    sum        std   min    25%   50%    75%    max
0          cat  female      numeric      age   N/A    5.0   3.60   18.0   1.516575   2.0   3.00   3.0   4.00    6.0
1          cat    male      numeric      age   N/A    2.0   1.50    3.0   0.707107   1.0   1.25   1.5   1.75    2.0
2          dog  female      numeric      age   N/A    2.0   4.00    8.0   0.000000   4.0   4.00   4.0   4.00    4.0
3          dog    male      numeric      age   N/A    4.0   3.75   15.0   1.892969   1.0   3.25   4.5   5.00    5.0
4          cat  female      numeric   weight   N/A    5.0  54.00  270.0  32.093613  10.0  40.00  50.0  80.00   90.0
5          cat    male      numeric   weight   N/A    2.0  55.00  110.0  63.639610  10.0  32.50  55.0  77.50  100.0
6          dog  female      numeric   weight   N/A    2.0  50.00  100.0  42.426407  20.0  35.00  50.0  65.00   80.0
7          dog    male      numeric   weight   N/A    4.0  45.00  180.0  23.804761  20.0  27.50  45.0  62.50   70.0

If we want to drop all of the columns which contain at least 1 NaN value, we can pass ‘axis=1’ to dropna().

df.dropna(axis=1)

   animal_type  gender         type variable level  count
0          cat  female      numeric      age   N/A    5.0
1          cat    male      numeric      age   N/A    2.0
2          dog  female      numeric      age   N/A    2.0
3          dog    male      numeric      age   N/A    4.0
4          cat  female      numeric   weight   N/A    5.0
5          cat    male      numeric   weight   N/A    2.0
6          dog  female      numeric   weight   N/A    2.0
7          dog    male      numeric   weight   N/A    4.0
8          cat  female  categorical    state    FL    2.0
9          cat  female  categorical    state    NY    1.0
10         cat  female  categorical    state    TX    2.0
11         cat    male  categorical    state    CA    1.0
12         cat    male  categorical    state    TX    1.0
13         dog  female  categorical    state    FL    1.0
14         dog  female  categorical    state    TX    1.0
15         dog    male  categorical    state    CA    1.0
16         dog    male  categorical    state    FL    1.0
17         dog    male  categorical    state    NY    2.0
18         cat  female  categorical  trained   yes    5.0
19         cat    male  categorical  trained    no    2.0
20         dog  female  categorical  trained    no    1.0
21         dog  female  categorical  trained   yes    1.0
22         dog    male  categorical  trained    no    4.0

Dropping Rows and Columns with Pandas dropna() Function

The pandas dropna() function has different parameters which you can pass which will affect which rows or columns with missing values are dropped.

For example, we can pass different values to the “how” parameter to determine which rows or columns are dropped based on the number of NaN values in that column.

The default dropping behavior for dropna() is to drop if there is at least 1 NaN value in the column, but if we pass ‘how=”all”‘ to dropna(), then all values in that row or column must be NaN.

Let’s say we have a different DataFrame from above:

df = pd.DataFrame({'Name': ['Jim','Sally','Paul','Nancy',np.NaN], 
          'Height':[np.NaN,np.NaN,np.NaN,np.NaN, np.NaN], 
          'Weight': [100,120,340,230,np.NaN]})

# Output:
    Name  Height  Weight
0    Jim     NaN   100.0
1  Sally     NaN   120.0
2   Paul     NaN   340.0
3  Nancy     NaN   230.0
4    NaN     NaN     NaN

Let’s see how the ‘how’ parameter can affect what is dropped when working with this DataFrame.

By default, if we call dropna() without passing any other parameters, we will drop all rows with at least 1 NaN value. In this case, the return DataFrame will be empty.

If we call dropna() with the ‘how=”all”‘ parameter, we will only drop rows with all NaN values – i.e. the index 4 row.

print(df.dropna())
print(df.dropna(how='all'))

# Output:

Empty DataFrame
Columns: [Name, Height, Weight]
Index: []

    Name  Height  Weight
0    Jim     NaN   100.0
1  Sally     NaN   120.0
2   Paul     NaN   340.0
3  Nancy     NaN   230.0

If we call dropna() to remove columns with NaN and see how the parameter ‘how’ works in this case, we can pass ‘axis=1’ as well.

print(df.dropna(axis=1))
print(df.dropna(axis=1,how='all'))

# Output:

Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]

    Name  Weight
0    Jim   100.0
1  Sally   120.0
2   Paul   340.0
3  Nancy   230.0
4    NaN     NaN

You can also pass a value to the “thresh” parameter which sets the number of missing values which are required to drop the row or column.

If we pass ‘thresh=2’ to dropna() in our example, only the last row is dropped.

print(df.dropna(thresh=2))

# Output:
    Name  Height  Weight
0    Jim     NaN   100.0
1  Sally     NaN   120.0
2   Paul     NaN   340.0
3  Nancy     NaN   230.0

Dropping Rows and Columns Based on Subset with dropna() in pandas

The last feature to talk about here with the dropna() function is the ‘subset’ parameter.

We can drop rows and columns based on the missing values of just a single or multiple rows and columns if we want.

Let’s say we have the same DataFrame from above.

We can pass ‘subset=[“Name”]’ to only drop the rows which have a missing value in the “Name” column.

print(df.dropna(subset=["Name"]))

# Output:
    Name  Height  Weight
0    Jim     NaN   100.0
1  Sally     NaN   120.0
2   Paul     NaN   340.0
3  Nancy     NaN   230.0

As you can see, only the last row is dropped in this case.

Hopefully this article has helped you learn how to drop rows and columns with NaN values using the pandas dropna() function in Python.

Other Articles You'll Also Like:

  • 1.  Truncate String in Python with String Slicing
  • 2.  Add Tuple to List in Python
  • 3.  How to Check if Number is Negative in Python
  • 4.  Changing Python Turtle Speed with speed() Function
  • 5.  Get Substring Between Two Characters with Python
  • 6.  Convert pandas Series to Dictionary in Python
  • 7.  Convert First Letter of String to Lowercase in Python
  • 8.  Get Month Name from Date in Python
  • 9.  Python sin – Find Sine of Number in Radians Using math.sin()
  • 10.  Time Difference in Seconds Between Datetimes in Python

About The Programming Expert

The Programming Expert is a compilation of a programmer’s findings in the world of software development, website creation, and automation of processes.

Programming allows us to create amazing applications which make our work more efficient, repeatable and accurate.

At the end of the day, we want to be able to just push a button and let the code do it’s magic.

You can read more about us on our about page.

Reader Interactions

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Primary Sidebar

About The Programming Expert

the programming expert main image

Welcome to The Programming Expert. We are a group of US-based programming professionals who have helped companies build, maintain, and improve everything from simple websites to large-scale projects.

We built The Programming Expert to help you solve your programming problems with useful coding methods and functions in various programming languages.

Search

Learn Coding from Experts on Udemy

Looking to boost your skills and learn how to become a programming expert?

Check out the links below to view Udemy courses for learning to program in the following languages:

Copyright © 2023 · The Programming Expert · About · Privacy Policy