The Programming Expert

pandas Duplicated – Find Duplicate Rows in DataFrame or Series

January 13, 2022

To find duplicate rows in a DataFrame or Series in pandas, the easiest way is to use the pandas duplicated() function.

df.duplicated()

When working with data, it's important to be able to find problems in it. Duplicate records are one common problem, and finding them is usually the first step before deciding how to fix them.

In Python, the pandas package makes this easy with the duplicated() function.
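Since duplicated() exists on both DataFrames and Series, here is a minimal sketch of the Series version first. The values are made up for illustration; each value is flagged True if it already appeared earlier in the Series.

```python
import pandas as pd

# A small, made-up Series of names
s = pd.Series(["Jim", "Jim", "Sally", "Sue", "Sue", "Jim"])

# Each value is compared against the values that came before it
mask = s.duplicated()
print(mask.tolist())  # [False, True, False, False, True, True]
```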

Let’s say we have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})
print(df)

# Output:
    Name Weight
0    Jim    100
1    Jim    100
2    Jim    200
3  Sally    100
4    Bob    200
5    Sue    150
6    Sue    150
7  Larry    200

Let's find the duplicate rows in this DataFrame. The duplicated() function returns a boolean Series with one value per row, where True marks a row that repeats an earlier row. By default (keep='first'), every occurrence of a duplicate is marked True except the first one.

print(df.duplicated())

# Output:
0    False
1     True
2    False
3    False
4    False
5    False
6     True
7    False
dtype: bool
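Because the result is a boolean Series, it can be used directly as a mask. The sketch below reuses the example DataFrame to select the duplicate rows and to count them; the variable names dup_mask, dup_rows and n_dups are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})

# Boolean mask: True for every duplicate except its first occurrence
dup_mask = df.duplicated()

# Boolean indexing keeps only the rows where the mask is True (rows 1 and 6)
dup_rows = df[dup_mask]
print(dup_rows)

# True counts as 1 when summed, so this is the number of duplicate rows
n_dups = dup_mask.sum()
print(n_dups)  # 2
```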

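To mark every occurrence of a duplicate as True except the last one, pass keep='last' to the duplicated() function. Since each duplicate in this DataFrame appears exactly twice, this flags the first row of each pair.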

print(df.duplicated(keep='last'))

# Output:
0     True
1    False
2    False
3    False
4    False
5     True
6    False
7    False
dtype: bool

To mark every occurrence of a duplicated row as True, including the first and last, pass keep=False to the duplicated() function.

print(df.duplicated(keep=False))

# Output:
0     True
1     True
2    False
3    False
4    False
5     True
6     True
7    False
dtype: bool
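To compare the three keep options side by side, one approach is to collect them into a single DataFrame. This is a sketch using the same example data; the column labels "first", "last" and "all" are arbitrary choices for display. The last line shows that keep=False is a convenient way to pull out every row involved in a duplicate.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})

# One column per keep option, aligned on the original row index
comparison = pd.DataFrame({
    "first": df.duplicated(keep="first"),  # the default behaviour
    "last": df.duplicated(keep="last"),
    "all": df.duplicated(keep=False),
})
print(comparison)

# keep=False flags every row that has a duplicate anywhere (rows 0, 1, 5, 6)
all_dup_rows = df[df.duplicated(keep=False)]
print(all_dup_rows)
```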

Depending on how you want to handle these duplicates, you may want to keep or remove the duplicate rows.
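If you decide to remove them, here is a minimal sketch of two common approaches: inverting the duplicated() mask with ~, or calling pandas' drop_duplicates(), which accepts the same subset and keep parameters as duplicated(). Both keep the original index labels of the remaining rows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})

# Option 1: ~ inverts the mask, so only rows that are NOT duplicates are kept
deduped = df[~df.duplicated()]

# Option 2: drop_duplicates() does the same in a single call
deduped2 = df.drop_duplicates()

print(deduped)
print(deduped.equals(deduped2))  # True
```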

Finding Duplicate Rows Based on Columns Using pandas

By default, the duplicated() function compares rows across all columns of a DataFrame. To find duplicate rows based on just one column or on several columns, pass a list of column names to the subset parameter.

Using the same DataFrame as above, we can find the duplicates based on the "Name" column by passing subset=["Name"] to the duplicated() function. Note that row 2 is now flagged even though its Weight differs from the earlier Jim rows.

print(df.duplicated(subset=["Name"]))

# Output:
0    False
1     True
2     True
3    False
4    False
5    False
6     True
7    False
dtype: bool
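The subset parameter also takes several column names, and it can be combined with keep. A sketch on the same DataFrame: since it only has two columns, listing both gives the same result as the default, while subset=["Name"] with keep=False flags every row whose Name appears more than once. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})

# Listing both columns compares rows on Name and Weight together,
# which for this two-column DataFrame matches calling duplicated() with no subset
by_both = df.duplicated(subset=["Name", "Weight"])
print(by_both.equals(df.duplicated()))  # True

# Every row whose Name occurs more than once (all Jim and Sue rows)
name_dups = df.duplicated(subset=["Name"], keep=False)
print(df[name_dups])
```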

Hopefully this article has helped you understand how to use the pandas duplicated() function to find duplicate rows during data analysis in Python.

Copyright © 2023 · The Programming Expert