The Programming Expert

Solving All of Your Programming Headaches

How to Convert pandas Column dtype from Object to Category

December 1, 2022

To convert a pandas DataFrame column from the data type “object” to the data type “category”, use the astype() method.

import pandas as pd

df = pd.DataFrame({ "column": ["a","b","c","a","b","c","b","d"] }) 
 
print(df["column"].dtype)

df["column"] = df["column"].astype('category')

print(df["column"].dtype)

#Output:
object
category

When working with different types of data in pandas, being able to change the data type of a column easily is valuable.

One such case is converting a pandas column from the data type “object” to the data type “category”.

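astype() converts a pandas column to the data type you specify. It returns a new object rather than changing the column in place, so assign the result back to the column to keep the change.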
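
astype() is not limited to one column or to the default category settings. As a rough sketch, the example below converts two columns in one call and gives one of them an explicit, ordered set of categories. The DataFrame, its column names, and the size ordering are made up for illustration and are not part of the example above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data, invented only for this illustration.
df = pd.DataFrame({
    "shirt_size": ["small", "large", "medium", "small"],
    "color": ["red", "blue", "red", "green"],
    "price": [1.5, 4.0, 2.5, 1.0],
})

# An explicit, ordered dtype: small < medium < large.
size_dtype = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

# Passing a dict to astype() converts several columns in one call;
# columns that are not listed (here "price") keep their dtype.
df = df.astype({"shirt_size": size_dtype, "color": "category"})

print(df.dtypes)

# Ordered categories support min(), max() and comparisons.
print(df["shirt_size"].min())
```

Using an ordered dtype is optional. Plain astype('category') is enough when the values have no natural order.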

Reducing Memory Usage with dtype Category Columns in pandas

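One of the main benefits of using “category” columns in pandas is that they can reduce the amount of memory your process uses.

The reason is that pandas stores each unique value (the category) only once and represents every row with a small integer code that points to its category, instead of storing the full value in every row. The savings are largest when a column has few distinct values relative to its number of rows.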
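
You can inspect this layout yourself through the .cat accessor of a categorical Series. The short sketch below uses illustrative variable names and prints the stored categories and the integer code that each row holds.

```python
import pandas as pd

# dtype=object is set explicitly so the starting point matches the article.
s = pd.Series(["a", "b", "c", "a", "b", "c", "b", "d"], dtype=object)
cat = s.astype("category")

# The distinct values, stored once.
print(list(cat.cat.categories))  # ['a', 'b', 'c', 'd']

# One small integer per row, pointing into the categories.
print(list(cat.cat.codes))       # [0, 1, 2, 0, 1, 2, 1, 3]

# With only a handful of categories, the codes fit in a single byte each.
print(cat.cat.codes.dtype)       # int8
```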

Below is an example of how converting to categorical data reduces memory usage in pandas.

import pandas as pd

s = pd.Series(["a","b","c","a","b","c","b","d"] * 1000) 
 
print(s.nbytes)
print(s.astype("category").nbytes)

#Output:
64000
8032
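
One caveat: for an “object” column, nbytes only counts the 8-byte references to the Python strings, not the strings themselves. A fuller comparison can be sketched with memory_usage(deep=True), which also counts the string objects. The exact numbers depend on your Python and pandas versions, so none are shown here.

```python
import pandas as pd

# dtype=object is set explicitly so the comparison does not depend on
# which string dtype your pandas version picks by default.
s = pd.Series(["a", "b", "c", "a", "b", "c", "b", "d"] * 1000, dtype=object)

# index=False leaves the index out so only the values are compared.
object_bytes = s.memory_usage(index=False, deep=True)
category_bytes = s.astype("category").memory_usage(index=False, deep=True)

print(object_bytes, category_bytes)
print(f"category uses {category_bytes / object_bytes:.1%} of the object memory")
```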

Using groupby() With Columns of dtype Category in pandas

One last thing I want to add to this post is something that I came across when I was performing some data analysis with pandas.

If you group a DataFrame by categorical columns, groupby() can include every possible combination of categories in the result, even combinations that never occur in the data. Pass the “observed=True” option so that groupby() only returns the combinations that actually appear, which is how it behaves on data which has the data type “object”. The default value of “observed” has been changing across pandas releases, so it is safest to set it explicitly.

Below is an example of how the “observed” option affects the output of groupby().

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","M","F","M","M","F","M","M","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

df["animal_type"] = df["animal_type"].astype('category')
df["gender"] = df["gender"].astype('category')

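# observed=False keeps every category combination, including unobserved ones
print(df.groupby(["animal_type","gender"], observed=False)["age"].max())
# observed=True keeps only the combinations present in the data
print(df.groupby(["animal_type","gender"], observed=True)["age"].max())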

#Output:
animal_type  gender
cat          F         NaN
             M         8.0
dog          F         6.0
             M         9.0
Name: age, dtype: float64

animal_type  gender
dog          F         6
             M         9
cat          M         8
Name: age, dtype: int64
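
Another way to see the difference is to count the rows in each group with size(). The sketch below uses a smaller, made-up version of the DataFrame and sets “observed” explicitly in both calls, so the result does not depend on the default in your pandas version.

```python
import pandas as pd

# A smaller, made-up version of the DataFrame above.
# Calling astype("category") on a DataFrame converts every column.
df = pd.DataFrame({
    "animal_type": ["dog", "cat", "dog", "cat"],
    "gender": ["F", "M", "M", "M"],
}).astype("category")

# Every combination of categories, with a count of 0 for ("cat", "F").
all_combos = df.groupby(["animal_type", "gender"], observed=False).size()

# Only the combinations that actually occur in the data.
seen_only = df.groupby(["animal_type", "gender"], observed=True).size()

print(all_combos)
print(seen_only)
```

Here all_combos has four rows and seen_only has three, because no female cat appears in the data.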

Hopefully this article has been useful for you to learn how to convert a pandas column from object to category in Python.
