Using pandas sample() to Generate a Random Sample of a DataFrame

May 2, 2022

To sample a DataFrame with pandas in Python, you can use the sample() function. Pass either the number of rows you want to extract (“n”) or the fraction of rows to return (“frac”).

sampled_df = df.sample(n=100)
sampled_df = df.sample(frac=0.5)

In this article, you’ll learn how to get a random sample of data in Python with the pandas sample() function.


When working with data in Python, we often want to get a random sample of our data. For example, in modeling, we might take a random sample to create fitting and validation datasets, which helps us check whether a model is overfitting.

With pandas, we can easily get random samples of data with the pandas sample() function.

You can use sample() to get a sample of a specific number of records, get a sample of a fraction of records, get a sample of the columns of a DataFrame, and sample with replacement.

Let’s say we have the following DataFrame in Python.

import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})

print(df)

# Output:
    Name Weight
0    Jim    100
1    Jim    100
2    Jim    200
3  Sally    100
4    Bob    200
5    Sue    150
6    Sue    150
7  Larry    200

If you want to generate a 50% sample of this dataset, you can pass “0.5” to the “frac” parameter. Because the rows are chosen at random, your output will likely differ from the output shown here.

print(df.sample(frac=0.5))

# Output:
    Name Weight
0    Jim    100
1    Jim    100
4    Bob    200
7  Larry    200
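
The fitting and validation split mentioned earlier can be built on top of a “frac” sample. The following is a minimal sketch, not the only way to do it: it assumes the DataFrame's index labels are unique, and the names fit_df and valid_df and the seed value 0 are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight': ['100','100','200','100','200','150','150','200']})

# Draw half of the rows for fitting (random_state=0 is an arbitrary seed)
fit_df = df.sample(frac=0.5, random_state=0)

# Rows that were not drawn become the validation set.
# This relies on the index labels being unique.
valid_df = df.drop(fit_df.index)

print(len(fit_df), len(valid_df))  # 4 4
```

Every row ends up in exactly one of the two DataFrames, so no record is used for both fitting and validation.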

If you instead want to extract 4 rows from the data at random, you can pass “4” to the “n” parameter.

print(df.sample(n=4))

# Output:
    Name Weight
0    Jim    100
1    Jim    100
5    Sue    150
6    Sue    150

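You can also return a sample which has more records than the original dataset, but only when sampling with replacement. If you want to create a 200% sample of your data, you can pass “2” to the “frac” parameter together with “replace=True”. Without “replace=True”, pandas raises a ValueError because it cannot draw more rows than the DataFrame contains.

print(df.sample(frac=2, replace=True))

# Output: 16 rows (twice the original 8), with some index labels repeated. The exact rows vary from run to run.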

Unlike many pandas functions, sample() does not have an “inplace” parameter. It always returns a new object, so assign the result to a variable if you want to keep it. You can also sample columns instead of rows by passing “1” to the “axis” parameter.
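
As a small illustration of column sampling, the sketch below picks one of the two columns at random with axis=1. The variable name col_sample is just a placeholder. Note that the original DataFrame is left untouched, because sample() returns a new object.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight': ['100','100','200','100','200','150','150','200']})

# axis=1 samples columns instead of rows; n=1 keeps a single column
col_sample = df.sample(n=1, axis=1)

print(col_sample.columns.tolist())  # either ['Name'] or ['Weight']
print(col_sample.shape)             # (8, 1): all rows, one column
print(df.shape)                     # (8, 2): the original is unchanged
```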

Using Seed for the Random Number Generation with sample()

When creating a random sample, many times we want reproducibility. For example, if I’m validating someone else’s results, then I want to be able to reproduce every dataset in their process.

The “random_state” parameter of the sample() function allows us to pass a “seed” for the random number generator of sample().

Below is an example of how you can use the “random_state” parameter in sample().

sampled_df = df.sample(frac=0.5, random_state=5)
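
To see the effect of the seed, here is a short sketch that draws the same sample twice with the same “random_state” and compares the results. The names first and second are placeholders chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight': ['100','100','200','100','200','150','150','200']})

# Two independent calls with the same seed
first = df.sample(frac=0.5, random_state=5)
second = df.sample(frac=0.5, random_state=5)

# Same seed, same input: the two samples contain identical rows in identical order
print(first.equals(second))  # True
```

Without “random_state”, two calls like these would usually return different rows.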

Random Sampling with Replacement in pandas

If you want to get a random sample with replacement, you can also do that with the pandas sample() function.

The “replace” parameter allows you to perform sampling with replacement.

Sampling with replacement means that after each element is chosen via the sampling algorithm, instead of removing that element, it is put back into the population, so the same row can be chosen more than once.

Below is an example of how you can set the “replace” parameter to True to get a random sample with replacement with the pandas sample() function.

sampled_df = df.sample(frac=0.5, replace=True)
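
One practical consequence of replacement is that you can draw more rows than the DataFrame holds. The sketch below assumes the same 8-row DataFrame as above. The sample size of 12, the seed value 1, and the name big are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight': ['100','100','200','100','200','150','150','200']})

# With replacement, 12 rows can be drawn from an 8-row DataFrame
big = df.sample(n=12, replace=True, random_state=1)
print(len(big))             # 12
print(big.index.is_unique)  # False: 12 draws from 8 rows must repeat some

# Without replacement, the same request fails
try:
    df.sample(n=12)
except ValueError as err:
    print("ValueError:", err)
```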

Hopefully this article has been useful for you to learn how to use the pandas sample() function to generate random samples of your data in Python.
