To find the standard deviation of a series or a column in a DataFrame in pandas, the easiest way is to use the pandas **std()** function.

`df["Column1"].std()`

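You can also use the numpy **std()** function, but be careful: its default differs from the pandas default. By default, numpy computes the population standard deviation (`ddof=0`), while pandas computes the sample standard deviation (`ddof=1`).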

```
np.std(df["Column1"]) #Different result from default pandas function
np.std(df["Column1"],ddof=1) #Same result as default pandas function
```

When doing data analysis, the ability to compute different summary statistics, such as the mean or median of a variable, is very useful to help us understand the data. One such summary statistic which can be useful is the standard deviation of a variable.

Finding the standard deviation of columns or a Series using pandas is easy. We can use the pandas **std()** function to find the standard deviation of a column of numbers.

Let’s say we have the following DataFrame.

```
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42]})
print(df)
# Output:
#     Name  Weight  Height
# 0    Jim  160.20   50.10
# 1  Sally  160.20   68.94
# 2    Bob  209.45   71.42
# 3    Sue  150.35   48.56
# 4   Jill  187.52   59.37
# 5  Larry  187.52   63.42
```

To get the standard deviation of the column “Height”, we can use the pandas **std()** function in the following Python code:

```
print(df["Height"].std())
# Output:
9.49495532726019
```
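The same method also works on several columns at once. The snippet below is a minimal sketch using the example data from above; the variable name `column_stds` is only illustrative. Selecting the numeric columns first keeps the text column "Name" out of the calculation.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Sally", "Bob", "Sue", "Jill", "Larry"],
    "Weight": [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
    "Height": [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
})

# Select only the numeric columns, then call std() on the resulting DataFrame.
# The result is a Series with one standard deviation per column.
column_stds = df[["Weight", "Height"]].std()
print(column_stds)
```

Each value in the result can be read by column name, for example `column_stds["Height"]`.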

## Calculating the Standard Deviation of a Series with numpy

We can also find the standard deviation of a Series using the numpy **std()** function. In some situations, for example when the rest of the code already works with numpy arrays, the numpy **std()** function might be faster, although this depends on the code and is worth measuring.

Let’s say we have the same dataset as above.

To get the standard deviation of the column “Height”, we can use the numpy **std()** function in the following Python code.

```
import numpy as np

print(np.std(df["Height"]))
# Output:
8.667668692073754
```

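As you can verify for yourself, this is a different result from the pandas **std()** function. The reason is that the two libraries use different default divisors. The pandas **std()** function divides the sum of squared deviations by N - 1 (the sample standard deviation, `ddof=1`), while the numpy **std()** function divides by N (the population standard deviation, `ddof=0`).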
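To see where the two numbers come from, the calculation can be done by hand. This is a rough sketch in plain Python that uses the "Height" values from the example; the variable names are illustrative, and it is not how either library is implemented internally. The only difference between the two results is whether the sum of squared deviations is divided by N or by N - 1.

```python
import math

heights = [50.10, 68.94, 71.42, 48.56, 59.37, 63.42]

n = len(heights)
mean = sum(heights) / n

# Sum of squared deviations from the mean, shared by both formulas
squared_deviations = sum((h - mean) ** 2 for h in heights)

# Divide by N: population standard deviation (numpy default, ddof=0)
population_std = math.sqrt(squared_deviations / n)

# Divide by N - 1: sample standard deviation (pandas default, ddof=1)
sample_std = math.sqrt(squared_deviations / (n - 1))

print(population_std)
print(sample_std)
```

Up to floating point rounding, `population_std` should match the numpy result and `sample_std` should match the pandas result.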

To get the same standard deviation using both numpy and pandas, you need to pass `ddof=1` to the numpy **std()** function.

```
print(np.std(df["Height"]))
print(np.std(df["Height"],ddof=1))
print(df["Height"].std())
# Output:
8.667668692073754
9.49495532726019
9.49495532726019
```

As the output above shows, numpy and pandas return the same result when we pass `ddof=1` to the numpy **std()** function.
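The adjustment also works in the other direction. The pandas **std()** function accepts a `ddof` parameter too, so passing `ddof=0` should reproduce the numpy default. The sketch below assumes the same "Height" values as above, stored in a standalone Series for brevity.

```python
import numpy as np
import pandas as pd

heights = pd.Series([50.10, 68.94, 71.42, 48.56, 59.37, 63.42], name="Height")

# pandas with ddof=0 divides by N, like the numpy default
pandas_population_std = heights.std(ddof=0)

# numpy default (ddof=0) on the underlying array
numpy_population_std = np.std(heights.to_numpy())

# Compare with a tolerance rather than ==, since these are floats
print(np.isclose(pandas_population_std, numpy_population_std))
```

Which default to use depends on whether the data is the whole population (`ddof=0`) or a sample drawn from a larger one (`ddof=1`).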

Hopefully this article has been helpful for you to understand how to find the standard deviation of a variable within a column or Series using pandas.
