• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

The Programming Expert

Solving All of Your Programming Headaches

  • HTML
  • JavaScript
  • jQuery
  • PHP
  • Python
  • SAS
  • Ruby
  • About
You are here: Home / Python / PROC REG Equivalent in Python

PROC REG Equivalent in Python

December 11, 2020 Leave a Comment

When working with data as a data science or data analyst, regression analysis is very common and something that many industries and companies utilize to understand how different series of data are related.

There are many major companies and industries which use SAS (banking, insurance, etc.), but with the rise of open source and the popularity of languages such as Python and R, these companies are exploring converting their code to Python.

A commonly used procedure for regression analysis in SAS is the PROC REG procedure. In this article, you’ll learn the Python equivalent of PROC REG.

PROC REG Equivalent in Python

In SAS, when we do simple regression analysis on continuous variables, we use PROC REG. PROC REG performs Ordinary Least Squares (OLS).

Let’s say we have data such as the following:

In SAS, to do OLS on this data, for example, to look at the linear relationship between height and weight, we could simply do the following:

The output for this code looks like the following image:

We see here that the linear relationship between height and weight is significant (p_value of 0.0007).

To do this in Python, we can use the statsmodels package. Creating the model and fitting the model is very easy to do. After fitting the model, we print the results to verify we got the same coefficients and p_value as SAS.

import pandas as pd
import numpy as np
from statsmodels.formula.api import ols

model = 'height ~ weight'
results = ols(model,data=data).fit()
results.summary()

#output:
#                            OLS Regression Results
#==============================================================================
#Dep. Variable:                 height   R-squared:                       0.600
#Model:                            OLS   Adj. R-squared:                  0.569
#Method:                 Least Squares   F-statistic:                     19.46
#Date:                Sat, 09 Jan 2021   Prob (F-statistic):           0.000703
#Time:                        09:39:28   Log-Likelihood:                -45.073
#No. Observations:                  15   AIC:                             94.15
#Df Residuals:                      13   BIC:                             95.56
#Df Model:                           1
#Covariance Type:            nonrobust
#==============================================================================
#                 coef    std err          t      P>|t|      [0.025      0.975]
#------------------------------------------------------------------------------
#Intercept     10.2043      4.112      2.481      0.028       1.320      19.088
#weight         0.3520      0.080      4.412      0.001       0.180       0.524
#==============================================================================
#Omnibus:                        1.249   Durbin-Watson:                   2.506
#Prob(Omnibus):                  0.535   Jarque-Bera (JB):                0.334
#Skew:                           0.357   Prob(JB):                        0.846
#Kurtosis:                       3.150   Cond. No.                         157.
#==============================================================================
#
#Notes:
#[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Above we see that we obtained the same coefficient and p_value as SAS.

PROC REG Testing Residuals for Normality Equivalent in Python

When doing OLS and regression analysis, one of the main assumptions we need to test for is normality of the residuals.

To do this in SAS, we would do the following with proc univariate:

After running this code,we receive these results:

To do this in Python, we can use the scipy package to get the probability plot, and matplotlib to plot it. In SAS, we specified we wanted studentized residuals. To get these in Python, we need to get to a few more steps.

from scipy import stats
import matplotlib.pyplot as plt

influence = results.get_influence()
studentized_residuals = influence.resid_studentized_external
res = stats.probplot(studentized_residuals, plot=plt)
plt.show()

You can see that the chart is identical to the one produced in SAS.

To get the p_values for the different normality tests, we can use the Anderson and Shapiro functions from the stats package.

result = stats.anderson(studentized_residuals)
print(result)

#output:
#AndersonResult(statistic=0.5182987927026232, critical_values=array([0.498, 0.568, 0.681, 0.794, 0.945]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))

stat, p = stats.shapiro(studentized_residuals)
print(stat)
print(p)

#output:
#0.9361917972564697
#0.336889386177063

We see we receive the same statistics from these tests as we received from SAS.

The full code for this example in Python is below:

import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
from scipy import stats
import matplotlib.pyplot as plt

model = 'height ~ weight'
results = ols(model,data=data).fit()
results.summary()

influence = results.get_influence()
studentized_residuals = influence.resid_studentized_external
res = stats.probplot(studentized_residuals, plot=plt)
plt.show()

result = stats.anderson(studentized_residuals)
stat, p = stats.shapiro(studentized_residuals)

I hope that this example has helped you with translating your SAS PROC REG code into Python

Other Articles You'll Also Like:

  • 1.  Using pandas resample() to Resample Time Series Data
  • 2.  Format Numbers as Dollars in Python with format()
  • 3.  Get Tomorrow’s Date as Datetime in Python
  • 4.  Using Python to Read File Character by Character
  • 5.  Count Primes Python – How to Count Number of Primes in a List
  • 6.  Check if Line is Empty in Python
  • 7.  Python Round to Nearest 10 With round() Function
  • 8.  Touch File Python – How to Touch a File Using Python
  • 9.  How to Rotate a List in Python
  • 10.  Remove Every Nth Element from List in Python

About The Programming Expert

The Programming Expert is a compilation of a programmer’s findings in the world of software development, website creation, and automation of processes.

Programming allows us to create amazing applications which make our work more efficient, repeatable and accurate.

At the end of the day, we want to be able to just push a button and let the code do it’s magic.

You can read more about us on our about page.

Reader Interactions

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Primary Sidebar

About The Programming Expert

the programming expert main image

Welcome to The Programming Expert. We are a group of US-based programming professionals who have helped companies build, maintain, and improve everything from simple websites to large-scale projects.

We built The Programming Expert to help you solve your programming problems with useful coding methods and functions in various programming languages.

Search

Learn Coding from Experts on Udemy

Looking to boost your skills and learn how to become a programming expert?

Check out the links below to view Udemy courses for learning to program in the following languages:

Copyright © 2023 · The Programming Expert · About · Privacy Policy