When I first learned to make pair plots, I used them in every project for a while, and I still use them a lot. A pair plot shows several bivariate distributions in one figure and can be made with a single function call. Even the most basic one is very useful in a data analytics project with several continuous variables. A scatter plot is widely used to show the relationship between two continuous variables; a pair plot puts several scatter plots in one figure and shows the distribution of each variable on the diagonal. This article is a tutorial on how to make pair plots in different styles.

This article will cover:

- Pair plots using Pandas and Matplotlib
- More stylish and informative pair plots using the Seaborn library
- Using ‘PairGrid’ to make more flexible pair plots

## Dataset

I used the well-known ‘nhanes’ dataset. I find it useful because it has many continuous and categorical features. Please feel free to download the dataset from this link to follow along.

## Pair plot

Let’s import the dataset first:

```
import pandas as pd
df = pd.read_csv("nhanes_2015_2016.csv")
```

The dataset is too big to show in a screenshot, so here are its columns:

`df.columns`

Output:

`Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR', 'RIDAGEYR', 'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'WTINT2YR', 'SDMVPSU', 'SDMVSTRA', 'INDFMPIR', 'BPXSY1', 'BPXDI1', 'BPXSY2', 'BPXDI2', 'BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC', 'BMXWAIST', 'HIQ210'], dtype='object')`

We will start with the most basic pair plot, using the Pandas library, where it is called ‘scatter_matrix’.

Before that, I should mention that I will use only part of the dataset. With too many features, the scatter_matrix (or pair plot, whichever name you prefer) is not very helpful, because each plot in it becomes too small. I chose five continuous variables for this demonstration: ‘BMXWT’, ‘BMXHT’, ‘BMXBMI’, ‘BMXLEG’, ‘BMXARML’. They represent the weight, height, BMI, leg length, and arm length of the population.

We need to import scatter_matrix from pandas.plotting and then simply call it on the selected columns.
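The column selection described above can be sketched on its own. This is only an illustration on synthetic data, so that it runs without the CSV file; the `demo` frame, the missing values sprinkled into it, and the readable label names are assumptions made for the example, not part of the NHANES file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the NHANES file (assumption: real values will differ).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "SEQN": np.arange(200),              # an ID column we do not want to plot
    "BMXWT": rng.normal(80, 15, 200),
    "BMXHT": rng.normal(168, 10, 200),
    "BMXBMI": rng.normal(28, 5, 200),
})
demo.loc[::25, "BMXWT"] = np.nan         # sprinkle in a few missing values

# Keep only the continuous columns that should appear in the pair plot.
cols = ["BMXWT", "BMXHT", "BMXBMI"]
subset = demo[cols]

# Count the missing values per column before plotting.
missing = subset.isna().sum()
print(missing)

# Optional: readable (hypothetical) labels make the axis titles easier to read.
labels = {"BMXWT": "weight", "BMXHT": "height", "BMXBMI": "bmi"}
readable = subset.rename(columns=labels)
print(readable.columns.tolist())
```

Checking the missing counts first is useful because the plotting functions silently skip missing points, so a panel may show fewer observations than you expect.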

```
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

scatter_matrix(df[['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML']], figsize = (10, 10))
plt.show()
```

Output:

You get bivariate scatter plots for every combination of the variables you passed in: each variable is plotted against all the others. The diagonal plots show the distribution of each variable given to the scatter_matrix function.
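A few optional arguments of scatter_matrix are worth knowing. The following is a small sketch on synthetic data (the `demo` frame and its column names are made up for illustration) that sets the transparency and the number of histogram bins, and shows that the function returns a grid of Matplotlib axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: leg length loosely depends on height, weight is independent.
rng = np.random.default_rng(1)
height = rng.normal(168, 10, 300)
demo = pd.DataFrame({
    "height": height,
    "leg": 0.23 * height + rng.normal(0, 2, 300),
    "weight": rng.normal(80, 15, 300),
})

axes = scatter_matrix(
    demo,
    figsize=(6, 6),
    alpha=0.4,                 # transparency helps with overlapping points
    diagonal="hist",           # histograms on the diagonal
    hist_kwds={"bins": 20},    # passed on to the histogram call
)

# One Axes object per pair of variables, arranged as a square array.
print(axes.shape)  # (3, 3)
```

Because the return value is an array of axes, individual panels can be adjusted afterwards, for example `axes[0, 1].set_title(...)`.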

The same plot can be made with the seaborn library. Seaborn comes with its own default style, so even the most basic pairplot looks a bit more polished than the scatter_matrix.

For a little variation, I will use density plots on the diagonal by specifying diag_kind = 'kde'. If you want to keep the histograms, just leave this parameter out, because without a hue variable the diagonal defaults to a histogram.

```
import seaborn as sns
d = df[['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML']]
sns.pairplot(d, diag_kind = 'kde')
```

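Adding a ‘hue’ variable splits every panel by group. Here the points and the density curves are colored by gender (‘RIAGENDR’):

```
d = df[['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'RIAGENDR']]
sns.pairplot(d, diag_kind = 'kde', hue = 'RIAGENDR', plot_kws={'alpha':0.5, 'edgecolor': 'k'})
```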

Output:

Pairplot and scatter_matrix are both based on scatter plots. PairGrid brings a bit more flexibility.

## PairGrid

Using PairGrid, you can generate an empty grid and later fill it up as you like. Let’s see this in action:

`g = sns.PairGrid(df, vars = ['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML'], hue = 'RIAGENDR')`

This line of code will provide an empty grid as follows:

Now, we will fill up those empty boxes. I will use histograms for the diagonal plots, and the rest will stay scatter plots as before. They do not have to be scatter plots; any other bivariate plot works. We will see an example in a bit:

```
g.map_diag(plt.hist, alpha = 0.6)
g.map_offdiag(plt.scatter, alpha = 0.5)
g.add_legend()
```

We split the plots by gender using the ‘hue’ parameter, so the distributions and scatter plots are shown separately for males and females. The next plots use a continuous variable as the hue: ‘BPXSY1’, the systolic blood pressure. I will also add a condition and use only the rows where the age (‘RIDAGEYR’) is over 60.
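The age condition is an ordinary Pandas row filter. Here is a minimal sketch of it on synthetic data (the `demo` frame and its values are assumptions for the example); it also shows one optional way to drop rows whose hue value is missing, since those points cannot be colored:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "RIDAGEYR": rng.integers(18, 81, 500),   # ages from 18 to 80
    "BPXSY1": rng.normal(125, 18, 500),      # systolic blood pressure
})
demo.loc[::10, "BPXSY1"] = np.nan            # some readings are missing

# Boolean mask: keep only the rows where age is over 60.
over_60 = demo[demo["RIDAGEYR"] > 60]

# The same filter written with query().
same_rows = demo.query("RIDAGEYR > 60")

# Optional: drop rows that have no value for the hue variable.
clean = over_60.dropna(subset=["BPXSY1"])
print(len(demo), len(over_60), len(clean))
```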

```
g = sns.PairGrid(df[df["RIDAGEYR"]>60], vars = ['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML'], hue = "BPXSY1")
g.map_diag(sns.histplot, hue = None)
g.map_offdiag(sns.scatterplot)
g.add_legend()
```

The dots in the scatter plots now have different shades: the lighter the color, the lower the systolic blood pressure, and the darker the color, the higher the systolic blood pressure.

Look! The lower triangle and the upper triangle have almost the same plots. If you swap the axes of a plot in the lower triangle, you get the corresponding plot in the upper triangle. So, if you want different types of plots in the lower and the upper triangle, PairGrid provides that flexibility. We just have to use map_upper for the upper triangle and map_lower for the lower triangle.

```
g = sns.PairGrid(df[df["RIDAGEYR"]>60], vars = ['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML'], hue = "RIAGENDR")
g.map_lower(plt.scatter, alpha = 0.6)
g.map_diag(plt.hist, alpha = 0.7)
```

This will plot only the diagonals and the lower triangle.

Let’s fill up the upper triangle as well. I will use shaded density plots.

`g.map_upper(sns.kdeplot, fill = True)`

I think it is more useful to have two different types of plots than almost the same plots in both triangles.

Again, if you do not want a second plot type, you can leave out either the upper or the lower triangle entirely. Here I am using corner = True to drop the upper triangle; diag_sharey = False lets each diagonal plot use its own y-axis scale.

```
g = sns.PairGrid(df[df["RIDAGEYR"]>60], diag_sharey = False, corner = True, vars = ['BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML'], hue = "RIAGENDR")
g.map_lower(plt.scatter, alpha = 0.6)
g.map_diag(plt.hist, alpha = 0.7)
```

Output:

Here it is! The upper triangle is totally gone!

## Conclusion

Here I introduced a few different ways to make and use pair plots, along with PairGrid for more flexibility. Hopefully this was helpful, and you will check the documentation for more styles and options.
