Visually Analyzing Multivariate Relationships using Python

Introduction

What are multivariate relationships?

The relationships that exist between two or more variables are known as multivariate relationships.

The visual representations of multivariate relationships discussed in this article are outlined below:

Article Flow

We will now go through each of these visualizations in detail.

The dataset used for the examples in this article is publicly available and can be downloaded from this link.

Ground Work

  1. Before we start plotting, let us first load and view the data
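
A minimal sketch of this step, assuming the dataset has been saved locally as a CSV file. The helper name `preview` and the path `data.csv` are placeholders for illustration, not names from the original code.

```python
import pandas as pd

def preview(csv_source, n=5):
    """Load a CSV (path or file-like object) and return the frame plus its first n rows."""
    df = pd.read_csv(csv_source)
    return df, df.head(n)

# "data.csv" is a hypothetical path; point it at wherever the dataset was saved.
# df, first_rows = preview("data.csv")
# print(first_rows)
```

Looking at the first few rows is a quick way to confirm the column names and types before plotting anything.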

  2. Now check how many records our data has using the “shape” attribute
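
As a rough illustration, `shape` is an attribute of a pandas DataFrame that holds a (rows, columns) tuple. The tiny frame below is made up so the snippet runs on its own; with the real data, `df` would come from `pd.read_csv`.

```python
import pandas as pd

# Made-up stand-in for the real dataset (column names are assumptions)
df = pd.DataFrame({
    "Age": [25, 40, 58],
    "EstimatedSalary": [100.0, 200.0, 300.0],
})

# shape is an attribute, so there are no parentheses after it
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```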

Shape Output

We can conclude from the above output that the data consists of 550,068 rows and 12 columns.

Numerical Features

  1. Let’s plot salary against age to identify whether any relationship exists between the two
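
One way to draw such a scatter plot with matplotlib is sketched below. The column names `Age` and `EstimatedSalary` are assumptions about the dataset, and the four rows are made up so the example runs without the real file.

```python
import matplotlib.pyplot as plt
import pandas as pd

def scatter_age_salary(df, x="Age", y="EstimatedSalary"):
    """Scatter plot of y against x; returns the Axes for further styling."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # Small, semi-transparent markers reduce overplotting on large datasets
    ax.scatter(df[x], df[y], alpha=0.4, s=12)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs {x}")
    return ax

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Age": [22, 35, 41, 58],
    "EstimatedSalary": [100, 300, 200, 400],
})
ax = scatter_age_salary(toy)
# plt.show()  # uncomment in a script; notebooks render the figure automatically
```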

Scatter Age Salary

  2. The above scatter plot is slightly tough to interpret, so let’s divide the age feature into bins and find the mean salary in each age bracket
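
The binning step could look roughly like this, using `pd.cut` to build age brackets and `groupby` to average salary within each one. The bin edges, column names, and sample rows are all assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

def mean_by_age_bracket(df, bins, value="EstimatedSalary", age="Age"):
    """Mean of `value` within each age bracket defined by the bin edges."""
    # pd.cut builds right-closed intervals by default, e.g. (30, 40]
    brackets = pd.cut(df[age], bins=bins)
    # observed=False keeps brackets that happen to contain no rows
    return df.groupby(brackets, observed=False)[value].mean()

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Age": [22, 28, 35, 41, 58, 63],
    "EstimatedSalary": [100, 200, 300, 400, 500, 600],
})
means = mean_by_age_bracket(toy, bins=[20, 30, 40, 50, 60, 70])

fig, ax = plt.subplots(figsize=(8, 5))
means.plot(kind="bar", ax=ax, rot=0)
ax.set_xlabel("Age bracket")
ax.set_ylabel("Mean EstimatedSalary")
```

Averaging within brackets trades detail for readability: the trend across age groups becomes visible, but the spread inside each bracket is hidden.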

 

Age Salary Mean

We can conclude from the above plot that the mean salary of older customers is slightly higher than that of younger customers.

Numerical and Categorical Features

There are numerous ways to plot numerical-categorical relationships. Let’s go through them one by one.

  1. Let us try to understand how people are distributed by age across the various countries using a multi-line graph
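
A possible sketch of the multi-line graph: count people per age within each country, then draw one line per country. The column names `Age` and `Geography` are assumptions, and the rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Age": [30, 30, 31, 31, 31, 32],
    "Geography": ["France", "Germany", "France", "France", "Spain", "Germany"],
})

# Rows: age, columns: country, values: number of people (0 where nobody matches)
counts = toy.groupby(["Age", "Geography"]).size().unstack(fill_value=0)

fig, ax = plt.subplots(figsize=(9, 5))
counts.plot(kind="line", ax=ax)  # one line per column, i.e. per country
ax.set_xlabel("Age")
ax.set_ylabel("Number of people")
```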

Age geography distribution

Findings from the above plot:

    • Most people in the dataset are from France
    • Germany and Spain have almost equal numbers of people
    • Most people are aged between 30 and 50 years
    • France has a large concentration of people aged between 30 and 50 years
  2. Next, let’s plot the average balance across gender and geography to understand which segment holds the highest balance on average
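
The grouped means can be computed with a two-level `groupby` and drawn as a grouped bar chart, roughly as follows. The column names `Geography`, `Gender`, and `Balance` are assumptions, and the balances are made-up round numbers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Geography": ["France", "France", "France", "Germany",
                  "Germany", "Germany", "Spain", "Spain"],
    "Gender": ["Female", "Female", "Male", "Female",
               "Male", "Male", "Female", "Male"],
    "Balance": [10, 30, 40, 50, 70, 90, 20, 60],
})

# Rows: country, columns: gender, values: mean balance of that segment
mean_balance = toy.groupby(["Geography", "Gender"])["Balance"].mean().unstack()

fig, ax = plt.subplots(figsize=(8, 5))
mean_balance.plot(kind="bar", ax=ax, rot=0)  # one bar per gender within each country
ax.set_ylabel("Mean balance")
```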

Mean Balance across gender and geography

Findings from the above plot:

    • On average, customers in Germany hold a higher balance than customers in France and Spain
    • Males have a higher mean balance in their accounts than females
    • German males have the highest mean balance of all segments
  3. Let’s now see what the male-female distribution looks like across Tenure by using a stacked bar chart
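
A stacked bar chart of this kind can be sketched with `pd.crosstab`, which counts rows for every Tenure-Gender combination. The column names are assumptions and the rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Tenure": [0, 1, 1, 2, 2, 2],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
})

# Rows: tenure in years, columns: gender, values: number of people
table = pd.crosstab(toy["Tenure"], toy["Gender"])

fig, ax = plt.subplots(figsize=(8, 5))
table.plot(kind="bar", stacked=True, ax=ax, rot=0)  # genders stacked within each tenure bar
ax.set_ylabel("Number of people")
```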

Male Female Frequency

Findings from the above plot:

    • Most people have a tenure of 1 to 9 years
    • Males and females are almost equally distributed across all tenures
  4. Let’s now find out which geography-gender segment has the most credit card users using a multi-grid grouped bar chart
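
One way to build a multi-grid grouped bar chart with plain matplotlib is to draw one panel per country and group the bars by gender inside each panel. This is only a sketch: the helper name `card_users_grid` is hypothetical, `HasCrCard` is assumed to be a 0/1 flag, and the rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def card_users_grid(df):
    """One panel per country; bars grouped by gender, split by card ownership."""
    geos = sorted(df["Geography"].unique())
    fig, axes = plt.subplots(1, len(geos), figsize=(4 * len(geos), 4),
                             sharey=True, squeeze=False)
    for ax, geo in zip(axes[0], geos):
        subset = df[df["Geography"] == geo]
        # Rows: gender, columns: HasCrCard value, values: number of people
        table = pd.crosstab(subset["Gender"], subset["HasCrCard"])
        table.plot(kind="bar", ax=ax, rot=0)
        ax.set_title(geo)
        ax.set_ylabel("Number of people")
    return fig, axes[0]

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany", "Spain", "Spain"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "HasCrCard": [1, 1, 0, 1, 1, 0],
})
fig, panels = card_users_grid(toy)
```

Sharing the y-axis across panels keeps the bar heights directly comparable between countries.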

 

Credit Card users across gender-geography

  5. Finally, let’s try to create a combo chart that shows the count of people and the mean estimated salary across tenure in the same plot
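
A combo chart like this can be sketched by drawing bars for the counts and then adding a second y-axis with `twinx()` for the mean salary line. The column names `Tenure` and `EstimatedSalary` are assumptions, and the rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, only so the sketch is runnable
toy = pd.DataFrame({
    "Tenure": [0, 1, 1, 2, 2, 2],
    "EstimatedSalary": [100, 200, 400, 300, 300, 600],
})

# Per tenure: number of non-null salaries and their mean
summary = toy.groupby("Tenure")["EstimatedSalary"].agg(["count", "mean"])

fig, ax_count = plt.subplots(figsize=(9, 5))
ax_count.bar(summary.index, summary["count"], color="lightsteelblue")
ax_count.set_xlabel("Tenure (years)")
ax_count.set_ylabel("Number of people")

# Second y-axis sharing the same x-axis, since salary is on a different scale
ax_salary = ax_count.twinx()
ax_salary.plot(summary.index, summary["mean"], color="darkred", marker="o")
ax_salary.set_ylabel("Mean estimated salary")
```

The two measures have very different magnitudes, which is why each gets its own y-axis rather than sharing one scale.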

Combo Chart

Findings from the above plot:

    • Mean estimated salary is lowest for people with a Tenure of 3 years
    • Mean estimated salary is highest for people with a Tenure of 10 years
    • The fewest people currently have a Tenure of 0 years

Conclusion

We have now understood and plotted basic multivariate relationships, and we can use this knowledge to explore how variables relate to one another and understand the data better.

Complete Code with Github Link

Github Link
