# Data Visualization in Python: Deciding the right graph to represent the data

## Introduction

Although choosing the best visualization for the data is a skill acquired over time, there are certain guidelines to keep in mind to ensure that the data is represented correctly.

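We will discuss in detail when to use, and how to create, each of the following graphs: density plots, box plots, violin plots, pie charts, scatter plots, bar charts (simple, stacked and grouped) and line charts.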

The dataset used to illustrate the examples in this article is publicly available and can be downloaded from this link.

## Ground Work
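The three groundwork steps below map onto standard pandas calls. A minimal sketch follows; since the dataset link is not reproduced here, it builds a small hypothetical DataFrame whose column names (`age`, `height`, `weight`, `Country`) are assumptions based on the variables discussed later. With the real file you would load it with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical sample standing in for the article's dataset;
# with the real file, use pd.read_csv(...) instead. Column names are assumptions.
df = pd.DataFrame({
    "age": [22, 24, 25, 27, 29, 31, 35, None],
    "height": [190.5, 198.1, 185.4, 203.2, 195.6, 208.3, 182.9, 200.7],
    "weight": [88.5, 99.8, 83.9, 108.9, 95.3, 117.9, 81.6, 104.3],
    "Country": ["USA", "USA", "France", "USA", "Canada", "USA", "USA", "Spain"],
})

# Step 1: view the first few rows
print(df.head())

# Step 2: number of records (rows) and columns
n_rows, n_cols = df.shape
print(f"{n_rows} records, {n_cols} columns")

# Step 3: dtypes and non-null counts per column, plus an explicit null count
df.info()
null_counts = df.isnull().sum()
print(null_counts)
```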

1. Before we jump into the visualizations, let us first view the data.
1. Next, check how many records the data has using the `shape` attribute.
1. Finally, get some more general information about the DataFrame using the `info()` method; it reports the non-null count of each column, which shows how many null values the dataset contains.

## Single Variable

### Single Numeric Variable

For the first set of visualizations, let us consider the variable age. Age is a single numeric feature, and for such features we can create density plots to better understand how the data is distributed over the feature's range.
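As a sketch of a density plot, the block below estimates a Gaussian kernel density for a hypothetical `ages` series and draws it with matplotlib. To avoid a SciPy dependency, the estimate is computed by hand with Scott's bandwidth rule; with SciPy installed, pandas' `df["age"].plot.kde()` draws a comparable curve in one line. The `Agg` backend and the output filename are assumptions for running outside a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages standing in for df["age"]
ages = pd.Series([21, 22, 23, 23, 24, 25, 25, 26, 27, 28, 29, 30, 31, 33, 36, 39])

def gaussian_kde(values, grid):
    """Gaussian kernel density estimate evaluated on `grid`,
    using Scott's rule (sample std * n ** (-1/5)) for the bandwidth."""
    x = np.asarray(values, dtype=float)
    n = x.size
    bandwidth = x.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump centred on each observation, averaged over all of them
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

grid = np.linspace(ages.min() - 5, ages.max() + 5, 200)
density = gaussian_kde(ages, grid)

fig, ax = plt.subplots()
ax.plot(grid, density)
ax.fill_between(grid, density, alpha=0.3)
ax.set_xlabel("age")
ax.set_ylabel("density")
ax.set_title("Density plot of age")
fig.savefig("age_density.png")
```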

We can also create box plots and violin plots to flag outliers and get a sense of the median and the percentiles (quartiles) of the variable.
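A matplotlib sketch of both plots for a hypothetical `ages` series follows. The box plot marks the median and the quartiles, with whiskers reaching the furthest points within 1.5 * IQR (matplotlib's default `whis`); the violin adds a density estimate around the same data. The last lines apply the same 1.5 * IQR rule by hand to list the points the box plot flags. The sample values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages standing in for df["age"]
ages = pd.Series([21, 22, 23, 23, 24, 25, 25, 26, 27, 28, 29, 30, 31, 33, 36, 48])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# Box plot: box spans Q1..Q3, the line inside is the median,
# whiskers stop at 1.5 * IQR and points beyond are drawn as outliers
ax_box.boxplot(ages.to_numpy(), whis=1.5)
ax_box.set_title("Box plot of age")
ax_box.set_ylabel("age")

# Violin plot: a mirrored density estimate with the median marked
ax_violin.violinplot(ages.to_numpy(), showmedians=True)
ax_violin.set_title("Violin plot of age")
fig.savefig("age_box_violin.png")

# List the outliers using the same 1.5 * IQR rule
q1, q3 = ages.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = ages[(ages < q1 - 1.5 * iqr) | (ages > q3 + 1.5 * iqr)]
print(outliers.tolist())
```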

From the plots above, we can conclude that the age data is right-skewed and that most of the players are between 23 and 30 years old.
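Conclusions read off a plot can also be checked numerically. A small sketch, again on a hypothetical `ages` series rather than the article's data: a positive `Series.skew()` indicates a longer right tail, and `Series.between` gives the share of values inside a range (both bounds inclusive by default).

```python
import pandas as pd

# Hypothetical ages standing in for df["age"]
ages = pd.Series([21, 22, 23, 23, 24, 25, 25, 26, 27, 28, 29, 30, 31, 33, 36, 39])

# Positive sample skewness means the distribution is right-skewed
skewness = ages.skew()

# Fraction of values between 23 and 30, inclusive
share_23_30 = ages.between(23, 30).mean()

print(f"skewness: {skewness:.2f}")
print(f"share aged 23-30: {share_23_30:.0%}")
```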

### Single Categorical Variable

When we need to understand the distribution of a categorical feature in a dataset, we can simply create a pie chart for that variable. Following is an example of a pie chart created across the Country column.
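A sketch of such a pie chart with pandas and matplotlib, using a hypothetical `Country` column: `value_counts().head(5)` keeps the five most frequent countries, and `autopct` prints each slice's percentage. Note that these percentages are shares of the top-5 total, not of the whole dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical Country column standing in for df["Country"]
country = pd.Series(
    ["USA"] * 20 + ["Canada"] * 5 + ["France"] * 4
    + ["Spain"] * 3 + ["Australia"] * 2 + ["Germany"]
)

# Keep only the five most frequent countries
top5 = country.value_counts().head(5)

fig, ax = plt.subplots()
ax.pie(top5.to_numpy(), labels=top5.index.tolist(), autopct="%1.1f%%", startangle=90)
ax.axis("equal")  # draw the pie as a circle
ax.set_title("Players by country (top 5)")
fig.savefig("country_pie.png")

# Percentage share of each country within the top 5
shares = top5 / top5.sum() * 100
print(shares.round(1))
```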

We can see from the chart above that, among the top 5 countries, almost 95.4% of the players belong to the USA.

## Multiple Variables

### Numeric-Numeric

To figure out the relationship between two numeric variables, the first thing we can do is draw a scatter plot of one against the other.

Below is an example of a scatter plot of the height and weight columns.
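A minimal scatter-plot sketch with matplotlib follows. The `height` and `weight` values are invented, and units are left unspecified because the article does not state them; the Pearson correlation from `Series.corr` is added as a numeric summary of the trend the scatter shows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical height and weight values standing in for the article's columns
df = pd.DataFrame({
    "height": [182.9, 185.4, 190.5, 195.6, 198.1, 200.7, 203.2, 208.3],
    "weight": [81.6, 83.9, 88.5, 95.3, 99.8, 104.3, 108.9, 117.9],
})

fig, ax = plt.subplots()
ax.scatter(df["height"], df["weight"], alpha=0.7)
ax.set_xlabel("height")
ax.set_ylabel("weight")
ax.set_title("Height vs weight")
fig.savefig("height_weight_scatter.png")

# Pearson correlation: close to +1 means a strong positive linear relationship
r = df["height"].corr(df["weight"])
print(f"Pearson r = {r:.2f}")
```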

### Numeric-Categorical

#### Simple Bar Chart

When we want to analyze one numeric and one categorical variable, the choice of chart depends on the information we want to convey.

If we want to compare values across categories, we can use a simple bar chart, as illustrated below.
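A sketch of a simple bar chart: the numeric column is aggregated per category with `groupby`, and each category gets one bar. The rows are hypothetical, and plotting the mean age per country is an illustrative choice; any aggregate (count, sum, median) works the same way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; "Country" and "age" mirror the article's columns
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "France", "France", "Spain"],
    "age": [24, 28, 32, 23, 27, 25, 29, 31],
})

# Aggregate the numeric column per category, largest first
mean_age = df.groupby("Country")["age"].mean().sort_values(ascending=False)

fig, ax = plt.subplots()
ax.bar(mean_age.index.tolist(), mean_age.to_numpy())
ax.set_xlabel("Country")
ax.set_ylabel("mean age")
ax.set_title("Mean age by country")
fig.savefig("mean_age_bar.png")
```

Sorting the bars by value makes the comparison easier to read than alphabetical order.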

#### Stacked Bar Chart

If we want to show both the comparison and the composition of categories, we can use a stacked bar chart, as illustrated below:
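One way to build a stacked bar chart with pandas, sketched on hypothetical rows: `pd.crosstab` counts players per country and per age band, and `DataFrame.plot(kind="bar", stacked=True)` stacks the bands within each country's bar. The age bands created with `pd.cut` are a hypothetical second category introduced only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import pandas as pd

# Hypothetical rows standing in for the article's dataset
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "USA", "Canada", "Canada", "France", "France", "Spain"],
    "age": [22, 26, 28, 33, 24, 31, 25, 29, 30],
})

# Hypothetical age bands: (0, 24], (24, 29], (29, 100]
df["age_group"] = pd.cut(df["age"], bins=[0, 24, 29, 100], labels=["<25", "25-29", "30+"])

# Rows become bars (countries); columns become stacked segments (age bands)
counts = pd.crosstab(df["Country"], df["age_group"])

ax = counts.plot(kind="bar", stacked=True, rot=0)
ax.set_ylabel("number of players")
ax.set_title("Players per country, stacked by age group")
ax.figure.savefig("country_age_stacked.png")
```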

#### Grouped Bar Chart

Finally, if we want to compare values across several categorical segments, the best chart to portray this information is a grouped bar chart, as illustrated below:
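For a grouped bar chart, a country-by-age-band table can be plotted without `stacked=True`, in which case pandas places each country's segments side by side. As in the other sketches, the rows and the `pd.cut` age bands are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import pandas as pd

# Hypothetical rows standing in for the article's dataset
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "USA", "Canada", "Canada", "France", "France", "Spain"],
    "age": [22, 26, 28, 33, 24, 31, 25, 29, 30],
})
df["age_group"] = pd.cut(df["age"], bins=[0, 24, 29, 100], labels=["<25", "25-29", "30+"])

# One group of bars per country, one bar per age band within each group
counts = pd.crosstab(df["Country"], df["age_group"])

ax = counts.plot.bar(rot=0)  # no stacked=True, so bars are grouped
ax.set_ylabel("number of players")
ax.set_title("Players per country, grouped by age group")
ax.figure.savefig("country_age_grouped.png")
```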

### Numeric-Date

Last but not least, the best way to represent time-series data is a line chart.
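A line-chart sketch for a time series, using a hypothetical monthly series because the article does not show its date column. Sorting by date before plotting matters: a line chart joins points in the order given, so unsorted dates would produce a zigzag.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly series; any DataFrame with a date column works the same way
ts = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "value": [5, 7, 6, 9, 12, 11, 14, 13, 16, 18, 17, 20],
})
# Chronological order, with the dates as the x-axis
ts = ts.sort_values("date").set_index("date")

fig, ax = plt.subplots()
ax.plot(ts.index, ts["value"], marker="o")
ax.set_xlabel("date")
ax.set_ylabel("value")
ax.set_title("Monthly value over time")
fig.autofmt_xdate()  # tilt date labels so they do not overlap
fig.savefig("time_series_line.png")
```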

We now have a clear understanding of which chart to choose to represent the data in the best possible manner.

GitHub Link