Comprehensive Retail Data Analysis using SQL
Introduction
In this article we will be learning various SQL basics through examples by doing a comprehensive EDA on a Retail Sales Dataset
The dataset that has been used for the illustrations in this article can be downloaded from this link.
Let us first have a look at the dataset that we will be analyzing.
1 2 |
#Snap shot of the dataset select * from Retail_Sales; |
Based on the above output the features in the dataset can be broadly classified into the following categories:
Before diving deeper into the demographic and transactional analysis, let us have a look at how many distinct customers, shop categories and states are in the dataset.
1 2 |
select count(distinct Customer_ID),Count(distinct State), Count(distinct Shop_Category) from Retail_Sales; |
Demographic Analysis
We have three demographic features that are associated with every customer which are Age, Gender and State.
Let’s deep-dive into each of these features to understand the dataset better.
First let us see what is the minimum, maximum and average age of the customers in the dataset.
1 2 3 |
#Calculating the average, minimum and maximum age of the customers select min(Age), avg(Age), max(Age) from Retail_Sales; |
Let us find out how many males and females are in our customer base
1 2 3 4 |
#Gender distribution select count(distinct Customer_ID) as Customers, Gender from Retail_Sales group by Gender; |
Let us now have a look at which gender-state segment contributes the most to our dataset
1 2 3 4 |
select count(distinct Customer_ID) as Customers, State, Gender from Retail_Sales group by State,Gender Order by Customers desc; |
As can be seen from the above output the females from State DP contribute the most to our dataset
Transactional Analysis
Average Customer Spend
One key finding that would help us understand our dataset better is that customers from which state spend more.
A way to measure this is to calculate the Average customer spend metric that can be done in the following way:
1 2 3 4 5 6 |
# What is the average customer Spend across State select State ,count(distinct Customer_ID) as Customers, sum(Sales) as Total_Sales, sum(Sales)/count(distinct Customer_ID) as Average_Customer_Spend from Retail_Sales group by State Order by Average_Customer_Spend desc; |
As observed from the above output, the customers that belong to the state DP spend the most.
Sales Across Categories
1 2 3 4 5 |
# Which Category contributes to maximum amount of sales select Shop_category, sum(Sales) as Total_Sales from Retail_Sales group by Shop_category Order by Total_Sales desc; |
Purchase patterns across Gender
Let us have a look at which category do females spend the most on.
1 2 3 4 5 6 |
# Which Category do females mostly spend on select Shop_category, sum(Sales) as Total_Sales from Retail_Sales where Gender="F" group by Shop_category Order by Total_Sales desc; |
Let us calculate the same for male customers and analyze whether there is any difference between the purchase pattern of males and females
1 2 3 4 5 6 |
# Which Category do Males mostly spend on select Shop_category, sum(Sales) as Total_Sales from Retail_Sales where Gender="M" group by Shop_category Order by Total_Sales desc; |
Going by the above results we can conclude that males and females have similar purchase patterns.
Sales Across States
Let us find out which state contributes to the maximum amount of sales.
1 2 3 4 5 |
#Percentage contribution of Sales across States select sum(Sales) as Total_Sales, State from Retail_Sales group by State Order by Total_Sales desc; |
Average Basket Size across Gender
Let us find out which gender purchases more number of items per transaction.
1 2 3 4 5 |
#Basket size across gender select count( distinct Transaction_ID) as Transactions, sum(Items_in_basket) as Items, sum(Items_in_basket)/count( distinct Transaction_ID) as Items_Per_Transaction, Gender from Retail_Sales group by Gender; |
Therefore from the above output we can conclude that males purchase more number of items per transaction.
Results
Based on the above analysis, we can summaries the results as follows:
- The average age of the customers in the dataset is 48 years
- The state DP is the largest in terms of customer base as well as Sales
- Customers from the state AP have the lowest average customer spending and customers from DP have the highest average customer spending
- Both males and females spend the most on clothes which is followed by electronics
- Our customer base consists of 101 females and 74 males
- Males purchase more number of items per transaction