
EDA on Sales Analysis

In a recent data exploration adventure, I took on a project that involved digging into sales data to find important information. This data contained records of sales over a specific time period. My main goal was to deeply explore this data using two powerful tools: Pandas and Seaborn. With these tools, I aimed to discover useful insights about sales performance, figure out which products were selling the most, improve advertising strategies, and learn about customer behavior.

Bringing Data Together: Making Sense of It All

To start, I had to combine sales data that was spread across several files, one for each month, into a single dataset. The combined dataset gave me a complete view of sales over the whole period. I saved it as a single file called 'all_data.csv', which served as the starting point for the rest of my analysis.
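The merge step can be sketched roughly as follows. This is a minimal illustration, not the exact code from the project: the `sales_*.csv` naming pattern, the helper `combine_monthly_files`, and the sample columns are assumptions, and two tiny stand-in files are written to a temporary folder so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_monthly_files(folder: Path, pattern: str) -> pd.DataFrame:
    """Read every CSV in `folder` matching `pattern` and stack the rows."""
    files = sorted(folder.glob(pattern))
    frames = [pd.read_csv(f) for f in files]
    # pd.concat raises ValueError if no files matched, which is a useful signal here.
    return pd.concat(frames, ignore_index=True)


# Stand-in monthly files (hypothetical names and columns).
workdir = Path(tempfile.mkdtemp())
pd.DataFrame({"Order ID": [1, 2], "Product": ["A", "B"]}).to_csv(
    workdir / "sales_january.csv", index=False
)
pd.DataFrame({"Order ID": [3], "Product": ["C"]}).to_csv(
    workdir / "sales_february.csv", index=False
)

all_data = combine_monthly_files(workdir, "sales_*.csv")

# Save the merged result; the pattern above deliberately does not match this
# file name, so re-running the merge will not read the output back in.
all_data.to_csv(workdir / "all_data.csv", index=False)
```

Passing `ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.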

Getting Data Ready: Cleaning and Getting It Set

Once I had all the data together, I focused on getting it ready for analysis. I handled missing data by dropping the rows that were not useful. I converted the 'Order Date' column to a datetime format that is easier to work with, and added a new column called 'Order Month' to see sales trends across months. To make sure the numbers were accurate, I adjusted the data types of the numeric columns, converting some to integers and others to floats. With the data clean and organized, I calculated total sales by multiplying the quantity sold by its price.
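Here is a small sketch of what these cleaning steps might look like in Pandas. Only 'Order Date' and 'Order Month' are column names from the project; 'Quantity Ordered', 'Price Each', 'Sales', the date format string, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Made-up raw rows: everything arrives as text, and one row is entirely empty.
raw = pd.DataFrame(
    {
        "Order Date": ["04/19/19 08:46", None, "12/30/19 00:01"],
        "Quantity Ordered": ["2", None, "1"],
        "Price Each": ["11.95", None, "600.0"],
    }
)

# Drop rows where every field is missing.
clean = raw.dropna(how="all").copy()

# Parse the date text (assumed format) and derive the month number.
clean["Order Date"] = pd.to_datetime(clean["Order Date"], format="%m/%d/%y %H:%M")
clean["Order Month"] = clean["Order Date"].dt.month

# Convert text to numbers: whole numbers for quantities, decimals for prices.
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"]).astype("int64")
clean["Price Each"] = pd.to_numeric(clean["Price Each"]).astype("float64")

# Total sales per row = quantity sold * unit price.
clean["Sales"] = clean["Quantity Ordered"] * clean["Price Each"]
```

If the real files contain stray non-numeric values, `pd.to_numeric` will raise an error on them, which makes bad rows easy to spot before they distort the totals.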

Discovering Valuable Insights from Sales Data: The heart of my analysis was finding out which month had the most sales. Once I had a clean dataset to work with, it became clear that December was the top-performing month, with total sales reaching an impressive $4,613,443.34.
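The monthly comparison comes down to a group-and-sum. The sketch below assumes a 'Sales' column (a hypothetical name) alongside 'Order Month', and uses invented numbers, so it shows the technique rather than reproducing the December figure above.

```python
import pandas as pd

# Invented per-order sales figures, for illustration only.
orders = pd.DataFrame(
    {
        "Order Month": [1, 1, 12, 12, 12],
        "Sales": [10.0, 5.0, 20.0, 7.5, 2.5],
    }
)

# Total sales per month, then the month with the largest total.
monthly_sales = orders.groupby("Order Month")["Sales"].sum()
best_month = monthly_sales.idxmax()
best_total = monthly_sales.max()
```

`idxmax()` returns the index label of the largest value, which here is the month number itself.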

Another interesting part was figuring out which city had the highest sales. By looking at the 'Purchase Address' column, I could extract city-specific data. Adding up the sales for each city showed that San Francisco had the highest sales, followed closely by Los Angeles and New York City.
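One way to pull the city out of 'Purchase Address' is to split on commas. This sketch assumes addresses look like "street, city, state zip", which may not match the real data exactly; the 'City' and 'Sales' column names and all the sample rows are invented for illustration.

```python
import pandas as pd

# Invented orders; the address layout "street, city, state zip" is an assumption.
orders = pd.DataFrame(
    {
        "Purchase Address": [
            "1 Main St, San Francisco, CA 94016",
            "2 Oak Ave, Los Angeles, CA 90001",
            "3 Pine Rd, San Francisco, CA 94016",
        ],
        "Sales": [100.0, 80.0, 50.0],
    }
)

# The city is the second comma-separated piece; strip the surrounding spaces.
orders["City"] = orders["Purchase Address"].str.split(",").str[1].str.strip()

# Total sales per city, largest first.
city_sales = orders.groupby("City")["Sales"].sum().sort_values(ascending=False)
```

If two cities in different states share a name, grouping on the city alone would merge them, so in that case it may be worth appending the state abbreviation to the 'City' value.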

Finding the Right Time to Advertise: A crucial goal was to find the best times for advertising campaigns. To do this, I dove into the data to find the periods when customers made the most purchases. The analysis revealed that around 11 AM and 7 PM were the peak times for buying, suggesting that these hours were the best for running ads.
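A rough sketch of the hourly breakdown: take the hour from 'Order Date', count orders per hour, and look at the busiest ones. The 'Hour' column name and the timestamps are made up for illustration, and counting orders (rather than summing revenue) is my assumption about how "most purchases" was measured.

```python
import pandas as pd

# Invented order timestamps.
orders = pd.DataFrame(
    {
        "Order Date": pd.to_datetime(
            [
                "2019-04-01 11:05",
                "2019-04-01 11:40",
                "2019-04-02 11:15",
                "2019-04-02 19:30",
                "2019-04-03 19:10",
                "2019-04-03 08:20",
            ],
            format="%Y-%m-%d %H:%M",
        )
    }
)

# Hour of day (0-23) for each order, then the number of orders in each hour.
orders["Hour"] = orders["Order Date"].dt.hour
orders_per_hour = orders.groupby("Hour").size()

# The two busiest hours, most orders first.
peak_hours = orders_per_hour.nlargest(2).index.tolist()
```

Plotting `orders_per_hour` as a line chart makes the peaks easier to see than reading the counts from a table.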

Spotlight on Top-Selling Products: Identifying the best-selling product was another key part of my analysis. By grouping data by product and calculating how many were sold, I found that the 'AAA Batteries (4-pack)' was the top seller. It was closely followed by the 'AA Batteries (4-pack)' and the 'USB-C Charging Cable'.
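The product ranking follows the same group-and-sum pattern, this time on units sold. The column names 'Product' and 'Quantity Ordered' and the quantities below are assumptions for illustration; only the product names come from the analysis.

```python
import pandas as pd

# Invented quantities; only the product names are taken from the write-up.
orders = pd.DataFrame(
    {
        "Product": [
            "AAA Batteries (4-pack)",
            "AA Batteries (4-pack)",
            "AAA Batteries (4-pack)",
            "USB-C Charging Cable",
        ],
        "Quantity Ordered": [3, 4, 2, 1],
    }
)

# Units sold per product, best seller first.
units_by_product = (
    orders.groupby("Product")["Quantity Ordered"].sum().sort_values(ascending=False)
)
top_product = units_by_product.index[0]
```

Summing 'Quantity Ordered' ranks products by units sold, which can differ from ranking them by revenue when prices vary widely.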

Concluding the Exploration:

In summary, my adventure into sales data using Pandas and Seaborn led me to discover a treasure trove of valuable insights. These insights ranged from sales trends to the best times for advertising and the most popular products. Armed with these findings, the company is now better equipped to make smart decisions, allocate resources wisely, and chart a course for growth and success. This project vividly shows the power of data analysis in shaping smart strategies, and I'm excited about the impact it will have.

