
EDA on Income Patterns


In this project, we explored the "Adult Income" dataset using Python's Pandas and Seaborn libraries. The goal was to analyze how income relates to factors such as age, workclass, education, and gender.


We started by loading the dataset and looking at its initial rows to get a sense of the columns and the values they hold.


The exploration included:


1. Data Inspection: Viewed the first and last 10 rows of the dataset to understand its composition and format.


2. Dataset Size: Determined the number of rows and columns in the dataset, providing an overall perspective of its scale.


3. Data Overview: Using the info() function, we checked the column data types, the non-null count of each column (which shows where values are missing), and the memory usage.


4. Data Cleaning: Missing values in this dataset are recorded as "?", so we replaced those entries with NaN and visualized the missing values with a heatmap. We then removed every row containing a missing value, resulting in a cleaner dataset.


5. Data Exploration: Analyzed the age distribution and identified individuals aged between 17 and 48. We also looked at the distribution of workclass values and examined individuals with "Bachelors" and "Masters" degrees.


6. Bivariate Analysis: Using Seaborn's boxplot, we examined the relationship between income and age, comparing the age distribution of the two income groups.


7. Income Representation: Converted income values ("<=50K" and ">50K") into binary values (0 and 1) to facilitate further analysis.


8. High Income Workclasses: Using the binary income column, we compared workclasses by their share of high earners to see which work categories are most associated with incomes above 50K.


9. Gender and Income: Comparing income levels by gender, we found a clear disparity between males and females.


10. Data Type Conversion: To reduce memory usage, we converted the "workclass" column to the category data type, which stores each distinct label once instead of repeating the string in every row.
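The inspection and cleaning steps (1 to 4 and 10) can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the small DataFrame is a made-up stand-in for the real file, the column names (age, workclass, education, sex, income) are assumptions, and plot_missing_heatmap is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Adult Income data (assumed column names).
df = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 17],
    "workclass": ["State-gov", "?", "Private", "Private", "?", "Private"],
    "education": ["Bachelors", "Bachelors", "HS-grad", "11th", "Masters", "10th"],
    "sex": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "income": ["<=50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Steps 1-3: first/last rows, size, and an overview of dtypes and memory.
print(df.head(10))
print(df.tail(10))
print(df.shape)                 # (number of rows, number of columns)
df.info(memory_usage="deep")

# Step 4: treat "?" as missing, count the gaps, then drop incomplete rows.
df = df.replace("?", np.nan)
missing_per_column = df.isnull().sum()
clean = df.dropna(how="any").reset_index(drop=True)
print(missing_per_column)
print(clean.shape)


def plot_missing_heatmap(frame):
    """Draw a heatmap of missing cells (hypothetical helper; needs seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(frame.isnull(), cbar=False)
    plt.show()
    return ax


# Step 10: store workclass as a categorical column.
# Savings show up on large columns with few distinct values,
# not necessarily on a toy frame this small.
bytes_before = clean["workclass"].memory_usage(deep=True)
clean["workclass"] = clean["workclass"].astype("category")
bytes_after = clean["workclass"].memory_usage(deep=True)
print(bytes_before, bytes_after)
```

Calling plot_missing_heatmap(df) after the replace step draws the missing-value heatmap. The plotting imports are kept inside the function so the rest of the sketch runs with Pandas and NumPy alone.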


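The analysis steps (5 to 9) might look something like the sketch below. Again, the data is a made-up sample rather than the real dataset, the column names and the income_binary column are assumptions, and plot_age_by_income is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample with assumed column names; not the real Adult Income data.
adult = pd.DataFrame({
    "age": [25, 38, 52, 45, 17, 60, 30, 48],
    "workclass": ["Private", "Private", "Self-emp-inc", "Private",
                  "Private", "Self-emp-inc", "State-gov", "State-gov"],
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors",
                  "11th", "Masters", "Bachelors", "HS-grad"],
    "sex": ["Male", "Female", "Male", "Female",
            "Male", "Male", "Female", "Male"],
    "income": ["<=50K", "<=50K", ">50K", ">50K",
               "<=50K", ">50K", "<=50K", ">50K"],
})

# Step 5: age filter (between is inclusive on both ends by default),
# workclass counts, and rows with a Bachelors or Masters degree.
aged_17_to_48 = adult[adult["age"].between(17, 48)]
workclass_counts = adult["workclass"].value_counts()
degree_holders = adult[adult["education"].isin(["Bachelors", "Masters"])]


def plot_age_by_income(frame):
    """Step 6: boxplot of age per income group (hypothetical helper; needs seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.boxplot(x="income", y="age", data=frame)
    plt.show()
    return ax


# Step 7: encode income as 0 (<=50K) and 1 (>50K).
adult["income_binary"] = adult["income"].map({"<=50K": 0, ">50K": 1})

# Steps 8-9: the mean of a 0/1 column is the share of high earners per group.
high_income_share_by_workclass = (
    adult.groupby("workclass")["income_binary"].mean().sort_values(ascending=False)
)
high_income_share_by_sex = adult.groupby("sex")["income_binary"].mean()

print(len(aged_17_to_48), len(degree_holders))
print(workclass_counts)
print(high_income_share_by_workclass)
print(high_income_share_by_sex)
```

Taking the mean of the binary column is one convenient way to compare groups, because it gives the fraction of each group earning more than 50K regardless of group size.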



In conclusion, exploring the "Adult Income" dataset with Pandas and Seaborn showed how income relates to age, workclass, education, and gender. Cleaning the data, encoding income as a binary value, and comparing groups turned the raw records into clear socioeconomic patterns that can guide further analysis.
